All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: dri-devel@lists.freedesktop.org
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Steven Price <steven.price@arm.com>,
	Rob Herring <robh+dt@kernel.org>,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
	Boris Brezillon <boris.brezillon@collabora.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: [PATCH v5 01/16] drm/sched: Document what the timedout_job method should do
Date: Tue, 29 Jun 2021 09:34:55 +0200	[thread overview]
Message-ID: <20210629073510.2764391-2-boris.brezillon@collabora.com> (raw)
In-Reply-To: <20210629073510.2764391-1-boris.brezillon@collabora.com>

The documentation is a bit vague and doesn't really describe what the
->timedout_job() is expected to do. Let's add a few more details.

v5:
* New patch

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 include/drm/gpu_scheduler.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 10225a0a35d0..65700511e074 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 	 * @timedout_job: Called when a job has taken too long to execute,
 	 * to trigger GPU recovery.
 	 *
+	 * This method is called in a workqueue context.
+	 *
+	 * Drivers typically issue a reset to recover from GPU hangs, and this
+	 * procedure usually follows the following workflow:
+	 *
+	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
+	 *    scheduler thread and cancel the timeout work, guaranteeing that
+	 *    nothing is queued while we reset the hardware queue
+	 * 2. Try to gracefully stop non-faulty jobs (optional)
+	 * 3. Issue a GPU reset (driver-specific)
+	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
+	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
+	 *    jobs can be queued, and the scheduler thread is unblocked
+	 *
 	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 	 * and the underlying driver has started or completed recovery.
 	 *
-- 
2.31.1


  reply	other threads:[~2021-06-29  7:35 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-29  7:34 [PATCH v5 00/16] drm/panfrost: Misc improvements Boris Brezillon
2021-06-29  7:34 ` Boris Brezillon [this message]
2021-06-29  9:05   ` [PATCH v5 01/16] drm/sched: Document what the timedout_job method should do Daniel Vetter
2021-06-29  7:34 ` [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr Boris Brezillon
2021-06-29  8:50   ` Daniel Vetter
2021-06-29  8:58     ` Boris Brezillon
2021-06-29 11:03   ` Christian König
2021-06-29 11:18     ` Boris Brezillon
2021-06-29 11:24       ` Christian König
2021-06-29 14:05         ` Daniel Vetter
2021-09-07 18:53         ` Andrey Grodzovsky
2021-09-08  6:50           ` Boris Brezillon
2021-09-08 14:53             ` Andrey Grodzovsky
2021-09-08 14:55               ` Boris Brezillon
2021-06-29  7:34 ` [PATCH v5 03/16] drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate Boris Brezillon
2021-06-29  7:34 ` [PATCH v5 04/16] drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition Boris Brezillon
2021-06-29  7:34 ` [PATCH v5 05/16] drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name() Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 06/16] drm/panfrost: Do the exception -> string translation using a table Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 07/16] drm/panfrost: Expose a helper to trigger a GPU reset Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 08/16] drm/panfrost: Use a threaded IRQ for job interrupts Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 09/16] drm/panfrost: Simplify the reset serialization logic Boris Brezillon
2021-06-29 11:32   ` Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 10/16] drm/panfrost: Make sure job interrupts are masked before resetting Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 11/16] drm/panfrost: Disable the AS on unhandled page faults Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 12/16] drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 13/16] drm/panfrost: Don't reset the GPU on job faults unless we really have to Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 14/16] drm/panfrost: Kill in-flight jobs on FD close Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 15/16] drm/panfrost: Queue jobs on the hardware Boris Brezillon
2021-06-29  7:35 ` [PATCH v5 16/16] drm/panfrost: Increase the AS_ACTIVE polling timeout Boris Brezillon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210629073510.2764391-2-boris.brezillon@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=alyssa.rosenzweig@collabora.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=steven.price@arm.com \
    --cc=tomeu.vizoso@collabora.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.