dri-devel.lists.freedesktop.org archive mirror
From: Boris Brezillon <boris.brezillon@collabora.com>
To: dri-devel@lists.freedesktop.org
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Steven Price <steven.price@arm.com>,
	Rob Herring <robh+dt@kernel.org>,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
	Boris Brezillon <boris.brezillon@collabora.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: [PATCH v6 01/16] drm/sched: Document what the timedout_job method should do
Date: Wed, 30 Jun 2021 08:27:36 +0200
Message-ID: <20210630062751.2832545-2-boris.brezillon@collabora.com>
In-Reply-To: <20210630062751.2832545-1-boris.brezillon@collabora.com>

The documentation is a bit vague and doesn't really describe what the
->timedout_job() method is expected to do. Let's add a few more details.

v5:
* New patch

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/drm/gpu_scheduler.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d18af49fd009..aa90ed1f1b2b 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 	 * @timedout_job: Called when a job has taken too long to execute,
 	 * to trigger GPU recovery.
 	 *
+	 * This method is called in a workqueue context.
+	 *
+	 * Drivers typically issue a reset to recover from GPU hangs, and this
+	 * procedure usually goes through the following steps:
+	 *
+	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
+	 *    scheduler thread and cancel the timeout work, guaranteeing that
+	 *    nothing is queued while we reset the hardware queue
+	 * 2. Try to gracefully stop non-faulty jobs (optional)
+	 * 3. Issue a GPU reset (driver-specific)
+	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
+	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
+	 *    jobs can be queued, and the scheduler thread is unblocked
+	 *
 	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 	 * and the underlying driver has started or completed recovery.
 	 *
-- 
2.31.1
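
For readers who want to see the five documented steps as code, here is a
minimal userspace sketch of the ordering. It is not driver code: every
mock_* name, the struct layouts and the call log are hypothetical stand-ins
invented for illustration, and the handler takes the scheduler as an explicit
argument only to keep the mock simple. A real ->timedout_job() would call
drm_sched_stop(), drm_sched_resubmit_jobs() and drm_sched_start() and return
DRM_GPU_SCHED_STAT_NOMINAL, with the reset itself being driver-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins (hypothetical), NOT the kernel definitions. */
enum mock_sched_stat { MOCK_SCHED_STAT_NOMINAL = 1 };

struct mock_job {
	bool faulty;
};

struct mock_sched {
	bool stopped;		/* true between stop and start */
	const char *log[8];	/* order in which the steps ran */
	size_t nlog;
};

static void log_step(struct mock_sched *s, const char *step)
{
	if (s->nlog < sizeof(s->log) / sizeof(s->log[0]))
		s->log[s->nlog++] = step;
}

/* Step 1: stands in for drm_sched_stop(); nothing is queued after this. */
static void mock_sched_stop(struct mock_sched *s, struct mock_job *bad)
{
	(void)bad;
	s->stopped = true;
	log_step(s, "stop");
}

/* Step 2 (optional): try to gracefully stop the non-faulty jobs. */
static void mock_soft_stop_others(struct mock_sched *s)
{
	log_step(s, "soft-stop");
}

/* Step 3: driver-specific GPU reset. */
static void mock_gpu_reset(struct mock_sched *s)
{
	log_step(s, "reset");
}

/* Step 4: stands in for drm_sched_resubmit_jobs(). */
static void mock_sched_resubmit_jobs(struct mock_sched *s)
{
	log_step(s, "resubmit");
}

/* Step 5: stands in for drm_sched_start(); new jobs can be queued again. */
static void mock_sched_start(struct mock_sched *s)
{
	s->stopped = false;
	log_step(s, "start");
}

/* Mock timeout handler running the documented steps in order. */
enum mock_sched_stat mock_timedout_job(struct mock_sched *s,
				       struct mock_job *bad)
{
	mock_sched_stop(s, bad);
	mock_soft_stop_others(s);
	mock_gpu_reset(s);
	mock_sched_resubmit_jobs(s);
	mock_sched_start(s);

	return MOCK_SCHED_STAT_NOMINAL;
}
```

Calling mock_timedout_job() on a zero-initialized struct mock_sched fills
the log with the five steps in order and leaves the scheduler unstopped.
The point of the ordering is that the reset and the resubmission both happen
while the scheduler is stopped.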

