From: Mikko Perttunen <mperttunen@nvidia.com>
To: thierry.reding@gmail.com, jonathanh@nvidia.com, digetx@gmail.com,
	airlied@linux.ie, daniel@ffwll.ch
Cc: linux-tegra@vger.kernel.org, dri-devel@lists.freedesktop.org,
	talho@nvidia.com, bhuntsman@nvidia.com,
	Mikko Perttunen <mperttunen@nvidia.com>
Subject: [PATCH v3 10/20] gpu: host1x: Add no-recovery mode
Date: Wed,  7 Oct 2020 20:12:28 +0300
Message-ID: <20201007171238.1795964-11-mperttunen@nvidia.com>
In-Reply-To: <20201007171238.1795964-1-mperttunen@nvidia.com>

Add a new job property to enable or disable recovery, i.e. CPU
increments of syncpoints to their max value on job timeout.
Disabling recovery gives a more robust model for hung jobs:
userspace no longer needs to guess whether a syncpoint increment
happened because the job completed or because the job timeout
was triggered.

On job timeout with recovery disabled, we stop the channel, NOP
all future jobs on the channel that use the same syncpoint, mark
the syncpoint as locked, and resume the channel from the next job,
if any.

The future jobs are NOPed because, without the CPU increments,
the value of the syncpoint is no longer synchronized, and any
waiters would become confused if a future job incremented the
syncpoint. The syncpoint is marked locked to ensure that no
future job can increment it either, until the application has
recognized the situation and reallocated the syncpoint.
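
As a rough illustration of the sequence described above, here is a
minimal userspace model of the no-recovery timeout path. It is a
sketch, not the driver code: struct model_syncpt, struct model_job,
model_timeout() and model_submit() are hypothetical stand-ins, the
sync queue is modeled as a plain array, and NOP-patching is reduced
to setting a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; all names are hypothetical. */
struct model_syncpt {
	bool locked;
};

struct model_job {
	struct model_syncpt *syncpt;
	bool cancelled;
	bool nop_patched;
};

/*
 * Model of the no-recovery timeout path: lock the syncpoint, cancel the
 * failed job, then NOP and cancel every later job in the queue that uses
 * the same syncpoint. Returns the number of later jobs that were NOPed.
 */
static size_t model_timeout(struct model_job *queue, size_t len, size_t failed)
{
	struct model_syncpt *sp;
	size_t i, nopped = 0;

	assert(failed < len);

	sp = queue[failed].syncpt;
	sp->locked = true;
	queue[failed].cancelled = true;

	for (i = failed + 1; i < len; i++) {
		/* Jobs on other syncpoints are left untouched. */
		if (queue[i].syncpt != sp)
			continue;

		queue[i].nop_patched = true;
		queue[i].cancelled = true;
		nopped++;
	}

	return nopped;
}

/* Model of the submit-time check: refuse jobs on a locked syncpoint. */
static int model_submit(const struct model_job *job)
{
	return job->syncpt->locked ? -EPERM : 0;
}
```

In this model, a timeout of the first job in a queue that alternates
between two syncpoints cancels only the jobs sharing the failed job's
syncpoint, and later submissions against that syncpoint fail with
-EPERM until the lock is cleared.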

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Move 'locked' check inside CDMA lock to prevent race
* Add clarifying comment to NOP-patching code
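
For readers checking the NOP-patching in the cdma.c hunk below, the
following sketch shows how the word 0x1bad0000 and the slot index
could be derived. It assumes the usual host1x INCR opcode layout
(opcode in bits 31:28, offset in bits 27:16, count in bits 15:0),
which this patch does not spell out. incr_opcode() and slot_index()
are hypothetical helpers, and the total slot count is passed in as a
parameter rather than taken from HOST1X_PUSHBUFFER_SLOTS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed opcode number for INCR; not stated in the patch itself. */
#define MODEL_OPCODE_INCR 0x1u

/* Hypothetical helper: build an INCR opcode word under the assumed layout. */
static uint32_t incr_opcode(uint32_t offset, uint32_t count)
{
	assert(offset <= 0xfffu && count <= 0xffffu);
	return (MODEL_OPCODE_INCR << 28) | (offset << 16) | count;
}

/*
 * Hypothetical helper: map the i-th slot of a job to a push buffer slot.
 * Each slot holds two 32-bit words (8 bytes), so first_get / 8 converts
 * the byte offset into a slot number before wrapping around.
 */
static unsigned int slot_index(unsigned int first_get, unsigned int i,
			       unsigned int total_slots)
{
	assert(total_slots > 0);
	return (first_get / 8 + i) % total_slots;
}
```

Under these assumptions, incr_opcode(0xbad, 0) yields 0x1bad0000, a
write of zero words to offset 0xbad, which matches the comment in the
patch.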
---
 drivers/gpu/drm/tegra/drm.c        |  1 +
 drivers/gpu/host1x/cdma.c          | 58 ++++++++++++++++++++++++++----
 drivers/gpu/host1x/hw/channel_hw.c |  2 +-
 drivers/gpu/host1x/job.c           |  4 +++
 drivers/gpu/host1x/syncpt.c        |  2 ++
 drivers/gpu/host1x/syncpt.h        | 12 +++++++
 include/linux/host1x.h             |  9 +++++
 7 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index ceea9db341f0..7437c67924aa 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -197,6 +197,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->client = client;
 	job->class = client->class;
 	job->serialize = true;
+	job->syncpt_recovery = true;
 
 	/*
 	 * Track referenced BOs so that they can be unreferenced after the
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 6e6ca774f68d..bd151c3a2a5f 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -312,10 +312,6 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	bool signal = false;
 	struct host1x_job *job, *n;
 
-	/* If CDMA is stopped, queue is cleared and we can return */
-	if (!cdma->running)
-		return;
-
 	/*
 	 * Walk the sync queue, reading the sync point registers as necessary,
 	 * to consume as many sync queue entries as possible without blocking
@@ -324,7 +320,8 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
-		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end) &&
+		    !job->cancelled) {
 			/* Start timer on next pending syncpt */
 			if (job->timeout)
 				cdma_start_timer_locked(cdma, job);
@@ -413,8 +410,11 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 	else
 		restart_addr = cdma->last_pos;
 
+	if (!job)
+		goto resume;
+
 	/* do CPU increments for the remaining syncpts */
-	if (job) {
+	if (job->syncpt_recovery) {
 		dev_dbg(dev, "%s: perform CPU incr on pending buffers\n",
 			__func__);
 
@@ -433,8 +433,44 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 
 		dev_dbg(dev, "%s: finished sync_queue modification\n",
 			__func__);
+	} else {
+		struct host1x_job *failed_job = job;
+
+		host1x_job_dump(dev, job);
+
+		host1x_syncpt_set_locked(job->syncpt);
+		failed_job->cancelled = true;
+
+		list_for_each_entry_continue(job, &cdma->sync_queue, list) {
+			unsigned int i;
+
+			if (job->syncpt != failed_job->syncpt)
+				continue;
+
+			for (i = 0; i < job->num_slots; i++) {
+				unsigned int slot = (job->first_get/8 + i) %
+						    HOST1X_PUSHBUFFER_SLOTS;
+				u32 *mapped = cdma->push_buffer.mapped;
+
+				/*
+				 * Overwrite opcodes with 0 word writes
+				 * to offset 0xbad. This does nothing but
+				 * has an easily detected signature in debug
+				 * traces.
+				 */
+				mapped[2*slot+0] = 0x1bad0000;
+				mapped[2*slot+1] = 0x1bad0000;
+			}
+
+			job->cancelled = true;
+		}
+
+		wmb();
+
+		update_cdma_locked(cdma);
 	}
 
+resume:
 	/* roll back DMAGET and start up channel again */
 	host1x_hw_cdma_resume(host1x, cdma, restart_addr);
 }
@@ -490,6 +526,16 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 
 	mutex_lock(&cdma->lock);
 
+	/*
+	 * Check if syncpoint was locked due to previous job timeout.
+	 * This needs to be done within the cdma lock to avoid a race
+	 * with the timeout handler.
+	 */
+	if (job->syncpt->locked) {
+		mutex_unlock(&cdma->lock);
+		return -EPERM;
+	}
+
 	if (job->timeout) {
 		/* init state on first submit with timeout value */
 		if (!cdma->timeout.initialized) {
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index d4c28faf27d1..bf21512e5078 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -191,7 +191,7 @@ static int channel_submit(struct host1x_job *job)
 	/* schedule a submit complete interrupt */
 	err = host1x_intr_add_action(host, sp, syncval,
 				     HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
-				     completed_waiter, NULL);
+				     completed_waiter, &job->waiter);
 	completed_waiter = NULL;
 	WARN(err, "Failed to set submit complete interrupt");
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index d8345d3bf0b3..e4f16fc899b0 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,10 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->waiter)
+		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
+				    job->waiter);
+
 	if (job->syncpt)
 		host1x_syncpt_put(job->syncpt);
 
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index c7b910e413d8..8d658e5f7db2 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,8 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	sp->locked = false;
+
 	mutex_lock(&sp->host->syncpt_mutex);
 
 	host1x_syncpt_base_free(sp->base);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index a6766f8d55ee..93e894677d89 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -40,6 +40,13 @@ struct host1x_syncpt {
 
 	/* interrupt data */
 	struct host1x_syncpt_intr intr;
+
+	/*
+	 * If a submission incrementing this syncpoint fails, lock it so that
+	 * further submissions cannot be made until application has handled the
+	 * failure.
+	 */
+	bool locked;
 };
 
 /* Initialize sync point array  */
@@ -115,4 +122,9 @@ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 	return sp->id < host1x_syncpt_nb_pts(sp->host);
 }
 
+static inline void host1x_syncpt_set_locked(struct host1x_syncpt *sp)
+{
+	sp->locked = true;
+}
+
 #endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index bcfb1cc9a1c1..fb62cc8b77dd 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -233,9 +233,15 @@ struct host1x_job {
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
+	/* Completion waiter ref */
+	void *waiter;
+
 	/* Maximum time to wait for this job */
 	unsigned int timeout;
 
+	/* Job has timed out and should be released */
+	bool cancelled;
+
 	/* Index and number of slots used in the push buffer */
 	unsigned int first_get;
 	unsigned int num_slots;
@@ -256,6 +262,9 @@ struct host1x_job {
 
 	/* Add a channel wait for previous ops to complete */
 	bool serialize;
+
+	/* Fast-forward syncpoint increments on job timeout */
+	bool syncpt_recovery;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.28.0
