* [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
@ 2018-03-07 13:42 Chris Wilson
  2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
  To: intel-gfx

Before we reset the GPU after marking the device as wedged, we wait for
all the remaining requests to be completed (and marked as EIO).
Afterwards, we should flush the request lists so the next batch start
with the driver in an idle start.
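
A minimal sketch of the intended end state, assuming the hypothetical helper
below stands in for the existing per-engine wait loop (illustrative only; the
actual change is in the diff):

	/* Illustrative: the tail end of i915_gem_unset_wedged() after this patch. */
	wait_for_inflight_requests_to_error_out(i915);	/* hypothetical name for the wait loop */
	i915_retire_requests(i915);			/* flush the request lists */
	GEM_BUG_ON(i915->gt.active_requests);		/* the driver must now be idle */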

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a5bd07338b46..30cd07ebcb8e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3273,7 +3273,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
 		return true;
 
-	/* Before unwedging, make sure that all pending operations
+	/*
+	 * Before unwedging, make sure that all pending operations
 	 * are flushed and errored out - we may have requests waiting upon
 	 * third party fences. We marked all inflight requests as EIO, and
 	 * every execbuf since returned EIO, for consistency we want all
@@ -3291,7 +3292,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 			if (!rq)
 				continue;
 
-			/* We can't use our normal waiter as we want to
+			/*
+			 * We can't use our normal waiter as we want to
 			 * avoid recursively trying to handle the current
 			 * reset. The basic dma_fence_default_wait() installs
 			 * a callback for dma_fence_signal(), which is
@@ -3306,8 +3308,11 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 				return false;
 		}
 	}
+	i915_retire_requests(i915);
+	GEM_BUG_ON(i915->gt.active_requests);
 
-	/* Undo nop_submit_request. We prevent all new i915 requests from
+	/*
+	 * Undo nop_submit_request. We prevent all new i915 requests from
 	 * being queued (by disallowing execbuf whilst wedged) so having
 	 * waited for all active requests above, we know the system is idle
 	 * and do not have to worry about a thread being inside
-- 
2.16.2


* [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
  2018-03-09 13:17   ` Mika Kuoppala
  2018-03-09 13:20   ` Chris Wilson
  2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
  To: intel-gfx

With a series of unusual events (a sequence of interrupted request
allocations), we could gradually leak the ring->space estimate by
unwinding the ring back to the start of the request, but not return the
used space back to the ring. Eventually and with great misfortune, it
would be possible to trigger ring->space exhaustion with no requests on
the ring.
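
A condensed sketch of the bookkeeping involved, assuming ring->space is a
cached estimate derived from head and emit (illustrative only; the actual
change is in the diff below):

	/* Unwinding an interrupted request allocation (the err_unwind path). */
	rq->ring->emit = rq->head;		/* rewind the ring to where the request began */
	intel_ring_update_space(rq->ring);	/* recompute the cached free-space estimate */

Without the second line, the space consumed by the aborted request stays
accounted as used until something else forces a recompute.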

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d437beac3969..efa9ee557f31 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 
 err_unwind:
 	rq->ring->emit = rq->head;
+	intel_ring_update_space(rq->ring);
 
 	/* Make sure we didn't add ourselves to external state before freeing */
 	GEM_BUG_ON(!list_empty(&rq->active_list));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1d599524a759..88eeb64041ae 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1593,6 +1593,7 @@ static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
 	if (intel_ring_update_space(ring) >= bytes)
 		return 0;
 
+	GEM_BUG_ON(list_empty(&ring->request_list));
 	list_for_each_entry(target, &ring->request_list, ring_link) {
 		/* Would completion of this request free enough space? */
 		if (bytes <= __intel_ring_space(target->postfix,
-- 
2.16.2


* [PATCH 3/6] drm/i915: Update ring position from request on retiring
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
  2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
  2018-03-09 13:38   ` Mika Kuoppala
  2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
  To: intel-gfx

When wedged, we do not update the ring->tail as we submit the requests
causing us to leak the ring->space upon cleaning up the wedged driver.
We can just use the value stored in rq->tail, and keep the submission
backend details away from set-wedge.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index efa9ee557f31..69b378a323fc 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -358,7 +358,7 @@ static void advance_ring(struct i915_request *request)
 		 * is just about to be. Either works, if we miss the last two
 		 * noops - they are safe to be replayed on a reset.
 		 */
-		tail = READ_ONCE(request->ring->tail);
+		tail = READ_ONCE(request->tail);
 	} else {
 		tail = request->postfix;
 	}
-- 
2.16.2


* [PATCH 4/6] drm/i915: Include ring->emit in debugging
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
  2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
  2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
  2018-03-09 13:41   ` Mika Kuoppala
  2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
  To: intel-gfx

Include ring->emit and ring->space alongside ring->(head,tail) when
printing debug information.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c    |  4 ++--
 drivers/gpu/drm/i915/intel_engine_cs.c | 10 +++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e838c765b251..4de0a52f14a9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1923,8 +1923,8 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 
 static void describe_ctx_ring(struct seq_file *m, struct intel_ring *ring)
 {
-	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u)",
-		   ring->space, ring->head, ring->tail);
+	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, emit: %u)",
+		   ring->space, ring->head, ring->tail, ring->emit);
 }
 
 static int i915_context_status(struct seq_file *m, void *unused)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 4ba139c27fba..e71bd6951d9b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1929,12 +1929,16 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 			   rq->head, rq->postfix, rq->tail,
 			   rq->batch ? upper_32_bits(rq->batch->node.start) : ~0u,
 			   rq->batch ? lower_32_bits(rq->batch->node.start) : ~0u);
-		drm_printf(m, "\t\tring->start: 0x%08x\n",
+		drm_printf(m, "\t\tring->start : 0x%08x\n",
 			   i915_ggtt_offset(rq->ring->vma));
-		drm_printf(m, "\t\tring->head:  0x%08x\n",
+		drm_printf(m, "\t\tring->head:   0x%08x\n",
 			   rq->ring->head);
-		drm_printf(m, "\t\tring->tail:  0x%08x\n",
+		drm_printf(m, "\t\tring->tail:   0x%08x\n",
 			   rq->ring->tail);
+		drm_printf(m, "\t\tring->emit:   0x%08x\n",
+			   rq->ring->emit);
+		drm_printf(m, "\t\tring->space:  0x%08x\n",
+			   rq->ring->space);
 	}
 
 	rcu_read_unlock();
-- 
2.16.2


* [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
                   ` (2 preceding siblings ...)
  2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
  2018-03-09 13:49   ` Mika Kuoppala
  2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.
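
The underlying pattern is the usual RCU publish/read protocol, sketched here
with simplified ordering (an assumption about the shape of the code, not the
exact driver flow; the real change is in the diff below):

	/* Reader: sample engine->schedule under the RCU read lock so a
	 * concurrent set-wedge cannot clear it out from under us mid-call.
	 */
	rcu_read_lock();
	if (engine->schedule)
		engine->schedule(rq, prio);
	rcu_read_unlock();

	/* Writer (set-wedge): unpublish the callback, wait for readers to
	 * drain, and only then let cancel_requests move the queue away.
	 */
	engine->schedule = NULL;
	synchronize_rcu();
	engine->cancel_requests(engine);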

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
 drivers/gpu/drm/i915/i915_request.c |  2 ++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 30cd07ebcb8e..3b44952e089f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
 
 	rq = to_request(fence);
 	engine = rq->engine;
-	if (!engine->schedule)
-		return;
 
-	engine->schedule(rq, prio);
+	rcu_read_lock();
+	if (engine->schedule)
+		engine->schedule(rq, prio);
+	rcu_read_unlock();
 }
 
 static void fence_set_priority(struct dma_fence *fence, int prio)
@@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 	 */
 	for_each_engine(engine, i915, id) {
 		i915_gem_reset_prepare_engine(engine);
+
 		engine->submit_request = nop_submit_request;
+		engine->schedule = NULL;
 	}
+	i915->caps.scheduler = 0;
 
 	/*
 	 * Make sure no one is running the old callback before we proceed with
@@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 		 * start to complete all requests.
 		 */
 		engine->submit_request = nop_complete_submit_request;
-		engine->schedule = NULL;
 	}
 
-	i915->caps.scheduler = 0;
-
 	/*
 	 * Make sure no request can slip through without getting completed by
 	 * either this call here to intel_engine_init_global_seqno, or the one
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 69b378a323fc..0258449579f8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1082,8 +1082,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
+	rcu_read_lock();
 	if (engine->schedule)
 		engine->schedule(request, request->ctx->priority);
+	rcu_read_unlock();
 
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
-- 
2.16.2


* [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
                   ` (3 preceding siblings ...)
  2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
  2018-03-09 14:01   ` Mika Kuoppala
  2018-03-09 14:10   ` Chris Wilson
  2018-03-07 14:31 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
  To: intel-gfx

tasklet_kill() will spin waiting for the current tasklet to be executed.
However, if tasklet_disable() has been called, then the tasklet is never
executed but permanently put back onto the runlist until
tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
disable/enable pair. This is the case when we call set-wedge from inside
i915_reset(), and another request was submitted to us concurrent to the
reset.
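
The problematic pattern in isolation, as a sketch of the tasklet semantics
described above (not driver code):

	tasklet_disable(&t);	/* t.count > 0: the tasklet body can no longer run */
	tasklet_kill(&t);	/* if t is pending, this spins forever waiting for a
				 * run that the disabled tasklet will never make */
	tasklet_enable(&t);

Hence the kill is only attempted below while the count is still zero.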

Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3b44952e089f..e75af06904b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 * Turning off the execlists->tasklet until the reset is over
 	 * prevents the race.
 	 */
-	tasklet_kill(&engine->execlists.tasklet);
+	if (!atomic_read(&engine->execlists.tasklet.count))
+		tasklet_kill(&engine->execlists.tasklet);
 	tasklet_disable(&engine->execlists.tasklet);
 
 	/*
-- 
2.16.2


* ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
                   ` (4 preceding siblings ...)
  2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
@ 2018-03-07 14:31 ` Patchwork
  2018-03-07 16:55 ` ✓ Fi.CI.IGT: " Patchwork
  2018-03-09 12:54 ` [PATCH 1/6] " Mika Kuoppala
  7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-03-07 14:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
URL   : https://patchwork.freedesktop.org/series/39532/
State : success

== Summary ==

Series 39532v1 series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
https://patchwork.freedesktop.org/api/1.0/series/39532/revisions/1/mbox/

---- Known issues:

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (fi-snb-2520m) fdo#103713
        Subgroup suspend-read-crc-pipe-c:
                incomplete -> PASS       (fi-bxt-dsi) fdo#103927

fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:423s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:425s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:370s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:497s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:278s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:488s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:488s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:478s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:469s
fi-cfl-8700k     total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:403s
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:575s
fi-cfl-u         total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:504s
fi-elk-e7500     total:288  pass:229  dwarn:0   dfail:0   fail:0   skip:59  time:413s
fi-gdg-551       total:288  pass:179  dwarn:0   dfail:0   fail:1   skip:108 time:289s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:515s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:396s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:410s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:464s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:421s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:468s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:461s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:507s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:588s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:435s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:521s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:534s
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:501s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:486s
fi-skl-guc       total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:420s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:429s
fi-snb-2520m     total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:522s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:401s

36e67bf18117a31df69df314e9e7f3709c37c9c8 drm-tip: 2018y-03m-07d-13h-36m-57s UTC integration manifest
7dc0b21fe5c5 drm/i915: Only call tasklet_kill() on the first prepare_reset
b6b27d833c1e drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
922b2f72ecc7 drm/i915: Include ring->emit in debugging
bdd4d1164ec2 drm/i915: Update ring position from request on retiring
50b3612924f1 drm/i915: Reset ring space estimate after unwinding the request
95cc7a5b45e6 drm/i915: Finish the wait-for-wedge by retiring all the inflight requests

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8258/issues.html

* ✓ Fi.CI.IGT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
                   ` (5 preceding siblings ...)
  2018-03-07 14:31 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Patchwork
@ 2018-03-07 16:55 ` Patchwork
  2018-03-09 12:54 ` [PATCH 1/6] " Mika Kuoppala
  7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-03-07 16:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
URL   : https://patchwork.freedesktop.org/series/39532/
State : success

== Summary ==

---- Possible new issues:

Test kms_busy:
        Subgroup extended-pageflip-hang-oldfb-render-a:
                skip       -> PASS       (shard-snb)
Test kms_cursor_legacy:
        Subgroup cursorb-vs-flipa-varying-size:
                dmesg-warn -> PASS       (shard-hsw)
Test kms_vblank:
        Subgroup pipe-a-ts-continuation-modeset:
                skip       -> PASS       (shard-snb)
        Subgroup pipe-a-wait-busy-hang:
                skip       -> PASS       (shard-snb)
        Subgroup pipe-b-wait-idle:
                dmesg-warn -> PASS       (shard-hsw)

---- Known issues:

Test gem_eio:
        Subgroup in-flight:
                pass       -> INCOMPLETE (shard-apl) fdo#105341 +1
Test kms_chv_cursor_fail:
        Subgroup pipe-b-64x64-right-edge:
                dmesg-warn -> PASS       (shard-snb) fdo#105185 +2
Test kms_cursor_crc:
        Subgroup cursor-128x128-suspend:
                pass       -> INCOMPLETE (shard-hsw) fdo#103540
Test kms_flip:
        Subgroup 2x-flip-vs-expired-vblank-interruptible:
                pass       -> FAIL       (shard-hsw) fdo#102887
        Subgroup 2x-plain-flip-fb-recreate:
                pass       -> FAIL       (shard-hsw) fdo#100368 +2

fdo#105341 https://bugs.freedesktop.org/show_bug.cgi?id=105341
fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368

shard-apl        total:3394 pass:1786 dwarn:1   dfail:0   fail:8   skip:1597 time:11868s
shard-hsw        total:3429 pass:1751 dwarn:1   dfail:0   fail:5   skip:1670 time:11284s
shard-snb        total:3467 pass:1362 dwarn:2   dfail:0   fail:2   skip:2101 time:6988s
Blacklisted hosts:
shard-kbl        total:3381 pass:1900 dwarn:3   dfail:0   fail:7   skip:1470 time:9183s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8258/shards.html

* Re: [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
  2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
                   ` (6 preceding siblings ...)
  2018-03-07 16:55 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-03-09 12:54 ` Mika Kuoppala
  7 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 12:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Before we reset the GPU after marking the device as wedged, we wait for
> all the remaining requests to be completed (and marked as EIO).
> Afterwards, we should flush the request lists so the next batch start
> with the driver in an idle start.

s/start/state?

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a5bd07338b46..30cd07ebcb8e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3273,7 +3273,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
>  		return true;
>  
> -	/* Before unwedging, make sure that all pending operations
> +	/*
> +	 * Before unwedging, make sure that all pending operations
>  	 * are flushed and errored out - we may have requests waiting upon
>  	 * third party fences. We marked all inflight requests as EIO, and
>  	 * every execbuf since returned EIO, for consistency we want all
> @@ -3291,7 +3292,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  			if (!rq)
>  				continue;
>  
> -			/* We can't use our normal waiter as we want to
> +			/*
> +			 * We can't use our normal waiter as we want to
>  			 * avoid recursively trying to handle the current
>  			 * reset. The basic dma_fence_default_wait() installs
>  			 * a callback for dma_fence_signal(), which is
> @@ -3306,8 +3308,11 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  				return false;
>  		}
>  	}
> +	i915_retire_requests(i915);
> +	GEM_BUG_ON(i915->gt.active_requests);
>  
> -	/* Undo nop_submit_request. We prevent all new i915 requests from
> +	/*
> +	 * Undo nop_submit_request. We prevent all new i915 requests from
>  	 * being queued (by disallowing execbuf whilst wedged) so having
>  	 * waited for all active requests above, we know the system is idle
>  	 * and do not have to worry about a thread being inside
> -- 
> 2.16.2

* Re: [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
  2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
@ 2018-03-09 13:17   ` Mika Kuoppala
  2018-03-09 13:42     ` Chris Wilson
  2018-03-09 13:20   ` Chris Wilson
  1 sibling, 1 reply; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> With a series of unusual events (a sequence of interrupted request
> allocations), we could gradually leak the ring->space estimate by
> unwinding the ring back to the start of the request, but not return the
> used space back to the ring. Eventually and with great misfortune, it
> would be possible to trigger ring->space exhaustion with no requests on
> the ring.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c     | 1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d437beac3969..efa9ee557f31 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>  
>  err_unwind:
>  	rq->ring->emit = rq->head;
> +	intel_ring_update_space(rq->ring);
>  
>  	/* Make sure we didn't add ourselves to external state before freeing */
>  	GEM_BUG_ON(!list_empty(&rq->active_list));
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1d599524a759..88eeb64041ae 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1593,6 +1593,7 @@ static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
>  	if (intel_ring_update_space(ring) >= bytes)
>  		return 0;
>  
> +	GEM_BUG_ON(list_empty(&ring->request_list));

At some point, after a long series of misfortunate events, we
will add a first request, actually need to wait for space for it,
and then we kaboom in here?

-Mika

>  	list_for_each_entry(target, &ring->request_list, ring_link) {
>  		/* Would completion of this request free enough space? */
>  		if (bytes <= __intel_ring_space(target->postfix,
> -- 
> 2.16.2

* Re: [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
  2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
  2018-03-09 13:17   ` Mika Kuoppala
@ 2018-03-09 13:20   ` Chris Wilson
  1 sibling, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 13:20 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-03-07 13:42:22)
> With a series of unusual events (a sequence of interrupted request
> allocations), we could gradually leak the ring->space estimate by
> unwinding the ring back to the start of the request, but not return the
> used space back to the ring. Eventually and with great misfortune, it
> would be possible to trigger ring->space exhaustion with no requests on
> the ring.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c     | 1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d437beac3969..efa9ee557f31 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>  
>  err_unwind:
>         rq->ring->emit = rq->head;
> +       intel_ring_update_space(rq->ring);

Ok, skip this one as we will correct ourselves next time we
wait_for_space. It's just the next one where we weren't maintaining
ring->tail that was the issue.
-Chris

* Re: [PATCH 3/6] drm/i915: Update ring position from request on retiring
  2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
@ 2018-03-09 13:38   ` Mika Kuoppala
  2018-03-09 13:56     ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When wedged, we do not update the ring->tail as we submit the requests
> causing us to leak the ring->space upon cleaning up the wedged driver.
> We can just use the value stored in rq->tail, and keep the submission
> backend details away from set-wedge.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index efa9ee557f31..69b378a323fc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -358,7 +358,7 @@ static void advance_ring(struct i915_request *request)
>  		 * is just about to be. Either works, if we miss the last two
>  		 * noops - they are safe to be replayed on a reset.
>  		 */
> -		tail = READ_ONCE(request->ring->tail);
> +		tail = READ_ONCE(request->tail);

I tried to think if we need the READ_ONCE here anymore.

But as this is the safest version,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

I noticed that request->tail is not cleared in i915_request_alloc.

If we set rq->head = rq->tail = rq->ring->emit,
we could use rq->head == rq->tail to catch whether
we screwed up something major during the request lifetime.

-Mika


>  	} else {
>  		tail = request->postfix;
>  	}
> -- 
> 2.16.2

* Re: [PATCH 4/6] drm/i915: Include ring->emit in debugging
  2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
@ 2018-03-09 13:41   ` Mika Kuoppala
  0 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Include ring->emit and ring->space alongside ring->(head,tail) when
> printing debug information.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c    |  4 ++--
>  drivers/gpu/drm/i915/intel_engine_cs.c | 10 +++++++---
>  2 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index e838c765b251..4de0a52f14a9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1923,8 +1923,8 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
>  
>  static void describe_ctx_ring(struct seq_file *m, struct intel_ring *ring)
>  {
> -	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u)",
> -		   ring->space, ring->head, ring->tail);
> +	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, emit: %u)",
> +		   ring->space, ring->head, ring->tail, ring->emit);
>  }
>  
>  static int i915_context_status(struct seq_file *m, void *unused)
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 4ba139c27fba..e71bd6951d9b 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1929,12 +1929,16 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  			   rq->head, rq->postfix, rq->tail,
>  			   rq->batch ? upper_32_bits(rq->batch->node.start) : ~0u,
>  			   rq->batch ? lower_32_bits(rq->batch->node.start) : ~0u);
> -		drm_printf(m, "\t\tring->start: 0x%08x\n",
> +		drm_printf(m, "\t\tring->start : 0x%08x\n",

Please check the space before ':'; it seems unintentional.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  			   i915_ggtt_offset(rq->ring->vma));
> -		drm_printf(m, "\t\tring->head:  0x%08x\n",
> +		drm_printf(m, "\t\tring->head:   0x%08x\n",
>  			   rq->ring->head);
> -		drm_printf(m, "\t\tring->tail:  0x%08x\n",
> +		drm_printf(m, "\t\tring->tail:   0x%08x\n",
>  			   rq->ring->tail);
> +		drm_printf(m, "\t\tring->emit:   0x%08x\n",
> +			   rq->ring->emit);
> +		drm_printf(m, "\t\tring->space:  0x%08x\n",
> +			   rq->ring->space);
>  	}
>  
>  	rcu_read_unlock();
> -- 
> 2.16.2

* Re: [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
  2018-03-09 13:17   ` Mika Kuoppala
@ 2018-03-09 13:42     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 13:42 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-03-09 13:17:09)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > With a series of unusual events (a sequence of interrupted request
> > allocations), we could gradually leak the ring->space estimate by
> > unwinding the ring back to the start of the request, but not return the
> > used space back to the ring. Eventually and with great misfortune, it
> > would be possible to trigger ring->space exhaustion with no requests on
> > the ring.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c     | 1 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index d437beac3969..efa9ee557f31 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >  
> >  err_unwind:
> >       rq->ring->emit = rq->head;
> > +     intel_ring_update_space(rq->ring);
> >  
> >       /* Make sure we didn't add ourselves to external state before freeing */
> >       GEM_BUG_ON(!list_empty(&rq->active_list));
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 1d599524a759..88eeb64041ae 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1593,6 +1593,7 @@ static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
> >       if (intel_ring_update_space(ring) >= bytes)
> >               return 0;
> >  
> > +     GEM_BUG_ON(list_empty(&ring->request_list));
> 
> At some point, after a long series of misfortunate events, we
> will add a first request, actually need to wait for space for it,
> and then we kaboom in here?

That's the experience. However, I don't think this patch helps because
we do the intel_ring_update_space() here inside wait_for_space anyway,
so it should come out in the wash so long as we aren't corrupting
the ring space calculation.
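
For reference, a condensed sketch of the entry point being discussed, built
from the hunk above (not the complete function; the wait loop is elided):

	static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
	{
		/* The estimate is recomputed from head/emit on every call, so
		 * a stale ring->space left behind by an aborted request is
		 * corrected here before we ever walk the request_list.
		 */
		if (intel_ring_update_space(ring) >= bytes)
			return 0;

		/* ... otherwise find a request whose completion frees enough
		 * space and wait for it (the new GEM_BUG_ON fires if the list
		 * is empty while space is still short) ...
		 */
		return -ENOSPC;	/* placeholder for the elided wait loop */
	}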
-Chris

* Re: [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
@ 2018-03-09 13:49   ` Mika Kuoppala
  0 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Similar to the staging around handling of engine->submit_request, we
> need to stop adding to the execlists->queue prior to calling
> engine->cancel_requests. cancel_requests will move requests from the
> queue onto the timeline, so if we add a request onto the queue after that
> point, it will be lost.
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
>  drivers/gpu/drm/i915/i915_request.c |  2 ++
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 30cd07ebcb8e..3b44952e089f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
>  
>  	rq = to_request(fence);
>  	engine = rq->engine;
> -	if (!engine->schedule)
> -		return;
>  
> -	engine->schedule(rq, prio);
> +	rcu_read_lock();
> +	if (engine->schedule)
> +		engine->schedule(rq, prio);
> +	rcu_read_unlock();
>  }
>  
>  static void fence_set_priority(struct dma_fence *fence, int prio)
> @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	 */
>  	for_each_engine(engine, i915, id) {
>  		i915_gem_reset_prepare_engine(engine);
> +
>  		engine->submit_request = nop_submit_request;
> +		engine->schedule = NULL;
>  	}
> +	i915->caps.scheduler = 0;
>  
>  	/*
>  	 * Make sure no one is running the old callback before we proceed with
> @@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		 * start to complete all requests.
>  		 */
>  		engine->submit_request = nop_complete_submit_request;
> -		engine->schedule = NULL;
>  	}
>  
> -	i915->caps.scheduler = 0;
> -
>  	/*
>  	 * Make sure no request can slip through without getting completed by
>  	 * either this call here to intel_engine_init_global_seqno, or the one
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 69b378a323fc..0258449579f8 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1082,8 +1082,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>  	 * decide whether to preempt the entire chain so that it is ready to
>  	 * run at the earliest possible convenience.
>  	 */
> +	rcu_read_lock();
>  	if (engine->schedule)
>  		engine->schedule(request, request->ctx->priority);
> +	rcu_read_unlock();
>  
>  	local_bh_disable();
>  	i915_sw_fence_commit(&request->submit);
> -- 
> 2.16.2
>

* Re: [PATCH 3/6] drm/i915: Update ring position from request on retiring
  2018-03-09 13:38   ` Mika Kuoppala
@ 2018-03-09 13:56     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 13:56 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-03-09 13:38:37)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > When wedged, we do not update the ring->tail as we submit the requests
> > causing us to leak the ring->space upon cleaning up the wedged driver.
> > We can just use the value stored in rq->tail, and keep the submission
> > backend details away from set-wedge.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index efa9ee557f31..69b378a323fc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -358,7 +358,7 @@ static void advance_ring(struct i915_request *request)
> >                * is just about to be. Either works, if we miss the last two
> >                * noops - they are safe to be replayed on a reset.
> >                */
> > -             tail = READ_ONCE(request->ring->tail);
> > +             tail = READ_ONCE(request->tail);
> 
> I tried to think if we need the READ_ONCE here anymore.

I tried as well to see if the comment was still correct. It still is,
because we can retire before we see the context-switch interrupt.
 
> But as this is the safest version,
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> I noticed that request->tail is not cleared in i915_request_alloc.
> 
> If we set rq->head = rq->tail = rq->ring->emit,
> we could use rq->head == rq->tail to catch whether
> we screwed up something major during the request lifetime.

Heh, if we get a stray write here, we're going to get stray writes
everywhere :)
-Chris

* Re: [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
  2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
@ 2018-03-09 14:01   ` Mika Kuoppala
  2018-03-09 14:10   ` Chris Wilson
  1 sibling, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 14:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> tasklet_kill() will spin waiting for the current tasklet to be executed.
> However, if tasklet_disable() has been called, then the tasklet is never
> executed but permanently put back onto the runlist until
> tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> disable/enable pair. This is the case when we call set-wedge from inside
> i915_reset(), and another request was submitted to us concurrent to the
> reset.
>
> Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3b44952e089f..e75af06904b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>  	 * Turning off the execlists->tasklet until the reset is over
>  	 * prevents the race.
>  	 */
> -	tasklet_kill(&engine->execlists.tasklet);
> +	if (!atomic_read(&engine->execlists.tasklet.count))
> +		tasklet_kill(&engine->execlists.tasklet);

As discussed on IRC, this might warrant a comment that it is
our own tasklet whose count we are racily inspecting here,
so the race does not matter in the path where we skip the kill.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  	tasklet_disable(&engine->execlists.tasklet);
>  
>  	/*
> -- 
> 2.16.2

* Re: [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
  2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
  2018-03-09 14:01   ` Mika Kuoppala
@ 2018-03-09 14:10   ` Chris Wilson
  2018-03-09 14:22     ` Chris Wilson
  1 sibling, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 14:10 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-03-07 13:42:26)
> tasklet_kill() will spin waiting for the current tasklet to be executed.
> However, if tasklet_disable() has been called, then the tasklet is never
> executed but permanently put back onto the runlist until
> tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> disable/enable pair. This is the case when we call set-wedge from inside
> i915_reset(), and another request was submitted to us concurrent to the
> reset.
> 
> Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3b44952e089f..e75af06904b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>          * Turning off the execlists->tasklet until the reset is over
>          * prevents the race.
>          */
> -       tasklet_kill(&engine->execlists.tasklet);
> +       if (!atomic_read(&engine->execlists.tasklet.count))
> +               tasklet_kill(&engine->execlists.tasklet);

Note this is racy; two separate atomic operations. The only race we have
is with set-wedge vs reset, which is a rare and already contentious
issue. The upside of preventing the lockup is that we don't lose the
machine.

+        * Note that this needs to be a single atomic operation on the
+        * tasklet (flush existing tasks, prevent new tasks) to prevent
+        * a race between reset and set-wedged. It is not, so we do the best
+        * we can atm and make sure we don't lock the machine up in the more
+        * common case of recursively calling set-wedged from inside i915_reset.
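
Putting that comment together with the code, the guarded flush would look
roughly like this (a sketch; the check and the kill remain two separate
atomic operations, so the race window stays):

	/*
	 * Flushing the tasklet and blocking new runs really wants to be one
	 * atomic operation; it is not, so we only avoid the lockup in the
	 * common case of recursively calling set-wedged from i915_reset().
	 */
	if (!atomic_read(&engine->execlists.tasklet.count))
		tasklet_kill(&engine->execlists.tasklet);
	tasklet_disable(&engine->execlists.tasklet);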
-Chris

* Re: [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
  2018-03-09 14:10   ` Chris Wilson
@ 2018-03-09 14:22     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 14:22 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-03-09 14:10:34)
> Quoting Chris Wilson (2018-03-07 13:42:26)
> > tasklet_kill() will spin waiting for the current tasklet to be executed.
> > However, if tasklet_disable() has been called, then the tasklet is never
> > executed but permanently put back onto the runlist until
> > tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> > disable/enable pair. This is the case when we call set-wedge from inside
> > i915_reset(), and another request was submitted to us concurrent to the
> > reset.
> > 
> > Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 3b44952e089f..e75af06904b7 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
> >          * Turning off the execlists->tasklet until the reset is over
> >          * prevents the race.
> >          */
> > -       tasklet_kill(&engine->execlists.tasklet);
> > +       if (!atomic_read(&engine->execlists.tasklet.count))
> > +               tasklet_kill(&engine->execlists.tasklet);
> 
> Note this is racy; two separate atomic operations. The only race we have
> is with set-wedge vs reset, which is a rare and already contentious
> issue. The upside of preventing the lockup is that we don't lose the
> machine.
> 
> +        * Note that this needs to be a single atomic operation on the
> +        * tasklet (flush existing tasks, prevent new tasks) to prevent
> +        * a race between reset and set-wedged. It is not, so we do the best
> +        * we can atm and make sure we don't lock the machine up in the more
> +        * common case of recursing calling set-wedged from inside i915_reset.

The alternative is to not try and do tasklet_kill() here. But that has
the side-effect of leaving it spinning if it was running at the same
time as the suspend. Hmm, for the moment I'll go with the comment and
see how inspired we are once gem_eio stops dying.
-Chris
