* [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
@ 2018-03-07 13:42 Chris Wilson
2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
To: intel-gfx
Before we reset the GPU after marking the device as wedged, we wait for
all the remaining requests to be completed (and marked as EIO).
Afterwards, we should flush the request lists so the next batch start
with the driver in an idle start.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a5bd07338b46..30cd07ebcb8e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3273,7 +3273,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
return true;
- /* Before unwedging, make sure that all pending operations
+ /*
+ * Before unwedging, make sure that all pending operations
* are flushed and errored out - we may have requests waiting upon
* third party fences. We marked all inflight requests as EIO, and
* every execbuf since returned EIO, for consistency we want all
@@ -3291,7 +3292,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
if (!rq)
continue;
- /* We can't use our normal waiter as we want to
+ /*
+ * We can't use our normal waiter as we want to
* avoid recursively trying to handle the current
* reset. The basic dma_fence_default_wait() installs
* a callback for dma_fence_signal(), which is
@@ -3306,8 +3308,11 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
return false;
}
}
+ i915_retire_requests(i915);
+ GEM_BUG_ON(i915->gt.active_requests);
- /* Undo nop_submit_request. We prevent all new i915 requests from
+ /*
+ * Undo nop_submit_request. We prevent all new i915 requests from
* being queued (by disallowing execbuf whilst wedged) so having
* waited for all active requests above, we know the system is idle
* and do not have to worry about a thread being inside
--
2.16.2
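[Not part of the patch — an annotation.] The ordering the commit message describes (complete every inflight request with an error, then retire them all so the lists are empty before the reset) can be sketched in a toy userspace model. All `toy_*` names here are hypothetical simplifications, not the real i915 types:

```c
#include <assert.h>
#include <stddef.h>

enum toy_status { TOY_PENDING, TOY_ERRORED, TOY_RETIRED };

struct toy_request { enum toy_status status; };

/* Step 1: mark all still-pending requests as completed-with-EIO. */
static void toy_complete_all(struct toy_request *rq, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (rq[i].status == TOY_PENDING)
			rq[i].status = TOY_ERRORED;
}

/* Step 2: retire everything completed; returns how many requests remain
 * active, mirroring the added i915_retire_requests() +
 * GEM_BUG_ON(i915->gt.active_requests) pair. */
static size_t toy_retire_all(struct toy_request *rq, size_t n)
{
	size_t active = 0;
	for (size_t i = 0; i < n; i++) {
		if (rq[i].status == TOY_ERRORED)
			rq[i].status = TOY_RETIRED;
		if (rq[i].status != TOY_RETIRED)
			active++;
	}
	return active; /* must be zero before the GPU reset proceeds */
}
```

The point of the patch is that step 2 was previously missing: requests were errored out but never retired, so the next batch did not start from an idle state.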
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
2018-03-09 13:17 ` Mika Kuoppala
2018-03-09 13:20 ` Chris Wilson
2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
` (6 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
To: intel-gfx
With a series of unusual events (a sequence of interrupted request
allocations), we could gradually leak the ring->space estimate by
unwinding the ring back to the start of the request, but not return the
used space back to the ring. Eventually and with great misfortune, it
would be possible to trigger ring->space exhaustion with no requests on
the ring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_request.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d437beac3969..efa9ee557f31 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
err_unwind:
rq->ring->emit = rq->head;
+ intel_ring_update_space(rq->ring);
/* Make sure we didn't add ourselves to external state before freeing */
GEM_BUG_ON(!list_empty(&rq->active_list));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1d599524a759..88eeb64041ae 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1593,6 +1593,7 @@ static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
if (intel_ring_update_space(ring) >= bytes)
return 0;
+ GEM_BUG_ON(list_empty(&ring->request_list));
list_for_each_entry(target, &ring->request_list, ring_link) {
/* Would completion of this request free enough space? */
if (bytes <= __intel_ring_space(target->postfix,
--
2.16.2
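[Not part of the patch — an annotation.] The leak the commit message describes is a cached free-space estimate that is not refreshed when the producer pointer is rewound. A toy model of that bookkeeping (hypothetical `toy_ring`, not the real `struct intel_ring`; ring size assumed to be a power of two):

```c
#include <assert.h>

struct toy_ring {
	unsigned int size;  /* total bytes; power of two */
	unsigned int head;  /* consumer (HW read) position */
	unsigned int emit;  /* producer position for the next dword */
	unsigned int space; /* cached estimate of free bytes */
};

/* Recompute the free-space estimate from head/emit, in the spirit of
 * intel_ring_update_space(). One byte is kept unused so head == emit
 * means "empty" rather than "full". */
static unsigned int toy_ring_update_space(struct toy_ring *ring)
{
	ring->space = (ring->head - ring->emit - 1) & (ring->size - 1);
	return ring->space;
}

/* Begin a request: remember where it started and reserve some bytes. */
static unsigned int toy_request_start(struct toy_ring *ring, unsigned int bytes)
{
	unsigned int head = ring->emit;

	ring->emit = (ring->emit + bytes) & (ring->size - 1);
	toy_ring_update_space(ring);
	return head;
}

/* Unwind an interrupted request: rewind emit AND refresh the estimate.
 * Omitting the update (as before the patch) leaks ring->space. */
static void toy_request_unwind(struct toy_ring *ring, unsigned int request_head)
{
	ring->emit = request_head;
	toy_ring_update_space(ring);
}
```

Each interrupted allocation that skipped the update would leave `space` permanently smaller, which is how repeated unwinds could exhaust an empty ring.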
* [PATCH 3/6] drm/i915: Update ring position from request on retiring
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
2018-03-09 13:38 ` Mika Kuoppala
2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
` (5 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
To: intel-gfx
When wedged, we do not update the ring->tail as we submit the requests
causing us to leak the ring->space upon cleaning up the wedged driver.
We can just use the value stored in rq->tail, and keep the submission
backend details away from set-wedge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_request.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index efa9ee557f31..69b378a323fc 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -358,7 +358,7 @@ static void advance_ring(struct i915_request *request)
* is just about to be. Either works, if we miss the last two
* noops - they are safe to be replayed on a reset.
*/
- tail = READ_ONCE(request->ring->tail);
+ tail = READ_ONCE(request->tail);
} else {
tail = request->postfix;
}
--
2.16.2
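[Not part of the patch — an annotation.] The one-line change picks the retire point from the request's own recorded `rq->tail` instead of the backend's `ring->tail`, which is stale when submission has been nopped out by set-wedge. A toy sketch of that choice (hypothetical `toy_*` types, not the real i915 structures):

```c
#include <assert.h>

struct toy_rq {
	unsigned int tail;    /* position recorded when the request was built */
	unsigned int postfix; /* position of the breadcrumb for earlier requests */
	int is_last;          /* last request on the ring? */
};

struct toy_ring_pos { unsigned int head; };

/* On retiring the last request, advance the ring head using the value
 * stored in the request itself; ring->tail may never have been updated
 * if the submission backend was replaced by a nop while wedged. */
static unsigned int toy_advance_ring(struct toy_ring_pos *ring,
				     const struct toy_rq *rq)
{
	unsigned int tail = rq->is_last ? rq->tail : rq->postfix;

	ring->head = tail;
	return tail;
}
```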
* [PATCH 4/6] drm/i915: Include ring->emit in debugging
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
2018-03-09 13:41 ` Mika Kuoppala
2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
To: intel-gfx
Include ring->emit and ring->space alongside ring->(head,tail) when
printing debug information.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
drivers/gpu/drm/i915/intel_engine_cs.c | 10 +++++++---
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e838c765b251..4de0a52f14a9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1923,8 +1923,8 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
static void describe_ctx_ring(struct seq_file *m, struct intel_ring *ring)
{
- seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u)",
- ring->space, ring->head, ring->tail);
+ seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, emit: %u)",
+ ring->space, ring->head, ring->tail, ring->emit);
}
static int i915_context_status(struct seq_file *m, void *unused)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 4ba139c27fba..e71bd6951d9b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1929,12 +1929,16 @@ void intel_engine_dump(struct intel_engine_cs *engine,
rq->head, rq->postfix, rq->tail,
rq->batch ? upper_32_bits(rq->batch->node.start) : ~0u,
rq->batch ? lower_32_bits(rq->batch->node.start) : ~0u);
- drm_printf(m, "\t\tring->start: 0x%08x\n",
+ drm_printf(m, "\t\tring->start : 0x%08x\n",
i915_ggtt_offset(rq->ring->vma));
- drm_printf(m, "\t\tring->head: 0x%08x\n",
+ drm_printf(m, "\t\tring->head: 0x%08x\n",
rq->ring->head);
- drm_printf(m, "\t\tring->tail: 0x%08x\n",
+ drm_printf(m, "\t\tring->tail: 0x%08x\n",
rq->ring->tail);
+ drm_printf(m, "\t\tring->emit: 0x%08x\n",
+ rq->ring->emit);
+ drm_printf(m, "\t\tring->space: 0x%08x\n",
+ rq->ring->space);
}
rcu_read_unlock();
--
2.16.2
* [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
` (2 preceding siblings ...)
2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
2018-03-09 13:49 ` Mika Kuoppala
2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
` (3 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.
Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
drivers/gpu/drm/i915/i915_request.c | 2 ++
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 30cd07ebcb8e..3b44952e089f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
rq = to_request(fence);
engine = rq->engine;
- if (!engine->schedule)
- return;
- engine->schedule(rq, prio);
+ rcu_read_lock();
+ if (engine->schedule)
+ engine->schedule(rq, prio);
+ rcu_read_unlock();
}
static void fence_set_priority(struct dma_fence *fence, int prio)
@@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
*/
for_each_engine(engine, i915, id) {
i915_gem_reset_prepare_engine(engine);
+
engine->submit_request = nop_submit_request;
+ engine->schedule = NULL;
}
+ i915->caps.scheduler = 0;
/*
* Make sure no one is running the old callback before we proceed with
@@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
* start to complete all requests.
*/
engine->submit_request = nop_complete_submit_request;
- engine->schedule = NULL;
}
- i915->caps.scheduler = 0;
-
/*
* Make sure no request can slip through without getting completed by
* either this call here to intel_engine_init_global_seqno, or the one
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 69b378a323fc..0258449579f8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1082,8 +1082,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
* decide whether to preempt the entire chain so that it is ready to
* run at the earliest possible convenience.
*/
+ rcu_read_lock();
if (engine->schedule)
engine->schedule(request, request->ctx->priority);
+ rcu_read_unlock();
local_bh_disable();
i915_sw_fence_commit(&request->submit);
--
2.16.2
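[Not part of the patch — an annotation.] The pattern here is "sample a callback pointer once under a read-side guard, call it only if still set", with set-wedge NULLing the pointer on the write side. The kernel uses rcu_read_lock()/rcu_read_unlock(); as a rough userspace analogue, a C11 atomic load stands in for the single sample (all names below are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*schedule_fn)(int prio);

static _Atomic(schedule_fn) engine_schedule; /* NULL once wedged */
static int last_prio = -1;

static void real_schedule(int prio) { last_prio = prio; }

static void set_priority(int prio)
{
	/* Load the pointer exactly once; a concurrent set_wedged() may
	 * clear it, but we either see the old callback or NULL, never a
	 * torn value. */
	schedule_fn fn = atomic_load(&engine_schedule);

	if (fn)
		fn(prio);
}

static void set_wedged(void)
{
	atomic_store(&engine_schedule, (schedule_fn)NULL);
	/* The kernel additionally uses synchronize_rcu() to wait for any
	 * reader still inside the old callback before proceeding. */
}
```

The ordering fix in the patch is the same idea applied to the queue: stop scheduling (NULL the pointer) before cancel_requests drains the queue, so nothing can be added after the drain.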
* [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
` (3 preceding siblings ...)
2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
@ 2018-03-07 13:42 ` Chris Wilson
2018-03-09 14:01 ` Mika Kuoppala
2018-03-09 14:10 ` Chris Wilson
2018-03-07 14:31 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Patchwork
` (2 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-07 13:42 UTC (permalink / raw)
To: intel-gfx
tasklet_kill() will spin waiting for the current tasklet to be executed.
However, if tasklet_disable() has been called, then the tasklet is never
executed but permanently put back onto the runlist until
tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
disable/enable pair. This is the case when we call set-wedge from inside
i915_reset(), and another request was submitted to us concurrent to the
reset.
Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3b44952e089f..e75af06904b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
* Turning off the execlists->tasklet until the reset is over
* prevents the race.
*/
- tasklet_kill(&engine->execlists.tasklet);
+ if (!atomic_read(&engine->execlists.tasklet.count))
+ tasklet_kill(&engine->execlists.tasklet);
tasklet_disable(&engine->execlists.tasklet);
/*
--
2.16.2
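[Not part of the patch — an annotation.] The hazard: tasklet_kill() spins until the tasklet runs, but a disabled tasklet can never run, so calling kill inside a disable/enable pair spins forever. A toy model with a bounded spin standing in for the hang (hypothetical `toy_*` names):

```c
#include <assert.h>

struct toy_tasklet {
	int disable_count; /* ++ on disable(), -- on enable() */
	int scheduled;     /* pending execution */
	int runs;
};

static void toy_tasklet_try_run(struct toy_tasklet *t)
{
	if (t->scheduled && t->disable_count == 0) {
		t->scheduled = 0;
		t->runs++;
	}
}

/* tasklet_kill() waits for any scheduled run to finish; model the
 * wait with a bounded loop and report whether it completed. */
static int toy_tasklet_kill(struct toy_tasklet *t)
{
	for (int spin = 0; spin < 1000; spin++) {
		if (!t->scheduled)
			return 1; /* kill completed */
		toy_tasklet_try_run(t);
	}
	return 0; /* a disabled tasklet never runs: would spin forever */
}

/* The fix: only kill when the tasklet is not already disabled. */
static int toy_prepare_reset(struct toy_tasklet *t)
{
	int ok = 1;

	if (t->disable_count == 0)
		ok = toy_tasklet_kill(t);
	t->disable_count++; /* tasklet_disable() */
	return ok;
}
```

Without the count check, a second prepare_reset (set-wedge from inside i915_reset) would call kill on a tasklet its own first pass had disabled, and hang.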
* ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
` (4 preceding siblings ...)
2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
@ 2018-03-07 14:31 ` Patchwork
2018-03-07 16:55 ` ✓ Fi.CI.IGT: " Patchwork
2018-03-09 12:54 ` [PATCH 1/6] " Mika Kuoppala
7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-03-07 14:31 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
URL : https://patchwork.freedesktop.org/series/39532/
State : success
== Summary ==
Series 39532v1 series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
https://patchwork.freedesktop.org/api/1.0/series/39532/revisions/1/mbox/
---- Known issues:
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-b:
incomplete -> PASS (fi-snb-2520m) fdo#103713
Subgroup suspend-read-crc-pipe-c:
incomplete -> PASS (fi-bxt-dsi) fdo#103927
fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
fi-bdw-5557u total:288 pass:267 dwarn:0 dfail:0 fail:0 skip:21 time:423s
fi-bdw-gvtdvm total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:425s
fi-blb-e6850 total:288 pass:223 dwarn:1 dfail:0 fail:0 skip:64 time:370s
fi-bsw-n3050 total:288 pass:242 dwarn:0 dfail:0 fail:0 skip:46 time:497s
fi-bwr-2160 total:288 pass:183 dwarn:0 dfail:0 fail:0 skip:105 time:278s
fi-bxt-dsi total:288 pass:258 dwarn:0 dfail:0 fail:0 skip:30 time:488s
fi-bxt-j4205 total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:488s
fi-byt-j1900 total:288 pass:253 dwarn:0 dfail:0 fail:0 skip:35 time:478s
fi-byt-n2820 total:288 pass:249 dwarn:0 dfail:0 fail:0 skip:39 time:469s
fi-cfl-8700k total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:403s
fi-cfl-s2 total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:575s
fi-cfl-u total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:504s
fi-elk-e7500 total:288 pass:229 dwarn:0 dfail:0 fail:0 skip:59 time:413s
fi-gdg-551 total:288 pass:179 dwarn:0 dfail:0 fail:1 skip:108 time:289s
fi-glk-1 total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:515s
fi-hsw-4770 total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:396s
fi-ilk-650 total:288 pass:228 dwarn:0 dfail:0 fail:0 skip:60 time:410s
fi-ivb-3520m total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:464s
fi-ivb-3770 total:288 pass:255 dwarn:0 dfail:0 fail:0 skip:33 time:421s
fi-kbl-7500u total:288 pass:263 dwarn:1 dfail:0 fail:0 skip:24 time:468s
fi-kbl-7567u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:461s
fi-kbl-r total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:507s
fi-pnv-d510 total:288 pass:222 dwarn:1 dfail:0 fail:0 skip:65 time:588s
fi-skl-6260u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:435s
fi-skl-6600u total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:521s
fi-skl-6700hq total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:534s
fi-skl-6700k2 total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:501s
fi-skl-6770hq total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:486s
fi-skl-guc total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:420s
fi-skl-gvtdvm total:288 pass:265 dwarn:0 dfail:0 fail:0 skip:23 time:429s
fi-snb-2520m total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:522s
fi-snb-2600 total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:401s
36e67bf18117a31df69df314e9e7f3709c37c9c8 drm-tip: 2018y-03m-07d-13h-36m-57s UTC integration manifest
7dc0b21fe5c5 drm/i915: Only call tasklet_kill() on the first prepare_reset
b6b27d833c1e drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
922b2f72ecc7 drm/i915: Include ring->emit in debugging
bdd4d1164ec2 drm/i915: Update ring position from request on retiring
50b3612924f1 drm/i915: Reset ring space estimate after unwinding the request
95cc7a5b45e6 drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8258/issues.html
* ✓ Fi.CI.IGT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
` (5 preceding siblings ...)
2018-03-07 14:31 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Patchwork
@ 2018-03-07 16:55 ` Patchwork
2018-03-09 12:54 ` [PATCH 1/6] " Mika Kuoppala
7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-03-07 16:55 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
URL : https://patchwork.freedesktop.org/series/39532/
State : success
== Summary ==
---- Possible new issues:
Test kms_busy:
Subgroup extended-pageflip-hang-oldfb-render-a:
skip -> PASS (shard-snb)
Test kms_cursor_legacy:
Subgroup cursorb-vs-flipa-varying-size:
dmesg-warn -> PASS (shard-hsw)
Test kms_vblank:
Subgroup pipe-a-ts-continuation-modeset:
skip -> PASS (shard-snb)
Subgroup pipe-a-wait-busy-hang:
skip -> PASS (shard-snb)
Subgroup pipe-b-wait-idle:
dmesg-warn -> PASS (shard-hsw)
---- Known issues:
Test gem_eio:
Subgroup in-flight:
pass -> INCOMPLETE (shard-apl) fdo#105341 +1
Test kms_chv_cursor_fail:
Subgroup pipe-b-64x64-right-edge:
dmesg-warn -> PASS (shard-snb) fdo#105185 +2
Test kms_cursor_crc:
Subgroup cursor-128x128-suspend:
pass -> INCOMPLETE (shard-hsw) fdo#103540
Test kms_flip:
Subgroup 2x-flip-vs-expired-vblank-interruptible:
pass -> FAIL (shard-hsw) fdo#102887
Subgroup 2x-plain-flip-fb-recreate:
pass -> FAIL (shard-hsw) fdo#100368 +2
fdo#105341 https://bugs.freedesktop.org/show_bug.cgi?id=105341
fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
shard-apl total:3394 pass:1786 dwarn:1 dfail:0 fail:8 skip:1597 time:11868s
shard-hsw total:3429 pass:1751 dwarn:1 dfail:0 fail:5 skip:1670 time:11284s
shard-snb total:3467 pass:1362 dwarn:2 dfail:0 fail:2 skip:2101 time:6988s
Blacklisted hosts:
shard-kbl total:3381 pass:1900 dwarn:3 dfail:0 fail:7 skip:1470 time:9183s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8258/shards.html
* Re: [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
` (6 preceding siblings ...)
2018-03-07 16:55 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-03-09 12:54 ` Mika Kuoppala
7 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 12:54 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Before we reset the GPU after marking the device as wedged, we wait for
> all the remaining requests to be completed (and marked as EIO).
> Afterwards, we should flush the request lists so the next batch start
> with the driver in an idle start.
s/start/state?
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a5bd07338b46..30cd07ebcb8e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3273,7 +3273,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
> return true;
>
> - /* Before unwedging, make sure that all pending operations
> + /*
> + * Before unwedging, make sure that all pending operations
> * are flushed and errored out - we may have requests waiting upon
> * third party fences. We marked all inflight requests as EIO, and
> * every execbuf since returned EIO, for consistency we want all
> @@ -3291,7 +3292,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> if (!rq)
> continue;
>
> - /* We can't use our normal waiter as we want to
> + /*
> + * We can't use our normal waiter as we want to
> * avoid recursively trying to handle the current
> * reset. The basic dma_fence_default_wait() installs
> * a callback for dma_fence_signal(), which is
> @@ -3306,8 +3308,11 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> return false;
> }
> }
> + i915_retire_requests(i915);
> + GEM_BUG_ON(i915->gt.active_requests);
>
> - /* Undo nop_submit_request. We prevent all new i915 requests from
> + /*
> + * Undo nop_submit_request. We prevent all new i915 requests from
> * being queued (by disallowing execbuf whilst wedged) so having
> * waited for all active requests above, we know the system is idle
> * and do not have to worry about a thread being inside
> --
> 2.16.2
* Re: [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
@ 2018-03-09 13:17 ` Mika Kuoppala
2018-03-09 13:42 ` Chris Wilson
2018-03-09 13:20 ` Chris Wilson
1 sibling, 1 reply; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:17 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> With a series of unusual events (a sequence of interrupted request
> allocations), we could gradually leak the ring->space estimate by
> unwinding the ring back to the start of the request, but not return the
> used space back to the ring. Eventually and with great misfortune, it
> would be possible to trigger ring->space exhaustion with no requests on
> the ring.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_request.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d437beac3969..efa9ee557f31 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>
> err_unwind:
> rq->ring->emit = rq->head;
> + intel_ring_update_space(rq->ring);
>
> /* Make sure we didn't add ourselves to external state before freeing */
> GEM_BUG_ON(!list_empty(&rq->active_list));
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1d599524a759..88eeb64041ae 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1593,6 +1593,7 @@ static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
> if (intel_ring_update_space(ring) >= bytes)
> return 0;
>
> + GEM_BUG_ON(list_empty(&ring->request_list));
At some point, after a long series of misfortunate events, we
will add a first request, actually need to wait for space for it,
and then we kaboom in here?
-Mika
> list_for_each_entry(target, &ring->request_list, ring_link) {
> /* Would completion of this request free enough space? */
> if (bytes <= __intel_ring_space(target->postfix,
> --
> 2.16.2
* Re: [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
2018-03-09 13:17 ` Mika Kuoppala
@ 2018-03-09 13:20 ` Chris Wilson
1 sibling, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 13:20 UTC (permalink / raw)
To: intel-gfx
Quoting Chris Wilson (2018-03-07 13:42:22)
> With a series of unusual events (a sequence of interrupted request
> allocations), we could gradually leak the ring->space estimate by
> unwinding the ring back to the start of the request, but not return the
> used space back to the ring. Eventually and with great misfortune, it
> would be possible to trigger ring->space exhaustion with no requests on
> the ring.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_request.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d437beac3969..efa9ee557f31 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>
> err_unwind:
> rq->ring->emit = rq->head;
> + intel_ring_update_space(rq->ring);
Ok, skip this one as we will correct ourselves next time we
wait_for_space. It's just the next one where we weren't maintaining
ring->tail that was the issue.
-Chris
* Re: [PATCH 3/6] drm/i915: Update ring position from request on retiring
2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
@ 2018-03-09 13:38 ` Mika Kuoppala
2018-03-09 13:56 ` Chris Wilson
0 siblings, 1 reply; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:38 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> When wedged, we do not update the ring->tail as we submit the requests
> causing us to leak the ring->space upon cleaning up the wedged driver.
> We can just use the value stored in rq->tail, and keep the submission
> backend details away from set-wedge.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_request.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index efa9ee557f31..69b378a323fc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -358,7 +358,7 @@ static void advance_ring(struct i915_request *request)
> * is just about to be. Either works, if we miss the last two
> * noops - they are safe to be replayed on a reset.
> */
> - tail = READ_ONCE(request->ring->tail);
> + tail = READ_ONCE(request->tail);
I tried to think if we need the READ_ONCE here anymore.
But as this is the safest version,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Noticed that request->tail is not cleared on i915_request_alloc.
If we set rq->head = rq->tail = rq->ring->emit,
we could use rq->head == rq->tail to catch us screwing
up something major during the request lifetime.
-Mika
> } else {
> tail = request->postfix;
> }
> --
> 2.16.2
* Re: [PATCH 4/6] drm/i915: Include ring->emit in debugging
2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
@ 2018-03-09 13:41 ` Mika Kuoppala
0 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:41 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Include ring->emit and ring->space alongside ring->(head,tail) when
> printing debug information.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
> drivers/gpu/drm/i915/intel_engine_cs.c | 10 +++++++---
> 2 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index e838c765b251..4de0a52f14a9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1923,8 +1923,8 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
>
> static void describe_ctx_ring(struct seq_file *m, struct intel_ring *ring)
> {
> - seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u)",
> - ring->space, ring->head, ring->tail);
> + seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, emit: %u)",
> + ring->space, ring->head, ring->tail, ring->emit);
> }
>
> static int i915_context_status(struct seq_file *m, void *unused)
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 4ba139c27fba..e71bd6951d9b 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1929,12 +1929,16 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> rq->head, rq->postfix, rq->tail,
> rq->batch ? upper_32_bits(rq->batch->node.start) : ~0u,
> rq->batch ? lower_32_bits(rq->batch->node.start) : ~0u);
> - drm_printf(m, "\t\tring->start: 0x%08x\n",
> + drm_printf(m, "\t\tring->start : 0x%08x\n",
Please check the space before ':', seems unintentional.
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> i915_ggtt_offset(rq->ring->vma));
> - drm_printf(m, "\t\tring->head: 0x%08x\n",
> + drm_printf(m, "\t\tring->head: 0x%08x\n",
> rq->ring->head);
> - drm_printf(m, "\t\tring->tail: 0x%08x\n",
> + drm_printf(m, "\t\tring->tail: 0x%08x\n",
> rq->ring->tail);
> + drm_printf(m, "\t\tring->emit: 0x%08x\n",
> + rq->ring->emit);
> + drm_printf(m, "\t\tring->space: 0x%08x\n",
> + rq->ring->space);
> }
>
> rcu_read_unlock();
> --
> 2.16.2
* Re: [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request
2018-03-09 13:17 ` Mika Kuoppala
@ 2018-03-09 13:42 ` Chris Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 13:42 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Mika Kuoppala (2018-03-09 13:17:09)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > With a series of unusual events (a sequence of interrupted request
> > allocations), we could gradually leak the ring->space estimate by
> > unwinding the ring back to the start of the request, but not return the
> > used space back to the ring. Eventually and with great misfortune, it
> > would be possible to trigger ring->space exhaustion with no requests on
> > the ring.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_request.c | 1 +
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index d437beac3969..efa9ee557f31 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -798,6 +798,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >
> > err_unwind:
> > rq->ring->emit = rq->head;
> > + intel_ring_update_space(rq->ring);
> >
> > /* Make sure we didn't add ourselves to external state before freeing */
> > GEM_BUG_ON(!list_empty(&rq->active_list));
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 1d599524a759..88eeb64041ae 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1593,6 +1593,7 @@ static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
> > if (intel_ring_update_space(ring) >= bytes)
> > return 0;
> >
> > + GEM_BUG_ON(list_empty(&ring->request_list));
>
> At some point, after a long series of misfortunate events, we
> will add a first request and actually need to wait for space,
> and then we kaboom in here?
That's the experience. However, I don't think this patch helps because
we do the intel_ring_update_space() here inside wait_for_space anyway,
so it should come out in the wash so long as we aren't corrupting
the ring space calculation.
-Chris
* Re: [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
@ 2018-03-09 13:49 ` Mika Kuoppala
0 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 13:49 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Similar to the staging around handling of engine->submit_request, we
> need to stop adding to the execlists->queue prior to calling
> engine->cancel_requests. cancel_requests will move requests from the
> queue onto the timeline, so if we add a request onto the queue after that
> point, it will be lost.
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
> drivers/gpu/drm/i915/i915_request.c | 2 ++
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 30cd07ebcb8e..3b44952e089f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
>
> rq = to_request(fence);
> engine = rq->engine;
> - if (!engine->schedule)
> - return;
>
> - engine->schedule(rq, prio);
> + rcu_read_lock();
> + if (engine->schedule)
> + engine->schedule(rq, prio);
> + rcu_read_unlock();
> }
>
> static void fence_set_priority(struct dma_fence *fence, int prio)
> @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> */
> for_each_engine(engine, i915, id) {
> i915_gem_reset_prepare_engine(engine);
> +
> engine->submit_request = nop_submit_request;
> + engine->schedule = NULL;
> }
> + i915->caps.scheduler = 0;
>
> /*
> * Make sure no one is running the old callback before we proceed with
> @@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> * start to complete all requests.
> */
> engine->submit_request = nop_complete_submit_request;
> - engine->schedule = NULL;
> }
>
> - i915->caps.scheduler = 0;
> -
> /*
> * Make sure no request can slip through without getting completed by
> * either this call here to intel_engine_init_global_seqno, or the one
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 69b378a323fc..0258449579f8 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1082,8 +1082,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> * decide whether to preempt the entire chain so that it is ready to
> * run at the earliest possible convenience.
> */
> + rcu_read_lock();
> if (engine->schedule)
> engine->schedule(request, request->ctx->priority);
> + rcu_read_unlock();
>
> local_bh_disable();
> i915_sw_fence_commit(&request->submit);
> --
> 2.16.2
>
* Re: [PATCH 3/6] drm/i915: Update ring position from request on retiring
2018-03-09 13:38 ` Mika Kuoppala
@ 2018-03-09 13:56 ` Chris Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 13:56 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Mika Kuoppala (2018-03-09 13:38:37)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > When wedged, we do not update the ring->tail as we submit the requests
> > causing us to leak the ring->space upon cleaning up the wedged driver.
> > We can just use the value stored in rq->tail, and keep the submission
> > backend details away from set-wedge.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_request.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index efa9ee557f31..69b378a323fc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -358,7 +358,7 @@ static void advance_ring(struct i915_request *request)
> > * is just about to be. Either works, if we miss the last two
> > * noops - they are safe to be replayed on a reset.
> > */
> > - tail = READ_ONCE(request->ring->tail);
> > + tail = READ_ONCE(request->tail);
>
> I tried to think whether we still need the READ_ONCE here.
I tried as well to see if the comment was still correct. It still is,
since we can retire before we see the context-switch interrupt.
> But as this is the safest version,
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Noticed that request->tail is not cleared on i915_request_alloc.
>
> If we set rq->head = rq->tail = rq->ring->emit,
> we could use rq->head == rq->tail to catch cases where
> we screw up something major during the request lifetime.
Heh, if we get a stray write here, we're going to get stray writes
everywhere :)
-Chris
* Re: [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
@ 2018-03-09 14:01 ` Mika Kuoppala
2018-03-09 14:10 ` Chris Wilson
1 sibling, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2018-03-09 14:01 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> tasklet_kill() will spin waiting for the current tasklet to be executed.
> However, if tasklet_disable() has been called, then the tasklet is never
> executed but permanently put back onto the runlist until
> tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> disable/enable pair. This is the case when we call set-wedge from inside
> i915_reset(), and another request was submitted to us concurrent to the
> reset.
>
> Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3b44952e089f..e75af06904b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
> * Turning off the execlists->tasklet until the reset is over
> * prevents the race.
> */
> - tasklet_kill(&engine->execlists.tasklet);
> + if (!atomic_read(&engine->execlists.tasklet.count))
> + tasklet_kill(&engine->execlists.tasklet);
As discussed in irc, this might warrant a comment that it is
our own tasklet whose count we are racily inspecting here,
so the race does not matter on the path where we avoid killing it.
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> tasklet_disable(&engine->execlists.tasklet);
>
> /*
> --
> 2.16.2
* Re: [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
2018-03-09 14:01 ` Mika Kuoppala
@ 2018-03-09 14:10 ` Chris Wilson
2018-03-09 14:22 ` Chris Wilson
1 sibling, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 14:10 UTC (permalink / raw)
To: intel-gfx
Quoting Chris Wilson (2018-03-07 13:42:26)
> tasklet_kill() will spin waiting for the current tasklet to be executed.
> However, if tasklet_disable() has been called, then the tasklet is never
> executed but permanently put back onto the runlist until
> tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> disable/enable pair. This is the case when we call set-wedge from inside
> i915_reset(), and another request was submitted to us concurrent to the
> reset.
>
> Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3b44952e089f..e75af06904b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
> * Turning off the execlists->tasklet until the reset is over
> * prevents the race.
> */
> - tasklet_kill(&engine->execlists.tasklet);
> + if (!atomic_read(&engine->execlists.tasklet.count))
> + tasklet_kill(&engine->execlists.tasklet);
Note this is racy; two separate atomic operations. The only race we have
is with set-wedge vs reset, which is a rare and already contentious
issue. The upside of preventing the lockup is that we don't lose the
machine.
+ * Note that this needs to be a single atomic operation on the
+ * tasklet (flush existing tasks, prevent new tasks) to prevent
+ * a race between reset and set-wedged. It is not, so we do the best
+ * we can atm and make sure we don't lock the machine up in the more
+ * common case of recursively calling set-wedged from inside i915_reset.
-Chris
* Re: [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset
2018-03-09 14:10 ` Chris Wilson
@ 2018-03-09 14:22 ` Chris Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-03-09 14:22 UTC (permalink / raw)
To: intel-gfx
Quoting Chris Wilson (2018-03-09 14:10:34)
> Quoting Chris Wilson (2018-03-07 13:42:26)
> > tasklet_kill() will spin waiting for the current tasklet to be executed.
> > However, if tasklet_disable() has been called, then the tasklet is never
> > executed but permanently put back onto the runlist until
> > tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
> > disable/enable pair. This is the case when we call set-wedge from inside
> > i915_reset(), and another request was submitted to us concurrent to the
> > reset.
> >
> > Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 3b44952e089f..e75af06904b7 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2941,7 +2941,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
> > * Turning off the execlists->tasklet until the reset is over
> > * prevents the race.
> > */
> > - tasklet_kill(&engine->execlists.tasklet);
> > + if (!atomic_read(&engine->execlists.tasklet.count))
> > + tasklet_kill(&engine->execlists.tasklet);
>
> Note this is racy; two separate atomic operations. The only race we have
> is with set-wedge vs reset, which is a rare and already contentious
> issue. The upside of preventing the lockup is that we don't lose the
> machine.
>
> + * Note that this needs to be a single atomic operation on the
> + * tasklet (flush existing tasks, prevent new tasks) to prevent
> + * a race between reset and set-wedged. It is not, so we do the best
> + * we can atm and make sure we don't lock the machine up in the more
> + * common case of recursing calling set-wedged from inside i915_reset.
The alternative is to not try and do tasklet_kill() here. But that has
the side-effect of leaving it spinning if it was running at the same
time as the suspend. Hmm, for the moment I'll go with the comment and
see how inspired we are once gem_eio stops dying.
-Chris
end of thread, other threads:[~2018-03-09 14:23 UTC | newest]
Thread overview: 19+ messages
2018-03-07 13:42 [PATCH 1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Chris Wilson
2018-03-07 13:42 ` [PATCH 2/6] drm/i915: Reset ring space estimate after unwinding the request Chris Wilson
2018-03-09 13:17 ` Mika Kuoppala
2018-03-09 13:42 ` Chris Wilson
2018-03-09 13:20 ` Chris Wilson
2018-03-07 13:42 ` [PATCH 3/6] drm/i915: Update ring position from request on retiring Chris Wilson
2018-03-09 13:38 ` Mika Kuoppala
2018-03-09 13:56 ` Chris Wilson
2018-03-07 13:42 ` [PATCH 4/6] drm/i915: Include ring->emit in debugging Chris Wilson
2018-03-09 13:41 ` Mika Kuoppala
2018-03-07 13:42 ` [PATCH 5/6] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
2018-03-09 13:49 ` Mika Kuoppala
2018-03-07 13:42 ` [PATCH 6/6] drm/i915: Only call tasklet_kill() on the first prepare_reset Chris Wilson
2018-03-09 14:01 ` Mika Kuoppala
2018-03-09 14:10 ` Chris Wilson
2018-03-09 14:22 ` Chris Wilson
2018-03-07 14:31 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Finish the wait-for-wedge by retiring all the inflight requests Patchwork
2018-03-07 16:55 ` ✓ Fi.CI.IGT: " Patchwork
2018-03-09 12:54 ` [PATCH 1/6] " Mika Kuoppala