* [CI 1/5] drm/i915: Consolidate reset_request()
@ 2017-01-10 17:22 Chris Wilson
  2017-01-10 17:22 ` [CI 2/5] drm/i915: Set guilty-flag on fence after detecting a hang Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Chris Wilson @ 2017-01-10 17:22 UTC (permalink / raw)
  To: intel-gfx

Always reset the requests of the guilty context, including the hung
request that we tell the hardware to skip. This should help if the
reprogram fails entirely, but more importantly makes the guilty path
more uniform (and simplifies the subsequent patch to tweak the cancelled
requests).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 91d726f8bdfa..fb2433175a3d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2658,13 +2658,13 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 		ring_hung = false;
 	}
 
-	if (ring_hung)
+	if (ring_hung) {
 		i915_gem_context_mark_guilty(hung_ctx);
-	else
+		reset_request(request);
+	} else {
 		i915_gem_context_mark_innocent(hung_ctx);
-
-	if (!ring_hung)
 		return;
+	}
 
 	DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n",
 			 engine->name, request->global_seqno);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [CI 2/5] drm/i915: Set guilty-flag on fence after detecting a hang
  2017-01-10 17:22 [CI 1/5] drm/i915: Consolidate reset_request() Chris Wilson
@ 2017-01-10 17:22 ` Chris Wilson
  2017-01-10 17:22 ` [CI 3/5] drm/i915: Set an error status for a resubmitted request Chris Wilson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2017-01-10 17:22 UTC (permalink / raw)
  To: intel-gfx

The struct dma_fence carries a status field exposed to userspace by
sync_file. This is inspected after the fence is signaled and can convey
whether or not the request completed successfully, or in our case if we
detected a hang during the request (signaled via -EIO in
SYNC_IOC_FILE_INFO).

v2: Mark all cancelled requests as failed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fb2433175a3d..324a49813668 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2626,6 +2626,8 @@ static void reset_request(struct drm_i915_gem_request *request)
 		head = 0;
 	}
 	memset(vaddr + head, 0, request->postfix - head);
+
+	dma_fence_set_error(&request->fence, -EIO);
 }
 
 void i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
-- 
2.11.0
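
As a rough illustration of how userspace might interpret the status this
patch exposes, the sketch below classifies a status word. It assumes the
sync_file uAPI convention that the status is 1 when signaled, 0 while
active, and a negative errno on error. The enum and the function name
classify_fence_status() are hypothetical and not part of the patch.

```c
#include <errno.h>

/* Hypothetical classification of a fence status word as seen by userspace. */
enum fence_state {
	FENCE_ACTIVE,      /* status == 0: not yet signaled */
	FENCE_OK,          /* status > 0: signaled without error */
	FENCE_HUNG,        /* status == -EIO: a hang was detected during the request */
	FENCE_OTHER_ERROR  /* any other negative errno */
};

/* Map a status value (assumed: 1 signaled, 0 active, <0 error) to a state. */
static enum fence_state classify_fence_status(int status)
{
	if (status == 0)
		return FENCE_ACTIVE;
	if (status > 0)
		return FENCE_OK;
	if (status == -EIO)
		return FENCE_HUNG;
	return FENCE_OTHER_ERROR;
}
```

In a real client the status would be read from the struct filled in by
SYNC_IOC_FILE_INFO; this sketch leaves that part out so it does not
depend on having a fence fd.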


* [CI 3/5] drm/i915: Set an error status for a resubmitted request
  2017-01-10 17:22 [CI 1/5] drm/i915: Consolidate reset_request() Chris Wilson
  2017-01-10 17:22 ` [CI 2/5] drm/i915: Set guilty-flag on fence after detecting a hang Chris Wilson
@ 2017-01-10 17:22 ` Chris Wilson
  2017-01-10 17:22 ` [CI 4/5] drm/i915: Mark all incomplete requests as -EIO when wedged Chris Wilson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2017-01-10 17:22 UTC (permalink / raw)
  To: intel-gfx

Let userspace know if its request was resubmitted due to it being
executed at the time of a global reset. In this case, the reset was for
a guilty request on another engine, and this request was an innocent
victim that will be re-executed upon restarting. However, since it was
running at the time of the reset, we cannot guarantee that it suffered
no ill-effects from the reset (e.g. some context state may be lost, or
some self-modifying fragment shaders will be restarted from the final
state not their initial state). To let userspace know that the request
may have been corrupted, set a special value, -EAGAIN, on fence->error.

If the request does hang on resubmission, the error will be overwritten
with -EIO.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 324a49813668..94ad9eb83a5c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2665,6 +2665,7 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 		reset_request(request);
 	} else {
 		i915_gem_context_mark_innocent(hung_ctx);
+		dma_fence_set_error(&request->fence, -EAGAIN);
 		return;
 	}
 
-- 
2.11.0
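
The error values chosen by the reset path at this point in the series
can be modelled in plain userspace C. This is only a sketch: struct
model_fence, model_fence_set_error() and model_reset_engine() are
hypothetical stand-ins for the kernel's dma_fence, dma_fence_set_error()
and i915_gem_reset_engine(), and the sketch assumes the error may only
be set before the fence is signaled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the fields of interest in a fence. */
struct model_fence {
	int error;      /* 0 until an error is recorded */
	bool signaled;  /* true once the fence has been signaled */
};

/* Record an error; assumed to be legal only before the fence signals. */
static void model_fence_set_error(struct model_fence *fence, int error)
{
	assert(!fence->signaled);
	assert(error < 0);
	fence->error = error;
}

/*
 * Model of the decision taken for the request that was executing at the
 * time of the reset: the guilty request is cancelled with -EIO, while an
 * innocent victim is flagged -EAGAIN before being re-executed.
 */
static void model_reset_engine(struct model_fence *fence, bool ring_hung)
{
	if (ring_hung)
		model_fence_set_error(fence, -EIO);
	else
		model_fence_set_error(fence, -EAGAIN);
}
```

Calling model_reset_engine() twice on the same fence, first as innocent
and then as guilty, leaves the error at -EIO, which matches the
overwrite described in the commit message.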


* [CI 4/5] drm/i915: Mark all incomplete requests as -EIO when wedged
  2017-01-10 17:22 [CI 1/5] drm/i915: Consolidate reset_request() Chris Wilson
  2017-01-10 17:22 ` [CI 2/5] drm/i915: Set guilty-flag on fence after detecting a hang Chris Wilson
  2017-01-10 17:22 ` [CI 3/5] drm/i915: Set an error status for a resubmitted request Chris Wilson
@ 2017-01-10 17:22 ` Chris Wilson
  2017-01-10 17:22 ` [CI 5/5] drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged() Chris Wilson
  2017-01-10 18:53 ` ✗ Fi.CI.BAT: warning for series starting with [CI,1/5] drm/i915: Consolidate reset_request() Patchwork
  4 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2017-01-10 17:22 UTC (permalink / raw)
  To: intel-gfx

Similarly to a normal reset, after we mark the GPU as wedged (completely
fubar and no more requests can be executed), set the error status on all
the in-flight requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 94ad9eb83a5c..7f73a35c7725 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2730,12 +2730,16 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
 
 static void nop_submit_request(struct drm_i915_gem_request *request)
 {
+	dma_fence_set_error(&request->fence, -EIO);
 	i915_gem_request_submit(request);
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 }
 
 static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 {
+	struct drm_i915_gem_request *request;
+	unsigned long flags;
+
 	/* We need to be sure that no thread is running the old callback as
 	 * we install the nop handler (otherwise we would submit a request
 	 * to hardware that will never complete). In order to prevent this
@@ -2744,6 +2748,12 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 	 */
 	engine->submit_request = nop_submit_request;
 
+	/* Mark all executing requests as skipped */
+	spin_lock_irqsave(&engine->timeline->lock, flags);
+	list_for_each_entry(request, &engine->timeline->requests, link)
+		dma_fence_set_error(&request->fence, -EIO);
+	spin_unlock_irqrestore(&engine->timeline->lock, flags);
+
 	/* Mark all pending requests as complete so that any concurrent
 	 * (lockless) lookup doesn't try and wait upon the request as we
 	 * reset it.
-- 
2.11.0
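
The ordering used when wedging an engine (swap in the nop submit
handler, then flag whatever is already in flight) can be sketched as a
small userspace model. All names below (model_request, model_engine,
model_nop_submit, model_set_wedged) are hypothetical, a plain array
stands in for engine->timeline->requests, and the locking done by the
real code is deliberately left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's request and engine structures. */
struct model_request {
	int error;      /* models fence->error */
	int submitted;  /* set once the submit callback has run */
};

struct model_engine {
	void (*submit_request)(struct model_request *rq);
	struct model_request *inflight; /* models engine->timeline->requests */
	size_t count;
};

/* Once wedged, a submitted request is flagged -EIO and never reaches hardware. */
static void model_nop_submit(struct model_request *rq)
{
	rq->error = -EIO;
	rq->submitted = 1;
}

static void model_set_wedged(struct model_engine *engine)
{
	size_t i;

	/* Install the nop handler first so no new work reaches the hardware. */
	engine->submit_request = model_nop_submit;

	/* Then flag every request that was already in flight. */
	for (i = 0; i < engine->count; i++)
		engine->inflight[i].error = -EIO;
}
```

With this ordering, both requests already on the list and requests
submitted after wedging end up carrying -EIO.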


* [CI 5/5] drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged()
  2017-01-10 17:22 [CI 1/5] drm/i915: Consolidate reset_request() Chris Wilson
                   ` (2 preceding siblings ...)
  2017-01-10 17:22 ` [CI 4/5] drm/i915: Mark all incomplete requests as -EIO when wedged Chris Wilson
@ 2017-01-10 17:22 ` Chris Wilson
  2017-01-10 18:53 ` ✗ Fi.CI.BAT: warning for series starting with [CI,1/5] drm/i915: Consolidate reset_request() Patchwork
  4 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2017-01-10 17:22 UTC (permalink / raw)
  To: intel-gfx

It has been some time since i915_gem_cleanup_engine() was only called
from the module unload path, and now it is only called when the GPU is
wedged. Mika complained that the name is confusing, especially in light
of the existence of i915_gem_cleanup_engines().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7f73a35c7725..e5d96de61c14 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2735,7 +2735,7 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 }
 
-static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
+static void engine_set_wedged(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request;
 	unsigned long flags;
@@ -2789,7 +2789,7 @@ static int __i915_gem_set_wedged_BKL(void *data)
 	enum intel_engine_id id;
 
 	for_each_engine(engine, i915, id)
-		i915_gem_cleanup_engine(engine);
+		engine_set_wedged(engine);
 
 	return 0;
 }
-- 
2.11.0


* ✗ Fi.CI.BAT: warning for series starting with [CI,1/5] drm/i915: Consolidate reset_request()
  2017-01-10 17:22 [CI 1/5] drm/i915: Consolidate reset_request() Chris Wilson
                   ` (3 preceding siblings ...)
  2017-01-10 17:22 ` [CI 5/5] drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged() Chris Wilson
@ 2017-01-10 18:53 ` Patchwork
  4 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2017-01-10 18:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/5] drm/i915: Consolidate reset_request()
URL   : https://patchwork.freedesktop.org/series/17779/
State : warning

== Summary ==

Series 17779v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/17779/revisions/1/mbox/

Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> DMESG-WARN (fi-ivb-3770)

fi-bdw-5557u     total:246  pass:232  dwarn:0   dfail:0   fail:0   skip:14 
fi-bsw-n3050     total:246  pass:207  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:246  pass:224  dwarn:0   dfail:0   fail:0   skip:22 
fi-bxt-t5700     total:82   pass:69   dwarn:0   dfail:0   fail:0   skip:12 
fi-byt-j1900     total:246  pass:219  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:246  pass:215  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:246  pass:227  dwarn:0   dfail:0   fail:0   skip:19 
fi-hsw-4770r     total:246  pass:227  dwarn:0   dfail:0   fail:0   skip:19 
fi-ivb-3520m     total:246  pass:225  dwarn:0   dfail:0   fail:0   skip:21 
fi-ivb-3770      total:246  pass:224  dwarn:1   dfail:0   fail:0   skip:21 
fi-kbl-7500u     total:246  pass:225  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6260u     total:246  pass:233  dwarn:0   dfail:0   fail:0   skip:13 
fi-skl-6700hq    total:246  pass:226  dwarn:0   dfail:0   fail:0   skip:20 
fi-skl-6700k     total:246  pass:222  dwarn:3   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:246  pass:233  dwarn:0   dfail:0   fail:0   skip:13 
fi-snb-2520m     total:246  pass:215  dwarn:0   dfail:0   fail:0   skip:31 
fi-snb-2600      total:246  pass:214  dwarn:0   dfail:0   fail:0   skip:32 

44b1ad71b67ecfae6ba9b816c6f3a6ebe1fd182e drm-tip: 2017y-01m-10d-16h-32m-29s UTC integration manifest
887a3f4 drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged()
1c791fe drm/i915: Mark all incomplete requests as -EIO when wedged
1536137 drm/i915: Set an error status for a resubmitted request
99d7098 drm/i915: Set guilty-flag on fence after detecting a hang
7cefc5d drm/i915: Consolidate reset_request()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3469/
