stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/i915/gem: Propagate error from cancelled submit due to context closure
@ 2020-12-03 10:34 Chris Wilson
  2020-12-04 11:09 ` [Intel-gfx] " Tvrtko Ursulin
  0 siblings, 1 reply; 2+ messages in thread
From: Chris Wilson @ 2020-12-03 10:34 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, stable

In the course of discovering and closing many races with context closure
and execbuf submission, since commit 61231f6bd056 ("drm/i915/gem: Check
that the context wasn't closed during setup") we started checking that
the context was not closed by another userspace thread during the execbuf
ioctl. In doing so we cancelled the inflight request (by telling it to be
skipped), but kept reporting success since we do submit a request, albeit
one that doesn't execute. As the error is known before we return from the
ioctl, we can report the error we detect immediately, rather than leave
it on the fence status. With the immediate propagation of the error, it
is easier for userspace to handle.

Fixes: 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup")
Testcase: igt/gem_ctx_exec/basic-close-race
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.7+
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1904e6e5ea64..b07dc1156a0e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3097,7 +3097,7 @@ static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
 			break;
 }
 
-static void eb_request_add(struct i915_execbuffer *eb)
+static int eb_request_add(struct i915_execbuffer *eb, int err)
 {
 	struct i915_request *rq = eb->request;
 	struct intel_timeline * const tl = i915_request_timeline(rq);
@@ -3118,6 +3118,7 @@ static void eb_request_add(struct i915_execbuffer *eb)
 		/* Serialise with context_close via the add_to_timeline */
 		i915_request_set_error_once(rq, -ENOENT);
 		__i915_request_skip(rq);
+		err = -ENOENT; /* override any transient errors */
 	}
 
 	__i915_request_queue(rq, &attr);
@@ -3127,6 +3128,8 @@ static void eb_request_add(struct i915_execbuffer *eb)
 		retire_requests(tl, prev);
 
 	mutex_unlock(&tl->mutex);
+
+	return err;
 }
 
 static const i915_user_extension_fn execbuf_extensions[] = {
@@ -3332,7 +3335,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	err = eb_submit(&eb, batch);
 err_request:
 	i915_request_get(eb.request);
-	eb_request_add(&eb);
+	err = eb_request_add(&eb, err);
 
 	if (eb.fences)
 		signal_fence_array(&eb);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gem: Propagate error from cancelled submit due to context closure
  2020-12-03 10:34 [PATCH] drm/i915/gem: Propagate error from cancelled submit due to context closure Chris Wilson
@ 2020-12-04 11:09 ` Tvrtko Ursulin
  0 siblings, 0 replies; 2+ messages in thread
From: Tvrtko Ursulin @ 2020-12-04 11:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: stable


On 03/12/2020 10:34, Chris Wilson wrote:
> In the course of discovering and closing many races with context closure
> and execbuf submission, since commit 61231f6bd056 ("drm/i915/gem: Check
> that the context wasn't closed during setup") we started checking that
> the context was not closed by another userspace thread during the execbuf
> ioctl. In doing so we cancelled the inflight request (by telling it to be
> skipped), but kept reporting success since we do submit a request, albeit
> one that doesn't execute. As the error is known before we return from the
> ioctl, we can report the error we detect immediately, rather than leave
> it on the fence status. With the immediate propagation of the error, it
> is easier for userspace to handle.
> 
> Fixes: 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup")
> Testcase: igt/gem_ctx_exec/basic-close-race
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: <stable@vger.kernel.org> # v5.7+
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 1904e6e5ea64..b07dc1156a0e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3097,7 +3097,7 @@ static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
>   			break;
>   }
>   
> -static void eb_request_add(struct i915_execbuffer *eb)
> +static int eb_request_add(struct i915_execbuffer *eb, int err)
>   {
>   	struct i915_request *rq = eb->request;
>   	struct intel_timeline * const tl = i915_request_timeline(rq);
> @@ -3118,6 +3118,7 @@ static void eb_request_add(struct i915_execbuffer *eb)
>   		/* Serialise with context_close via the add_to_timeline */
>   		i915_request_set_error_once(rq, -ENOENT);
>   		__i915_request_skip(rq);
> +		err = -ENOENT; /* override any transient errors */
>   	}
>   
>   	__i915_request_queue(rq, &attr);
> @@ -3127,6 +3128,8 @@ static void eb_request_add(struct i915_execbuffer *eb)
>   		retire_requests(tl, prev);
>   
>   	mutex_unlock(&tl->mutex);
> +
> +	return err;
>   }
>   
>   static const i915_user_extension_fn execbuf_extensions[] = {
> @@ -3332,7 +3335,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	err = eb_submit(&eb, batch);
>   err_request:
>   	i915_request_get(eb.request);
> -	eb_request_add(&eb);
> +	err = eb_request_add(&eb, err);
>   
>   	if (eb.fences)
>   		signal_fence_array(&eb);
> 

Simple enough and it explains why gem_busy assert in the IGT can fail 
after execbuf succeeded - skipped request executes before the check 
since the payload was zapped. Fast timing but obviously possible.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-12-04 11:11 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 10:34 [PATCH] drm/i915/gem: Propagate error from cancelled submit due to context closure Chris Wilson
2020-12-04 11:09 ` [Intel-gfx] " Tvrtko Ursulin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).