All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl
@ 2016-05-12 10:25 Chris Wilson
  2016-05-12 10:25 ` [PATCH 2/2] drm/i915: Stop automatically retiring requests after a GPU hang Chris Wilson
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Chris Wilson @ 2016-05-12 10:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

In order to reduce the workload of the caller, we do not want to
actually have to retire requests of others when checking the status of
this object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8439867..474c027 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3069,14 +3069,8 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (list_empty(&req->list))
-			goto retire;
-
-		if (i915_gem_request_completed(req, true)) {
-			__i915_gem_request_retire__upto(req);
-retire:
+		if (i915_gem_request_completed(req, true))
 			i915_gem_object_retire__read(obj, i);
-		}
 	}
 
 	return 0;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] drm/i915: Stop automatically retiring requests after a GPU hang
  2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
@ 2016-05-12 10:25 ` Chris Wilson
  2016-05-13  7:57   ` [PATCH v2] " Chris Wilson
  2016-05-13  8:25 ` ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev2) Patchwork
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2016-05-12 10:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Mika Kuoppala

Following a GPU hang, we break out of the request loop in order to
unlock the struct_mutex for use by the GPU reset. However, if we retire
all the requests at that moment, we cannot identify the guilty request
after performing the reset.

Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
Testcase: igt/gem_reset_stats
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 474c027..b1a33fe9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1462,7 +1462,10 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	__i915_gem_request_retire__upto(req);
+	/* If the GPU hung, we want to keep the requests to find the guilty. */
+	if (req->reset_counter == i915_reset_counter(&dev_priv->gpu_error))
+		__i915_gem_request_retire__upto(req);
+
 	return 0;
 }
 
@@ -1519,7 +1522,8 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
 	else if (obj->last_write_req == req)
 		i915_gem_object_retire__write(obj);
 
-	__i915_gem_request_retire__upto(req);
+	if (req->reset_counter == i915_reset_counter(&req->i915->gpu_error))
+		__i915_gem_request_retire__upto(req);
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang
  2016-05-12 10:25 ` [PATCH 2/2] drm/i915: Stop automatically retiring requests after a GPU hang Chris Wilson
@ 2016-05-13  7:57   ` Chris Wilson
  2016-05-13  8:43     ` Mika Kuoppala
  2016-05-13 10:48     ` [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang Mika Kuoppala
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2016-05-13  7:57 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Mika Kuoppala

Following a GPU hang, we break out of the request loop in order to
unlock the struct_mutex for use by the GPU reset. However, if we retire
all the requests at that moment, we cannot identify the guilty request
after performing the reset.

v2: Not automatically retiring requests forces us to recheck for
available ringspace.

Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
Testcase: igt/gem_reset_stats
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 8 ++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6e61738fab31..a3d826bb216b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1462,7 +1462,10 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	__i915_gem_request_retire__upto(req);
+	/* If the GPU hung, we want to keep the requests to find the guilty. */
+	if (req->reset_counter == i915_reset_counter(&dev_priv->gpu_error))
+		__i915_gem_request_retire__upto(req);
+
 	return 0;
 }
 
@@ -1519,7 +1522,8 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
 	else if (obj->last_write_req == req)
 		i915_gem_object_retire__write(obj);
 
-	__i915_gem_request_retire__upto(req);
+	if (req->reset_counter == i915_reset_counter(&req->i915->gpu_error))
+		__i915_gem_request_retire__upto(req);
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0618dd34c3ec..8d35a3978f9b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2450,6 +2450,8 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 			return ret;
 
 		intel_ring_update_space(ringbuf);
+		if (unlikely(ringbuf->space < wait_bytes))
+			return -EAGAIN;
 	}
 
 	if (unlikely(need_wrap)) {
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev2)
  2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
  2016-05-12 10:25 ` [PATCH 2/2] drm/i915: Stop automatically retiring requests after a GPU hang Chris Wilson
@ 2016-05-13  8:25 ` Patchwork
  2016-05-13  8:38 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Tvrtko Ursulin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2016-05-13  8:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev2)
URL   : https://patchwork.freedesktop.org/series/7059/
State : failure

== Summary ==

Series 7059v2 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/7059/revisions/2/mbox

Test drv_hangman:
        Subgroup error-state-basic:
                fail       -> PASS       (ro-ilk1-i5-650)
Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-cmd:
                pass       -> FAIL       (fi-byt-n2820)
Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                fail       -> PASS       (ro-hsw-i7-4770r)
                pass       -> FAIL       (fi-hsw-i7-4770k)
Test kms_pipe_crc_basic:
        Subgroup nonblocking-crc-pipe-c:
                skip       -> PASS       (fi-skl-i5-6260u)

fi-bdw-i7-5557u  total:219  pass:206  dwarn:0   dfail:0   fail:0   skip:13 
fi-bsw-n3050     total:218  pass:174  dwarn:0   dfail:0   fail:2   skip:42 
fi-byt-n2820     total:218  pass:174  dwarn:0   dfail:0   fail:3   skip:41 
fi-hsw-i7-4770k  total:219  pass:197  dwarn:0   dfail:0   fail:1   skip:21 
fi-hsw-i7-4770r  total:219  pass:193  dwarn:0   dfail:0   fail:0   skip:26 
fi-skl-i5-6260u  total:219  pass:207  dwarn:0   dfail:0   fail:0   skip:12 
fi-skl-i7-6700k  total:219  pass:191  dwarn:0   dfail:0   fail:0   skip:28 
fi-snb-i7-2600   total:37   pass:27   dwarn:0   dfail:0   fail:0   skip:9  
ro-bdw-i5-5250u  total:219  pass:181  dwarn:0   dfail:0   fail:0   skip:38 
ro-bdw-i7-5557U  total:219  pass:206  dwarn:0   dfail:0   fail:0   skip:13 
ro-bdw-i7-5600u  total:219  pass:187  dwarn:0   dfail:0   fail:0   skip:32 
ro-bsw-n3050     total:219  pass:175  dwarn:0   dfail:0   fail:2   skip:42 
ro-byt-n2820     total:218  pass:175  dwarn:0   dfail:0   fail:2   skip:41 
ro-hsw-i3-4010u  total:218  pass:193  dwarn:0   dfail:0   fail:0   skip:25 
ro-hsw-i7-4770r  total:219  pass:194  dwarn:0   dfail:0   fail:0   skip:25 
ro-ilk-i7-620lm  total:219  pass:151  dwarn:0   dfail:0   fail:1   skip:67 
ro-ilk1-i5-650   total:214  pass:152  dwarn:0   dfail:0   fail:1   skip:61 
ro-ivb-i7-3770   total:219  pass:183  dwarn:0   dfail:0   fail:0   skip:36 
ro-ivb2-i7-3770  total:219  pass:186  dwarn:0   dfail:0   fail:1   skip:32 
ro-skl-i7-6700hq total:214  pass:190  dwarn:0   dfail:0   fail:0   skip:24 
ro-snb-i7-2620M  total:219  pass:177  dwarn:0   dfail:0   fail:1   skip:41 

Results at /archive/results/CI_IGT_test/RO_Patchwork_879/

a6ac4ab drm-intel-nightly: 2016y-05m-12d-15h-37m-38s UTC integration manifest
f6b2891 drm/i915: Stop automatically retiring requests after a GPU hang
41fa6f9 drm/i915: Stop retiring requests from busy-ioctl

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl
  2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
  2016-05-12 10:25 ` [PATCH 2/2] drm/i915: Stop automatically retiring requests after a GPU hang Chris Wilson
  2016-05-13  8:25 ` ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev2) Patchwork
@ 2016-05-13  8:38 ` Tvrtko Ursulin
  2016-05-13  8:50   ` Chris Wilson
  2016-05-13  9:48 ` ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev4) Patchwork
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2016-05-13  8:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Mika Kuoppala


On 12/05/16 11:25, Chris Wilson wrote:
> In order to reduce the workload of the caller, we do not want to
> actually have to retire requests of others when checking the status of
> this object.

Wrt the subject, and from wait ioctl as well.

Also i915_gem_object_sync / i915_gem_execbuffer_move_to_gpu path 
(execbuf) looks like it could still do the upto retiring so that isn't 
only moving the thing around?

And in general, commit does not say who was impacted by this and how much?

Regards,

Tvrtko

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8439867..474c027 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3069,14 +3069,8 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   		if (req == NULL)
>   			continue;
>
> -		if (list_empty(&req->list))
> -			goto retire;
> -
> -		if (i915_gem_request_completed(req, true)) {
> -			__i915_gem_request_retire__upto(req);
> -retire:
> +		if (i915_gem_request_completed(req, true))
>   			i915_gem_object_retire__read(obj, i);
> -		}
>   	}
>
>   	return 0;
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang
  2016-05-13  7:57   ` [PATCH v2] " Chris Wilson
@ 2016-05-13  8:43     ` Mika Kuoppala
  2016-05-13  9:38       ` Chris Wilson
  2016-05-13  9:39       ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Chris Wilson
  2016-05-13 10:48     ` [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang Mika Kuoppala
  1 sibling, 2 replies; 18+ messages in thread
From: Mika Kuoppala @ 2016-05-13  8:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

Chris Wilson <chris@chris-wilson.co.uk> writes:

> [ text/plain ]
> Following a GPU hang, we break out of the request loop in order to
> unlock the struct_mutex for use by the GPU reset. However, if we retire
> all the requests at that moment, we cannot identify the guilty request
> after performing the reset.
>
> v2: Not automatically retiring requests forces us to recheck for
> available ringspace.
>
> Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from
> i915_wait_request()")

Partially. We don't clear the request list before reset
processing. But we still do release clients before reset
processing. This leads to clients see completed
requests before the reset statistics have been updated.

If we return -EAGAIN in wait_request, then we unblock
the clients only until the reset handling has updated
the seqno past the reset boundary. This would fix
both issues I am seeing.

-Mika

> Testcase: igt/gem_reset_stats
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 8 ++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6e61738fab31..a3d826bb216b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1462,7 +1462,10 @@ i915_wait_request(struct drm_i915_gem_request *req)
>  	if (ret)
>  		return ret;
>  
> -	__i915_gem_request_retire__upto(req);
> +	/* If the GPU hung, we want to keep the requests to find the guilty. */
> +	if (req->reset_counter == i915_reset_counter(&dev_priv->gpu_error))
> +		__i915_gem_request_retire__upto(req);
> +
>  	return 0;
>  }
>  
> @@ -1519,7 +1522,8 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
>  	else if (obj->last_write_req == req)
>  		i915_gem_object_retire__write(obj);
>  
> -	__i915_gem_request_retire__upto(req);
> +	if (req->reset_counter == i915_reset_counter(&req->i915->gpu_error))
> +		__i915_gem_request_retire__upto(req);
>  }
>  
>  /* A nonblocking variant of the above wait. This is a highly dangerous routine
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0618dd34c3ec..8d35a3978f9b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2450,6 +2450,8 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>  			return ret;
>  
>  		intel_ring_update_space(ringbuf);
> +		if (unlikely(ringbuf->space < wait_bytes))
> +			return -EAGAIN;
>  	}
>  
>  	if (unlikely(need_wrap)) {
> -- 
> 2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl
  2016-05-13  8:38 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Tvrtko Ursulin
@ 2016-05-13  8:50   ` Chris Wilson
  2016-05-13  9:01     ` Tvrtko Ursulin
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2016-05-13  8:50 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, Mika Kuoppala

On Fri, May 13, 2016 at 09:38:54AM +0100, Tvrtko Ursulin wrote:
> 
> On 12/05/16 11:25, Chris Wilson wrote:
> >In order to reduce the workload of the caller, we do not want to
> >actually have to retire requests of others when checking the status of
> >this object.
> 
> Wrt the subject, and from wait ioctl as well.
> 
> Also i915_gem_object_sync / i915_gem_execbuffer_move_to_gpu path
> (execbuf) looks like it could still do the upto retiring so that
> isn't only moving the thing around?
> 
> And in general, commit does not say who was impacted by this and how much?

Nothing much until we have them all removed. Please see the other
patches! The end result in having a completely lockless busy-ioctl is a
major improvement to a minor path... The impact is really only measured
by reduced contention between clients, for which the busy-ioctl is
mostly a victim.

This was just an opportunistic change to reduce the numer of
retire__upto when thinking about the bugfix in patch 2.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl
  2016-05-13  8:50   ` Chris Wilson
@ 2016-05-13  9:01     ` Tvrtko Ursulin
  2016-05-13 10:30       ` [PATCH v2] drm/i915: Stop retiring requests from busy/wait ioctls Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2016-05-13  9:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Mika Kuoppala


On 13/05/16 09:50, Chris Wilson wrote:
> On Fri, May 13, 2016 at 09:38:54AM +0100, Tvrtko Ursulin wrote:
>>
>> On 12/05/16 11:25, Chris Wilson wrote:
>>> In order to reduce the workload of the caller, we do not want to
>>> actually have to retire requests of others when checking the status of
>>> this object.
>>
>> Wrt the subject, and from wait ioctl as well.
>>
>> Also i915_gem_object_sync / i915_gem_execbuffer_move_to_gpu path
>> (execbuf) looks like it could still do the upto retiring so that
>> isn't only moving the thing around?
>>
>> And in general, commit does not say who was impacted by this and how much?
>
> Nothing much until we have them all removed. Please see the other
> patches! The end result in having a completely lockless busy-ioctl is a
> major improvement to a minor path... The impact is really only measured
> by reduced contention between clients, for which the busy-ioctl is
> mostly a victim.
>
> This was just an opportunistic change to reduce the numer of
> retire__upto when thinking about the bugfix in patch 2.

I don't mind the idea, just though it needs more explaining in the
commit and updated subject.

I don't know the reset path nearly well enough to comment on 2/2.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang
  2016-05-13  8:43     ` Mika Kuoppala
@ 2016-05-13  9:38       ` Chris Wilson
  2016-05-13  9:39       ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Chris Wilson
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2016-05-13  9:38 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx

On Fri, May 13, 2016 at 11:43:25AM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > [ text/plain ]
> > Following a GPU hang, we break out of the request loop in order to
> > unlock the struct_mutex for use by the GPU reset. However, if we retire
> > all the requests at that moment, we cannot identify the guilty request
> > after performing the reset.
> >
> > v2: Not automatically retiring requests forces us to recheck for
> > available ringspace.
> >
> > Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from
> > i915_wait_request()")
> 
> Partially. We don't clear the request list before reset
> processing. But we still do release clients before reset
> processing. This leads to clients see completed
> requests before the reset statistics have been updated.
> 
> If we return -EAGAIN in wait_request, then we unblock
> the clients only until the reset handling has updated
> the seqno past the reset boundary. This would fix
> both issues I am seeing.

I am very wary about reintroducing -EAGAIN here, since the majority of
callers do not care - or worse do not handle such notifications and
would be confused by it (the request they waited upon is no more and
they are ready to continue after all).

Anyway, the bug looks to be on the other side here as the
get-reset-stats ioctl doesn't wait for a pending reset before reporting
victim/guilty counts.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c
  2016-05-13  8:43     ` Mika Kuoppala
  2016-05-13  9:38       ` Chris Wilson
@ 2016-05-13  9:39       ` Chris Wilson
  2016-05-13  9:39         ` [PATCH 2/2] drm/i915: Complete pending resets before get-reset-stats ioctl Chris Wilson
  2016-05-13  9:51         ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Mika Kuoppala
  1 sibling, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2016-05-13  9:39 UTC (permalink / raw)
  To: intel-gfx

The get-reset-stats ioctl reports upon the statistics (number of hangs,
be it as a victim or the guilty party) of a particular context. It is
semantically better as being part of i915_gem_context.c user interface,
as opposed to the hardware level access of intel_uncore.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c         |  2 +-
 drivers/gpu/drm/i915/i915_drv.h         |  4 ++--
 drivers/gpu/drm/i915/i915_gem_context.c | 39 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_uncore.c     | 39 ---------------------------------
 4 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index e42fa1769c10..05b2b1901d60 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1631,7 +1631,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a0b51301337..d9d07b70f05c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3362,6 +3362,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv);
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv);
+int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
+				       struct drm_file *file);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
@@ -3578,8 +3580,6 @@ extern void intel_detect_pch(struct drm_device *dev);
 extern bool i915_semaphore_is_enabled(struct drm_i915_private *dev_priv);
 int i915_reg_read_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
-int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
-			       struct drm_file *file);
 
 /* overlay */
 extern struct intel_overlay_error_state *
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0fffebcc0ace..f3c76ca6b411 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1040,3 +1040,42 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 	return ret;
 }
+
+int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
+				       void *data, struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_reset_stats *args = data;
+	struct i915_ctx_hang_stats *hs;
+	struct intel_context *ctx;
+	int ret;
+
+	if (args->flags || args->pad)
+		return -EINVAL;
+
+	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	ctx = i915_gem_context_get(file->driver_priv, args->ctx_id);
+	if (IS_ERR(ctx)) {
+		mutex_unlock(&dev->struct_mutex);
+		return PTR_ERR(ctx);
+	}
+	hs = &ctx->hang_stats;
+
+	if (capable(CAP_SYS_ADMIN))
+		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
+	else
+		args->reset_count = 0;
+
+	args->batch_active = hs->batch_active;
+	args->batch_pending = hs->batch_pending;
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 0b1bd07dd04a..385114bca924 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1460,45 +1460,6 @@ out:
 	return ret;
 }
 
-int i915_get_reset_stats_ioctl(struct drm_device *dev,
-			       void *data, struct drm_file *file)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_reset_stats *args = data;
-	struct i915_ctx_hang_stats *hs;
-	struct intel_context *ctx;
-	int ret;
-
-	if (args->flags || args->pad)
-		return -EINVAL;
-
-	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	ctx = i915_gem_context_get(file->driver_priv, args->ctx_id);
-	if (IS_ERR(ctx)) {
-		mutex_unlock(&dev->struct_mutex);
-		return PTR_ERR(ctx);
-	}
-	hs = &ctx->hang_stats;
-
-	if (capable(CAP_SYS_ADMIN))
-		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
-	else
-		args->reset_count = 0;
-
-	args->batch_active = hs->batch_active;
-	args->batch_pending = hs->batch_pending;
-
-	mutex_unlock(&dev->struct_mutex);
-
-	return 0;
-}
-
 static int i915_reset_complete(struct pci_dev *pdev)
 {
 	u8 gdrst;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] drm/i915: Complete pending resets before get-reset-stats ioctl
  2016-05-13  9:39       ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Chris Wilson
@ 2016-05-13  9:39         ` Chris Wilson
  2016-05-13 10:12           ` Mika Kuoppala
  2016-05-13  9:51         ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Mika Kuoppala
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2016-05-13  9:39 UTC (permalink / raw)
  To: intel-gfx

The get-reset-stats ioctls wasn't waiting for a pending reset before
reporting its statistics, and so was ignoring a hang generated by the
context that should have been reported against said context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f3c76ca6b411..2aedd188473d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1056,7 +1056,7 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ret;
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev4)
  2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
                   ` (2 preceding siblings ...)
  2016-05-13  8:38 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Tvrtko Ursulin
@ 2016-05-13  9:48 ` Patchwork
  2016-05-13 10:40 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Mika Kuoppala
  2016-05-13 11:00 ` ✗ Ro.CI.BAT: failure for series starting with [v2] drm/i915: Stop retiring requests from busy/wait ioctls (rev5) Patchwork
  5 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2016-05-13  9:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev4)
URL   : https://patchwork.freedesktop.org/series/7059/
State : failure

== Summary ==

Applying: drm/i915: Stop retiring requests from busy-ioctl
Applying: drm/i915: Complete pending resets before get-reset-stats ioctl
Patch failed at 0002 drm/i915: Complete pending resets before get-reset-stats ioctl
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c
  2016-05-13  9:39       ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Chris Wilson
  2016-05-13  9:39         ` [PATCH 2/2] drm/i915: Complete pending resets before get-reset-stats ioctl Chris Wilson
@ 2016-05-13  9:51         ` Mika Kuoppala
  1 sibling, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2016-05-13  9:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> [ text/plain ]
> The get-reset-stats ioctl reports upon the statistics (number of hangs,
> be it as a victim or the guilty party) of a particular context. It is
> semantically better as being part of i915_gem_context.c user interface,
> as opposed to the hardware level access of intel_uncore.c
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_dma.c         |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h         |  4 ++--
>  drivers/gpu/drm/i915/i915_gem_context.c | 39 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_uncore.c     | 39 ---------------------------------
>  4 files changed, 42 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index e42fa1769c10..05b2b1901d60 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1631,7 +1631,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
> -	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7a0b51301337..d9d07b70f05c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3362,6 +3362,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>  				    struct drm_file *file_priv);
>  int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  				    struct drm_file *file_priv);
> +int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
> +				       struct drm_file *file);
>  
>  /* i915_gem_evict.c */
>  int __must_check i915_gem_evict_something(struct drm_device *dev,
> @@ -3578,8 +3580,6 @@ extern void intel_detect_pch(struct drm_device *dev);
>  extern bool i915_semaphore_is_enabled(struct drm_i915_private *dev_priv);
>  int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file);
> -int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
> -			       struct drm_file *file);
>  
>  /* overlay */
>  extern struct intel_overlay_error_state *
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 0fffebcc0ace..f3c76ca6b411 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1040,3 +1040,42 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  
>  	return ret;
>  }
> +
> +int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
> +				       void *data, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_reset_stats *args = data;
> +	struct i915_ctx_hang_stats *hs;
> +	struct intel_context *ctx;
> +	int ret;
> +
> +	if (args->flags || args->pad)
> +		return -EINVAL;
> +
> +	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ctx = i915_gem_context_get(file->driver_priv, args->ctx_id);
> +	if (IS_ERR(ctx)) {
> +		mutex_unlock(&dev->struct_mutex);
> +		return PTR_ERR(ctx);
> +	}
> +	hs = &ctx->hang_stats;
> +
> +	if (capable(CAP_SYS_ADMIN))
> +		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
> +	else
> +		args->reset_count = 0;
> +
> +	args->batch_active = hs->batch_active;
> +	args->batch_pending = hs->batch_pending;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 0b1bd07dd04a..385114bca924 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1460,45 +1460,6 @@ out:
>  	return ret;
>  }
>  
> -int i915_get_reset_stats_ioctl(struct drm_device *dev,
> -			       void *data, struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct drm_i915_reset_stats *args = data;
> -	struct i915_ctx_hang_stats *hs;
> -	struct intel_context *ctx;
> -	int ret;
> -
> -	if (args->flags || args->pad)
> -		return -EINVAL;
> -
> -	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	ctx = i915_gem_context_get(file->driver_priv, args->ctx_id);
> -	if (IS_ERR(ctx)) {
> -		mutex_unlock(&dev->struct_mutex);
> -		return PTR_ERR(ctx);
> -	}
> -	hs = &ctx->hang_stats;
> -
> -	if (capable(CAP_SYS_ADMIN))
> -		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
> -	else
> -		args->reset_count = 0;
> -
> -	args->batch_active = hs->batch_active;
> -	args->batch_pending = hs->batch_pending;
> -
> -	mutex_unlock(&dev->struct_mutex);
> -
> -	return 0;
> -}
> -
>  static int i915_reset_complete(struct pci_dev *pdev)
>  {
>  	u8 gdrst;
> -- 
> 2.8.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] drm/i915: Complete pending resets before get-reset-stats ioctl
  2016-05-13  9:39         ` [PATCH 2/2] drm/i915: Complete pending resets before get-reset-stats ioctl Chris Wilson
@ 2016-05-13 10:12           ` Mika Kuoppala
  0 siblings, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2016-05-13 10:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> [ text/plain ]
> The get-reset-stats ioctls wasn't waiting for a pending reset before
> reporting its statistics, and so was ignoring a hang generated by the
> context that should have been reported against said context.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This is the fix we are looking for. And so blatantly obvious that
I blame myself for walking past this yesterday.

Testcase: igt/gem_reset_stats/reset_stats-*
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f3c76ca6b411..2aedd188473d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1056,7 +1056,7 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
>  	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	ret = i915_mutex_lock_interruptible(dev);
>  	if (ret)
>  		return ret;
>  
> -- 
> 2.8.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2] drm/i915: Stop retiring requests from busy/wait ioctls
  2016-05-13  9:01     ` Tvrtko Ursulin
@ 2016-05-13 10:30       ` Chris Wilson
  0 siblings, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2016-05-13 10:30 UTC (permalink / raw)
  To: intel-gfx

In order to reduce the workload of the caller, we do not want to
actually have to retire requests of others when checking the busy status
of this object. This applies to both busy/wait ioctls as the wait ioctl
has a semantically equivalent mode to the busy ioctl.

At the present time, this is only a minor improvement to reduce the
workload of the busy ioctl under the struct_mutex which helps to reduce
its impact upon contention of struct_mutex. However, since it is mostly
a victim in highly contended scenarios, the impact is very minor until
we can eliminate the struct_mutex requirement for busy-ioctl in the near
future.

v2: Mention the patches intended limited impact. It is just paving the
way for greater changes whilst reducing the impact of a bugfix in the
next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be75689455b2..6e61738fab31 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3066,14 +3066,8 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (list_empty(&req->list))
-			goto retire;
-
-		if (i915_gem_request_completed(req, true)) {
-			__i915_gem_request_retire__upto(req);
-retire:
+		if (i915_gem_request_completed(req, true))
 			i915_gem_object_retire__read(obj, i);
-		}
 	}
 
 	return 0;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl
  2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
                   ` (3 preceding siblings ...)
  2016-05-13  9:48 ` ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev4) Patchwork
@ 2016-05-13 10:40 ` Mika Kuoppala
  2016-05-13 11:00 ` ✗ Ro.CI.BAT: failure for series starting with [v2] drm/i915: Stop retiring requests from busy/wait ioctls (rev5) Patchwork
  5 siblings, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2016-05-13 10:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> [ text/plain ]
> In order to reduce the workload of the caller, we do not want to
> actually have to retire requests of others when checking the status of
> this object.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8439867..474c027 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3069,14 +3069,8 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>  		if (req == NULL)
>  			continue;
>  
> -		if (list_empty(&req->list))
> -			goto retire;
> -
> -		if (i915_gem_request_completed(req, true)) {
> -			__i915_gem_request_retire__upto(req);
> -retire:
> +		if (i915_gem_request_completed(req, true))
>  			i915_gem_object_retire__read(obj, i);
> -		}
>  	}
>  
>  	return 0;
> -- 
> 2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang
  2016-05-13  7:57   ` [PATCH v2] " Chris Wilson
  2016-05-13  8:43     ` Mika Kuoppala
@ 2016-05-13 10:48     ` Mika Kuoppala
  1 sibling, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2016-05-13 10:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

Chris Wilson <chris@chris-wilson.co.uk> writes:

> [ text/plain ]
> Following a GPU hang, we break out of the request loop in order to
> unlock the struct_mutex for use by the GPU reset. However, if we retire
> all the requests at that moment, we cannot identify the guilty request
> after performing the reset.
>
> v2: Not automatically retiring requests forces us to recheck for
> available ringspace.
>
> Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
> Testcase: igt/gem_reset_stats

Testcase: igt/gem_reset_stats/ban-*
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 8 ++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6e61738fab31..a3d826bb216b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1462,7 +1462,10 @@ i915_wait_request(struct drm_i915_gem_request *req)
>  	if (ret)
>  		return ret;
>  
> -	__i915_gem_request_retire__upto(req);
> +	/* If the GPU hung, we want to keep the requests to find the guilty. */
> +	if (req->reset_counter == i915_reset_counter(&dev_priv->gpu_error))
> +		__i915_gem_request_retire__upto(req);
> +
>  	return 0;
>  }
>  
> @@ -1519,7 +1522,8 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
>  	else if (obj->last_write_req == req)
>  		i915_gem_object_retire__write(obj);
>  
> -	__i915_gem_request_retire__upto(req);
> +	if (req->reset_counter == i915_reset_counter(&req->i915->gpu_error))
> +		__i915_gem_request_retire__upto(req);
>  }
>  
>  /* A nonblocking variant of the above wait. This is a highly dangerous routine
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0618dd34c3ec..8d35a3978f9b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2450,6 +2450,8 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>  			return ret;
>  
>  		intel_ring_update_space(ringbuf);
> +		if (unlikely(ringbuf->space < wait_bytes))
> +			return -EAGAIN;
>  	}
>  
>  	if (unlikely(need_wrap)) {
> -- 
> 2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [v2] drm/i915: Stop retiring requests from busy/wait ioctls (rev5)
  2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
                   ` (4 preceding siblings ...)
  2016-05-13 10:40 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Mika Kuoppala
@ 2016-05-13 11:00 ` Patchwork
  5 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2016-05-13 11:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2] drm/i915: Stop retiring requests from busy/wait ioctls (rev5)
URL   : https://patchwork.freedesktop.org/series/7059/
State : failure

== Summary ==

Applying: drm/i915: Stop retiring requests from busy/wait ioctls
Applying: drm/i915: Complete pending resets before get-reset-stats ioctl
Patch failed at 0002 drm/i915: Complete pending resets before get-reset-stats ioctl
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-05-13 11:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-12 10:25 [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Chris Wilson
2016-05-12 10:25 ` [PATCH 2/2] drm/i915: Stop automatically retiring requests after a GPU hang Chris Wilson
2016-05-13  7:57   ` [PATCH v2] " Chris Wilson
2016-05-13  8:43     ` Mika Kuoppala
2016-05-13  9:38       ` Chris Wilson
2016-05-13  9:39       ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Chris Wilson
2016-05-13  9:39         ` [PATCH 2/2] drm/i915: Complete pending resets before get-reset-stats ioctl Chris Wilson
2016-05-13 10:12           ` Mika Kuoppala
2016-05-13  9:51         ` [PATCH 1/2] drm/i915: Move get-reset-stats ioctl from intel_uncore.c to i915_gem_context.c Mika Kuoppala
2016-05-13 10:48     ` [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang Mika Kuoppala
2016-05-13  8:25 ` ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev2) Patchwork
2016-05-13  8:38 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Tvrtko Ursulin
2016-05-13  8:50   ` Chris Wilson
2016-05-13  9:01     ` Tvrtko Ursulin
2016-05-13 10:30       ` [PATCH v2] drm/i915: Stop retiring requests from busy/wait ioctls Chris Wilson
2016-05-13  9:48 ` ✗ Ro.CI.BAT: failure for series starting with [1/2] drm/i915: Stop retiring requests from busy-ioctl (rev4) Patchwork
2016-05-13 10:40 ` [PATCH 1/2] drm/i915: Stop retiring requests from busy-ioctl Mika Kuoppala
2016-05-13 11:00 ` ✗ Ro.CI.BAT: failure for series starting with [v2] drm/i915: Stop retiring requests from busy/wait ioctls (rev5) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.