All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
@ 2020-07-15 10:50 Chris Wilson
  2020-07-15 12:06 ` Tvrtko Ursulin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Chris Wilson @ 2020-07-15 10:50 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld, Chris Wilson

Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fence is
signaled by an interrupt, we will then try and call dma_fence_signal()
ourselves while the signaler is still holding the lock.
dma_fence_is_signaled() has the benefit of checking the
DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
avoids most of that contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55e6194..bb4eb1a8780e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
 	return this_cpu != cpu;
 }
 
-static bool __i915_spin_request(const struct i915_request * const rq, int state)
+static bool __i915_spin_request(struct i915_request * const rq, int state)
 {
 	unsigned long timeout_ns;
 	unsigned int cpu;
@@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
 	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
 	timeout_ns += local_clock_ns(&cpu);
 	do {
-		if (i915_request_completed(rq))
+		if (dma_fence_is_signaled(&rq->fence))
 			return true;
 
 		if (signal_pending_state(state, current))
@@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
 	 * duration, which we currently lack.
 	 */
 	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
-	    __i915_spin_request(rq, state)) {
-		dma_fence_signal(&rq->fence);
+	    __i915_spin_request(rq, state))
 		goto out;
-	}
 
 	/*
 	 * This client is about to stall waiting for the GPU. In many cases
@@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
 	for (;;) {
 		set_current_state(state);
 
-		if (i915_request_completed(rq)) {
-			dma_fence_signal(&rq->fence);
+		if (dma_fence_is_signaled(&rq->fence))
 			break;
-		}
 
 		intel_engine_flush_submission(rq->engine);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 10:50 [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait Chris Wilson
@ 2020-07-15 12:06 ` Tvrtko Ursulin
  2020-07-15 12:26   ` Tvrtko Ursulin
  2020-07-15 12:12 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-07-15 12:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld


On 15/07/2020 11:50, Chris Wilson wrote:
> Currently, we use i915_request_completed() directly in
> i915_request_wait() and follow up with a manual invocation of
> dma_fence_signal(). This appears to cause a large number of contentions
> on i915_request.lock as when the process is woken up after the fence is
> signaled by an interrupt, we will then try and call dma_fence_signal()
> ourselves while the signaler is still holding the lock.
> dma_fence_is_signaled() has the benefit of checking the
> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> avoids most of that contention.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b2fe55e6194..bb4eb1a8780e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>   	return this_cpu != cpu;
>   }
>   
> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
> +static bool __i915_spin_request(struct i915_request * const rq, int state)
>   {
>   	unsigned long timeout_ns;
>   	unsigned int cpu;
> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
>   	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>   	timeout_ns += local_clock_ns(&cpu);
>   	do {
> -		if (i915_request_completed(rq))
> +		if (dma_fence_is_signaled(&rq->fence))
>   			return true;
>   
>   		if (signal_pending_state(state, current))
> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>   	 * duration, which we currently lack.
>   	 */
>   	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> -	    __i915_spin_request(rq, state)) {
> -		dma_fence_signal(&rq->fence);
> +	    __i915_spin_request(rq, state))
>   		goto out;
> -	}
>   
>   	/*
>   	 * This client is about to stall waiting for the GPU. In many cases
> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>   	for (;;) {
>   		set_current_state(state);
>   
> -		if (i915_request_completed(rq)) {
> -			dma_fence_signal(&rq->fence);
> +		if (dma_fence_is_signaled(&rq->fence))
>   			break;
> -		}
>   
>   		intel_engine_flush_submission(rq->engine);
>   
> 

In other words putting some latency back into the waiters, which is 
probably okay, since sync waits is not our primary model.

I have a slight concern about the remaining value of busy spinning if 
i915_request_completed check is removed from there as well. Of course it 
doesn't make sense to have different completion criteria between the 
two.. We could wait a bit longer if real check in busyspin said request 
is actually completed, just not signal it but wait for the breadcrumbs 
to do it.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 10:50 [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait Chris Wilson
  2020-07-15 12:06 ` Tvrtko Ursulin
@ 2020-07-15 12:12 ` Patchwork
  2020-07-15 12:34 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-07-15 16:11 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-07-15 12:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait
URL   : https://patchwork.freedesktop.org/series/79514/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 12:06 ` Tvrtko Ursulin
@ 2020-07-15 12:26   ` Tvrtko Ursulin
  2020-07-15 14:47     ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-07-15 12:26 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld


On 15/07/2020 13:06, Tvrtko Ursulin wrote:
> 
> On 15/07/2020 11:50, Chris Wilson wrote:
>> Currently, we use i915_request_completed() directly in
>> i915_request_wait() and follow up with a manual invocation of
>> dma_fence_signal(). This appears to cause a large number of contentions
>> on i915_request.lock as when the process is woken up after the fence is
>> signaled by an interrupt, we will then try and call dma_fence_signal()
>> ourselves while the signaler is still holding the lock.
>> dma_fence_is_signaled() has the benefit of checking the
>> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
>> avoids most of that contention.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>>   1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>> b/drivers/gpu/drm/i915/i915_request.c
>> index 0b2fe55e6194..bb4eb1a8780e 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, 
>> unsigned int cpu)
>>       return this_cpu != cpu;
>>   }
>> -static bool __i915_spin_request(const struct i915_request * const rq, 
>> int state)
>> +static bool __i915_spin_request(struct i915_request * const rq, int 
>> state)
>>   {
>>       unsigned long timeout_ns;
>>       unsigned int cpu;
>> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct 
>> i915_request * const rq, int state)
>>       timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>>       timeout_ns += local_clock_ns(&cpu);
>>       do {
>> -        if (i915_request_completed(rq))
>> +        if (dma_fence_is_signaled(&rq->fence))
>>               return true;
>>           if (signal_pending_state(state, current))
>> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>>        * duration, which we currently lack.
>>        */
>>       if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
>> -        __i915_spin_request(rq, state)) {
>> -        dma_fence_signal(&rq->fence);
>> +        __i915_spin_request(rq, state))
>>           goto out;
>> -    }
>>       /*
>>        * This client is about to stall waiting for the GPU. In many cases
>> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>>       for (;;) {
>>           set_current_state(state);
>> -        if (i915_request_completed(rq)) {
>> -            dma_fence_signal(&rq->fence);
>> +        if (dma_fence_is_signaled(&rq->fence))
>>               break;
>> -        }
>>           intel_engine_flush_submission(rq->engine);
>>
> 
> In other words putting some latency back into the waiters, which is 
> probably okay, since sync waits is not our primary model.
> 
> I have a slight concern about the remaining value of busy spinning if 
> i915_request_completed check is removed from there as well. Of course it 
> doesn't make sense to have different completion criteria between the 
> two.. We could wait a bit longer if real check in busyspin said request 
> is actually completed, just not signal it but wait for the breadcrumbs 
> to do it.

What a load of nonsense.. :)

Okay, I think the only real question is i915_request_completed vs 
dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles 
waiting on GPU and breadcrumb irq work, or just the GPU.

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 10:50 [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait Chris Wilson
  2020-07-15 12:06 ` Tvrtko Ursulin
  2020-07-15 12:12 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
@ 2020-07-15 12:34 ` Patchwork
  2020-07-15 16:11 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-07-15 12:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 6494 bytes --]

== Series Details ==

Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait
URL   : https://patchwork.freedesktop.org/series/79514/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8749 -> Patchwork_18176
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/index.html

Known issues
------------

  Here are the changes found in Patchwork_18176 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-tgl-u2:          [PASS][1] -> [FAIL][2] ([i915#1888])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-guc:         [PASS][3] -> [SKIP][4] ([fdo#109271])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-kbl-guc/igt@i915_pm_rpm@module-reload.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-kbl-guc/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@gt_contexts:
    - fi-snb-2520m:       [PASS][5] -> [DMESG-FAIL][6] ([i915#541])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-snb-2520m/igt@i915_selftest@live@gt_contexts.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-snb-2520m/igt@i915_selftest@live@gt_contexts.html

  * igt@kms_force_connector_basic@force-connector-state:
    - fi-tgl-y:           [PASS][7] -> [DMESG-WARN][8] ([i915#1982]) +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-tgl-y/igt@kms_force_connector_basic@force-connector-state.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-tgl-y/igt@kms_force_connector_basic@force-connector-state.html

  * igt@prime_vgem@basic-write:
    - fi-tgl-y:           [PASS][9] -> [DMESG-WARN][10] ([i915#402]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-tgl-y/igt@prime_vgem@basic-write.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-tgl-y/igt@prime_vgem@basic-write.html

  
#### Possible fixes ####

  * igt@i915_pm_rpm@module-reload:
    - fi-byt-j1900:       [DMESG-WARN][11] ([i915#1982]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-byt-j1900/igt@i915_pm_rpm@module-reload.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-byt-j1900/igt@i915_pm_rpm@module-reload.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-bsw-n3050:       [DMESG-WARN][13] ([i915#1982]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
    - fi-bsw-kefka:       [DMESG-WARN][15] ([i915#1982]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_flip@basic-plain-flip@d-dsi1:
    - {fi-tgl-dsi}:       [DMESG-WARN][17] ([i915#1982]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-tgl-dsi/igt@kms_flip@basic-plain-flip@d-dsi1.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-tgl-dsi/igt@kms_flip@basic-plain-flip@d-dsi1.html

  * igt@vgem_basic@create:
    - fi-tgl-y:           [DMESG-WARN][19] ([i915#402]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-tgl-y/igt@vgem_basic@create.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-tgl-y/igt@vgem_basic@create.html

  
#### Warnings ####

  * igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size:
    - fi-kbl-x1275:       [DMESG-WARN][21] ([i915#62] / [i915#92]) -> [DMESG-WARN][22] ([i915#62] / [i915#92] / [i915#95]) +5 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-kbl-x1275/igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-kbl-x1275/igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size.html

  * igt@kms_force_connector_basic@force-edid:
    - fi-kbl-x1275:       [DMESG-WARN][23] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][24] ([i915#62] / [i915#92]) +1 similar issue
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/fi-kbl-x1275/igt@kms_force_connector_basic@force-edid.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/fi-kbl-x1275/igt@kms_force_connector_basic@force-edid.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (47 -> 40)
------------------------------

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8749 -> Patchwork_18176

  CI-20190529: 20190529
  CI_DRM_8749: c704cccd64e81bb9a8d23c414fdcafff65697801 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5735: 21f8204e54c122e4a0f8ca4b59e4b2db8d1ba687 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18176: e688f32314fbdcc1be26177cb1fb29f3be801120 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

e688f32314fb drm/i915: Reduce i915_request.lock contention for i915_request_wait

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/index.html

[-- Attachment #1.2: Type: text/html, Size: 8184 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 12:26   ` Tvrtko Ursulin
@ 2020-07-15 14:47     ` Chris Wilson
  2020-07-15 14:47       ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-07-15 14:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Matthew Auld

Quoting Tvrtko Ursulin (2020-07-15 13:26:23)
> 
> On 15/07/2020 13:06, Tvrtko Ursulin wrote:
> > 
> > On 15/07/2020 11:50, Chris Wilson wrote:
> >> Currently, we use i915_request_completed() directly in
> >> i915_request_wait() and follow up with a manual invocation of
> >> dma_fence_signal(). This appears to cause a large number of contentions
> >> on i915_request.lock as when the process is woken up after the fence is
> >> signaled by an interrupt, we will then try and call dma_fence_signal()
> >> ourselves while the signaler is still holding the lock.
> >> dma_fence_is_signaled() has the benefit of checking the
> >> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> >> avoids most of that contention.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Matthew Auld <matthew.auld@intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
> >>   1 file changed, 4 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> >> b/drivers/gpu/drm/i915/i915_request.c
> >> index 0b2fe55e6194..bb4eb1a8780e 100644
> >> --- a/drivers/gpu/drm/i915/i915_request.c
> >> +++ b/drivers/gpu/drm/i915/i915_request.c
> >> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, 
> >> unsigned int cpu)
> >>       return this_cpu != cpu;
> >>   }
> >> -static bool __i915_spin_request(const struct i915_request * const rq, 
> >> int state)
> >> +static bool __i915_spin_request(struct i915_request * const rq, int 
> >> state)
> >>   {
> >>       unsigned long timeout_ns;
> >>       unsigned int cpu;
> >> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct 
> >> i915_request * const rq, int state)
> >>       timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
> >>       timeout_ns += local_clock_ns(&cpu);
> >>       do {
> >> -        if (i915_request_completed(rq))
> >> +        if (dma_fence_is_signaled(&rq->fence))
> >>               return true;
> >>           if (signal_pending_state(state, current))
> >> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
> >>        * duration, which we currently lack.
> >>        */
> >>       if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> >> -        __i915_spin_request(rq, state)) {
> >> -        dma_fence_signal(&rq->fence);
> >> +        __i915_spin_request(rq, state))
> >>           goto out;
> >> -    }
> >>       /*
> >>        * This client is about to stall waiting for the GPU. In many cases
> >> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
> >>       for (;;) {
> >>           set_current_state(state);
> >> -        if (i915_request_completed(rq)) {
> >> -            dma_fence_signal(&rq->fence);
> >> +        if (dma_fence_is_signaled(&rq->fence))
> >>               break;
> >> -        }
> >>           intel_engine_flush_submission(rq->engine);
> >>
> > 
> > In other words putting some latency back into the waiters, which is 
> > probably okay, since sync waits is not our primary model.
> > 
> > I have a slight concern about the remaining value of busy spinning if 
> > i915_request_completed check is removed from there as well. Of course it 
> > doesn't make sense to have different completion criteria between the 
> > two.. We could wait a bit longer if real check in busyspin said request 
> > is actually completed, just not signal it but wait for the breadcrumbs 
> > to do it.
> 
> What a load of nonsense.. :)
> 
> Okay, I think the only real question is i915_request_completed vs 
> dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles 
> waiting on GPU and breadcrumb irq work, or just the GPU.

dma_fence_is_signaled() {
	if (test_bit(SIGNALED_BIT))
		return true;
	
	if (i915_request_completed()) {
		dma_fence_signal();
		return true;
	}
	
	return false;
}

with the indirection. So the question is whether the indirection is
worth the extra test bit. Just purely looking at the i915_request.lock
contention suggests that it probably is. For the spinner, burning a few
extra CPU cycles for *vfunc is not an issue, it's the wakeup latency,
and since we are calling dma_fence_signal() upon wakeup we do take the
spinlock without checking for an early return from the SIGNALED_BIT.
So I think it's a net positive. The alternative was to write

	if (i915_request_completed()) {
		if (!i915_request_is_signaled())
			dma_fence_signal();
		break;
	}

but

	if (dma_fence_is_signaled())
		break;

does appear simpler, if only by virtue of hiding the details in an
inline.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 14:47     ` Chris Wilson
@ 2020-07-15 14:47       ` Chris Wilson
  2020-07-16  8:41         ` Tvrtko Ursulin
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-07-15 14:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Matthew Auld

Quoting Chris Wilson (2020-07-15 15:47:15)
> Quoting Tvrtko Ursulin (2020-07-15 13:26:23)
> > 
> > On 15/07/2020 13:06, Tvrtko Ursulin wrote:
> > > 
> > > On 15/07/2020 11:50, Chris Wilson wrote:
> > >> Currently, we use i915_request_completed() directly in
> > >> i915_request_wait() and follow up with a manual invocation of
> > >> dma_fence_signal(). This appears to cause a large number of contentions
> > >> on i915_request.lock as when the process is woken up after the fence is
> > >> signaled by an interrupt, we will then try and call dma_fence_signal()
> > >> ourselves while the signaler is still holding the lock.
> > >> dma_fence_is_signaled() has the benefit of checking the
> > >> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> > >> avoids most of that contention.
> > >>
> > >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > >> Cc: Matthew Auld <matthew.auld@intel.com>
> > >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > >> ---
> > >>   drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
> > >>   1 file changed, 4 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> > >> b/drivers/gpu/drm/i915/i915_request.c
> > >> index 0b2fe55e6194..bb4eb1a8780e 100644
> > >> --- a/drivers/gpu/drm/i915/i915_request.c
> > >> +++ b/drivers/gpu/drm/i915/i915_request.c
> > >> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, 
> > >> unsigned int cpu)
> > >>       return this_cpu != cpu;
> > >>   }
> > >> -static bool __i915_spin_request(const struct i915_request * const rq, 
> > >> int state)
> > >> +static bool __i915_spin_request(struct i915_request * const rq, int 
> > >> state)
> > >>   {
> > >>       unsigned long timeout_ns;
> > >>       unsigned int cpu;
> > >> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct 
> > >> i915_request * const rq, int state)
> > >>       timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
> > >>       timeout_ns += local_clock_ns(&cpu);
> > >>       do {
> > >> -        if (i915_request_completed(rq))
> > >> +        if (dma_fence_is_signaled(&rq->fence))
> > >>               return true;
> > >>           if (signal_pending_state(state, current))
> > >> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
> > >>        * duration, which we currently lack.
> > >>        */
> > >>       if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> > >> -        __i915_spin_request(rq, state)) {
> > >> -        dma_fence_signal(&rq->fence);
> > >> +        __i915_spin_request(rq, state))
> > >>           goto out;
> > >> -    }
> > >>       /*
> > >>        * This client is about to stall waiting for the GPU. In many cases
> > >> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
> > >>       for (;;) {
> > >>           set_current_state(state);
> > >> -        if (i915_request_completed(rq)) {
> > >> -            dma_fence_signal(&rq->fence);
> > >> +        if (dma_fence_is_signaled(&rq->fence))
> > >>               break;
> > >> -        }
> > >>           intel_engine_flush_submission(rq->engine);
> > >>
> > > 
> > > In other words putting some latency back into the waiters, which is 
> > > probably okay, since sync waits is not our primary model.
> > > 
> > > I have a slight concern about the remaining value of busy spinning if 
> > > i915_request_completed check is removed from there as well. Of course it 
> > > doesn't make sense to have different completion criteria between the 
> > > two.. We could wait a bit longer if real check in busyspin said request 
> > > is actually completed, just not signal it but wait for the breadcrumbs 
> > > to do it.
> > 
> > What a load of nonsense.. :)
> > 
> > Okay, I think the only real question is i915_request_completed vs 
> > dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles 
> > waiting on GPU and breadcrumb irq work, or just the GPU.
> 
> dma_fence_is_signaled() {
>         if (test_bit(SIGNALED_BIT))
>                 return true;
>         
>         if (i915_request_completed()) {
>                 dma_fence_signal();
>                 return true;
>         }
>         
>         return false;
> }
> 
> with the indirection. So the question is whether the indirection is
> worth the extra test bit. Just purely looking at the i915_request.lock
> contention suggests that it probably is. For the spinner, burning a few
> extra CPU cycles for *vfunc is not an issue, it's the wakeup latency,
> and since we are calling dma_fence_signal() upon wakeup we do take the
> spinlock without checking for an early return from the SIGNALED_BIT.
> So I think it's a net positive. The alternative was to write
> 
>         if (i915_request_completed()) {
>                 if (!i915_request_is_signaled())
>                         dma_fence_signal();
>                 break;
>         }
> 
> but
> 
>         if (dma_fence_is_signaled())
>                 break;
> 
> does appear simpler, if only by virtue of hiding the details in an
> inline.

Or a patch to add test_bit() to dma_fence_signal, but this looked
better.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 10:50 [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait Chris Wilson
                   ` (2 preceding siblings ...)
  2020-07-15 12:34 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-07-15 16:11 ` Patchwork
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-07-15 16:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 14362 bytes --]

== Series Details ==

Series: drm/i915: Reduce i915_request.lock contention for i915_request_wait
URL   : https://patchwork.freedesktop.org/series/79514/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8749_full -> Patchwork_18176_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_18176_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_balancer@bonded-early:
    - shard-kbl:          [PASS][1] -> [FAIL][2] ([i915#2079])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-kbl3/igt@gem_exec_balancer@bonded-early.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-kbl7/igt@gem_exec_balancer@bonded-early.html

  * igt@gem_exec_whisper@basic-queues-priority-all:
    - shard-glk:          [PASS][3] -> [DMESG-WARN][4] ([i915#118] / [i915#95])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-glk7/igt@gem_exec_whisper@basic-queues-priority-all.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-glk5/igt@gem_exec_whisper@basic-queues-priority-all.html

  * igt@gem_softpin@evict-active:
    - shard-tglb:         [PASS][5] -> [DMESG-WARN][6] ([i915#402]) +2 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-tglb5/igt@gem_softpin@evict-active.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-tglb8/igt@gem_softpin@evict-active.html

  * igt@kms_big_fb@y-tiled-64bpp-rotate-180:
    - shard-glk:          [PASS][7] -> [DMESG-FAIL][8] ([i915#118] / [i915#95])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-glk9/igt@kms_big_fb@y-tiled-64bpp-rotate-180.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-glk8/igt@kms_big_fb@y-tiled-64bpp-rotate-180.html

  * igt@kms_cursor_crc@pipe-b-cursor-64x64-sliding:
    - shard-skl:          [PASS][9] -> [FAIL][10] ([i915#54])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl8/igt@kms_cursor_crc@pipe-b-cursor-64x64-sliding.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl1/igt@kms_cursor_crc@pipe-b-cursor-64x64-sliding.html

  * igt@kms_flip@2x-flip-vs-expired-vblank@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][11] -> [FAIL][12] ([i915#79])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-glk1/igt@kms_flip@2x-flip-vs-expired-vblank@ab-hdmi-a1-hdmi-a2.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-glk7/igt@kms_flip@2x-flip-vs-expired-vblank@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@basic-plain-flip@a-edp1:
    - shard-skl:          [PASS][13] -> [DMESG-WARN][14] ([i915#1982]) +11 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl7/igt@kms_flip@basic-plain-flip@a-edp1.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl4/igt@kms_flip@basic-plain-flip@a-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1:
    - shard-kbl:          [PASS][15] -> [DMESG-WARN][16] ([i915#165])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-kbl6/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_frontbuffer_tracking@fbcpsr-stridechange:
    - shard-tglb:         [PASS][17] -> [DMESG-WARN][18] ([i915#1982]) +2 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-tglb7/igt@kms_frontbuffer_tracking@fbcpsr-stridechange.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-tglb2/igt@kms_frontbuffer_tracking@fbcpsr-stridechange.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-kbl:          [PASS][19] -> [DMESG-WARN][20] ([i915#180]) +3 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-kbl3/igt@kms_hdr@bpc-switch-suspend.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-kbl7/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min:
    - shard-skl:          [PASS][21] -> [FAIL][22] ([fdo#108145] / [i915#265]) +1 similar issue
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl5/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl9/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html

  * igt@kms_psr@psr2_primary_page_flip:
    - shard-iclb:         [PASS][23] -> [SKIP][24] ([fdo#109441])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-iclb2/igt@kms_psr@psr2_primary_page_flip.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-iclb3/igt@kms_psr@psr2_primary_page_flip.html

  * igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend:
    - shard-skl:          [PASS][25] -> [INCOMPLETE][26] ([i915#69])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl1/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl6/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html

  * igt@prime_self_import@reimport-vs-gem_close-race:
    - shard-iclb:         [PASS][27] -> [DMESG-WARN][28] ([i915#1982])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-iclb4/igt@prime_self_import@reimport-vs-gem_close-race.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-iclb2/igt@prime_self_import@reimport-vs-gem_close-race.html

  
#### Possible fixes ####

  * igt@gem_exec_reloc@basic-concurrent0:
    - shard-glk:          [FAIL][29] ([i915#1930]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-glk8/igt@gem_exec_reloc@basic-concurrent0.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-glk6/igt@gem_exec_reloc@basic-concurrent0.html

  * igt@gem_exec_whisper@basic-queues-priority:
    - shard-glk:          [DMESG-WARN][31] ([i915#118] / [i915#95]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-glk2/igt@gem_exec_whisper@basic-queues-priority.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-glk3/igt@gem_exec_whisper@basic-queues-priority.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-apl:          [DMESG-WARN][33] ([i915#1436] / [i915#1635] / [i915#716]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-apl1/igt@gen9_exec_parse@allowed-all.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-apl8/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-skl:          [INCOMPLETE][35] ([i915#69]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl8/igt@i915_suspend@fence-restore-tiled2untiled.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl3/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_big_fb@y-tiled-64bpp-rotate-0:
    - shard-glk:          [DMESG-FAIL][37] ([i915#118] / [i915#95]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-glk8/igt@kms_big_fb@y-tiled-64bpp-rotate-0.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-glk6/igt@kms_big_fb@y-tiled-64bpp-rotate-0.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x128-onscreen:
    - shard-skl:          [FAIL][39] ([i915#54]) -> [PASS][40] +1 similar issue
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl7/igt@kms_cursor_crc@pipe-a-cursor-128x128-onscreen.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl4/igt@kms_cursor_crc@pipe-a-cursor-128x128-onscreen.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-gtt-ytiled:
    - shard-apl:          [DMESG-WARN][41] ([i915#1635] / [i915#1982]) -> [PASS][42] +1 similar issue
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-apl2/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-gtt-ytiled.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-apl1/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-gtt-ytiled.html

  * igt@kms_flip@2x-flip-vs-wf_vblank-interruptible@ab-vga1-hdmi-a1:
    - shard-hsw:          [DMESG-WARN][43] ([i915#1982]) -> [PASS][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-hsw6/igt@kms_flip@2x-flip-vs-wf_vblank-interruptible@ab-vga1-hdmi-a1.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-hsw6/igt@kms_flip@2x-flip-vs-wf_vblank-interruptible@ab-vga1-hdmi-a1.html

  * igt@kms_flip@flip-vs-panning-interruptible@a-edp1:
    - shard-skl:          [DMESG-WARN][45] ([i915#1982]) -> [PASS][46] +4 similar issues
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl10/igt@kms_flip@flip-vs-panning-interruptible@a-edp1.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl7/igt@kms_flip@flip-vs-panning-interruptible@a-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [DMESG-WARN][47] ([i915#180]) -> [PASS][48] +2 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-kbl6/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-mmap-cpu:
    - shard-tglb:         [DMESG-WARN][49] ([i915#1982]) -> [PASS][50] +2 similar issues
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-tglb2/igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-mmap-cpu.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-tglb8/igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-mmap-cpu.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [FAIL][51] ([i915#1188]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl6/igt@kms_hdr@bpc-switch-suspend.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl6/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
    - shard-skl:          [FAIL][53] ([fdo#108145] / [i915#265]) -> [PASS][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-skl7/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html

  * igt@kms_psr@psr2_suspend:
    - shard-iclb:         [SKIP][55] ([fdo#109441]) -> [PASS][56] +2 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-iclb4/igt@kms_psr@psr2_suspend.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-iclb2/igt@kms_psr@psr2_suspend.html

  
#### Warnings ####

  * igt@i915_module_load@reload:
    - shard-tglb:         [DMESG-WARN][57] ([i915#1982]) -> [DMESG-WARN][58] ([i915#402])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-tglb3/igt@i915_module_load@reload.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-tglb5/igt@i915_module_load@reload.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-iclb:         [SKIP][59] ([i915#658]) -> [SKIP][60] ([i915#588])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-iclb4/igt@i915_pm_dc@dc3co-vpb-simulation.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-iclb2/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][61] ([i915#180]) -> [INCOMPLETE][62] ([i915#155])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8749/shard-kbl7/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/shard-kbl6/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#155]: https://gitlab.freedesktop.org/drm/intel/issues/155
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#165]: https://gitlab.freedesktop.org/drm/intel/issues/165
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1930]: https://gitlab.freedesktop.org/drm/intel/issues/1930
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2079]: https://gitlab.freedesktop.org/drm/intel/issues/2079
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#588]: https://gitlab.freedesktop.org/drm/intel/issues/588
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#69]: https://gitlab.freedesktop.org/drm/intel/issues/69
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8749 -> Patchwork_18176

  CI-20190529: 20190529
  CI_DRM_8749: c704cccd64e81bb9a8d23c414fdcafff65697801 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5735: 21f8204e54c122e4a0f8ca4b59e4b2db8d1ba687 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18176: e688f32314fbdcc1be26177cb1fb29f3be801120 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18176/index.html

[-- Attachment #1.2: Type: text/html, Size: 17182 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-15 14:47       ` Chris Wilson
@ 2020-07-16  8:41         ` Tvrtko Ursulin
  2020-07-16  8:47           ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-07-16  8:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld


On 15/07/2020 15:47, Chris Wilson wrote:
> Quoting Chris Wilson (2020-07-15 15:47:15)
>> Quoting Tvrtko Ursulin (2020-07-15 13:26:23)
>>>
>>> On 15/07/2020 13:06, Tvrtko Ursulin wrote:
>>>>
>>>> On 15/07/2020 11:50, Chris Wilson wrote:
>>>>> Currently, we use i915_request_completed() directly in
>>>>> i915_request_wait() and follow up with a manual invocation of
>>>>> dma_fence_signal(). This appears to cause a large number of contentions
>>>>> on i915_request.lock as when the process is woken up after the fence is
>>>>> signaled by an interrupt, we will then try and call dma_fence_signal()
>>>>> ourselves while the signaler is still holding the lock.
>>>>> dma_fence_is_signaled() has the benefit of checking the
>>>>> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
>>>>> avoids most of that contention.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Matthew Auld <matthew.auld@intel.com>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>>>>>    1 file changed, 4 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c
>>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>>> index 0b2fe55e6194..bb4eb1a8780e 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>>> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout,
>>>>> unsigned int cpu)
>>>>>        return this_cpu != cpu;
>>>>>    }
>>>>> -static bool __i915_spin_request(const struct i915_request * const rq,
>>>>> int state)
>>>>> +static bool __i915_spin_request(struct i915_request * const rq, int
>>>>> state)
>>>>>    {
>>>>>        unsigned long timeout_ns;
>>>>>        unsigned int cpu;
>>>>> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct
>>>>> i915_request * const rq, int state)
>>>>>        timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>>>>>        timeout_ns += local_clock_ns(&cpu);
>>>>>        do {
>>>>> -        if (i915_request_completed(rq))
>>>>> +        if (dma_fence_is_signaled(&rq->fence))
>>>>>                return true;
>>>>>            if (signal_pending_state(state, current))
>>>>> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>>>>>         * duration, which we currently lack.
>>>>>         */
>>>>>        if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
>>>>> -        __i915_spin_request(rq, state)) {
>>>>> -        dma_fence_signal(&rq->fence);
>>>>> +        __i915_spin_request(rq, state))
>>>>>            goto out;
>>>>> -    }
>>>>>        /*
>>>>>         * This client is about to stall waiting for the GPU. In many cases
>>>>> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>>>>>        for (;;) {
>>>>>            set_current_state(state);
>>>>> -        if (i915_request_completed(rq)) {
>>>>> -            dma_fence_signal(&rq->fence);
>>>>> +        if (dma_fence_is_signaled(&rq->fence))
>>>>>                break;
>>>>> -        }
>>>>>            intel_engine_flush_submission(rq->engine);
>>>>>
>>>>
>>>> In other words putting some latency back into the waiters, which is
>>>> probably okay, since sync waits is not our primary model.
>>>>
>>>> I have a slight concern about the remaining value of busy spinning if
>>>> i915_request_completed check is removed from there as well. Of course it
>>>> doesn't make sense to have different completion criteria between the
>>>> two.. We could wait a bit longer if real check in busyspin said request
>>>> is actually completed, just not signal it but wait for the breadcrumbs
>>>> to do it.
>>>
>>> What a load of nonsense.. :)
>>>
>>> Okay, I think the only real question is i915_request_completed vs
>>> dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles
>>> waiting on GPU and breadcrumb irq work, or just the GPU.
>>
>> dma_fence_is_signaled() {
>>          if (test_bit(SIGNALED_BIT))
>>                  return true;
>>          
>>          if (i915_request_completed()) {
>>                  dma_fence_signal();
>>                  return true;
>>          }
>>          
>>          return false;
>> }
>>
>> with the indirection. So the question is whether the indirection is
>> worth the extra test bit. Just purely looking at the i915_request.lock
>> contention suggests that it probably is. For the spinner, burning a few
>> extra CPU cycles for *vfunc is not an issue, it's the wakeup latency,
>> and since we are calling dma_fence_signal() upon wakeup we do take the
>> spinlock without checking for an early return from the SIGNALED_BIT.
>> So I think it's a net positive. The alternative was to write
>>
>>          if (i915_request_completed()) {
>>                  if (!i915_request_is_signaled())
>>                          dma_fence_signal();
>>                  break;
>>          }
>>
>> but
>>
>>          if (dma_fence_is_signaled())
>>                  break;
>>
>> does appear simpler, if only by virtue of hiding the details in an
>> inline.
> 
> Or a patch to add test_bit() to dma_fence_signal, but this looked
> better.

Right I missed dma_fence_is_signaled calls i915_request_completed.

In this case the remaining question is do we care about wait ioctl 
potentially returning before the hypothetical sync fence for the same 
request is signaled? This seems like a slight change in user observable 
behaviour.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-16  8:41         ` Tvrtko Ursulin
@ 2020-07-16  8:47           ` Chris Wilson
  2020-07-16  9:02             ` Tvrtko Ursulin
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-07-16  8:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Matthew Auld

Quoting Tvrtko Ursulin (2020-07-16 09:41:17)
> Right I missed dma_fence_is_signaled calls i915_request_completed.
> 
> In this case the remaining question is do we care about wait ioctl 
> potentially returning before the hypothetical sync fence for the same 
> request is signaled? This seems like a slight change in user observable 
> behaviour.

You would not be able to observe the difference after wakeup in state 
between a sync_file and busy_ioctl/wait_ioctl, as a query after wakeup
will see the same signaled bit. So the only possible difference is a
change in wakeup order; but process wakeup/execution order is definitely
not defined by us :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
  2020-07-16  8:47           ` Chris Wilson
@ 2020-07-16  9:02             ` Tvrtko Ursulin
  0 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-07-16  9:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld


On 16/07/2020 09:47, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-16 09:41:17)
>> Right I missed dma_fence_is_signaled calls i915_request_completed.
>>
>> In this case the remaining question is do we care about wait ioctl
>> potentially returning before the hypothetical sync fence for the same
>> request is signaled? This seems like a slight change in user observable
>> behaviour.
> 
> You would not be able to observe the difference after wakeup in state
> between a sync_file and busy_ioctl/wait_ioctl, as a query after wakeup
> will see the same signaled bit. So the only possible difference is a
> change in wakeup order; but process wakeup/execution order is definitely
> not defined by us :)

Doh.. true, same vfunc..

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-07-16  9:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 10:50 [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait Chris Wilson
2020-07-15 12:06 ` Tvrtko Ursulin
2020-07-15 12:26   ` Tvrtko Ursulin
2020-07-15 14:47     ` Chris Wilson
2020-07-15 14:47       ` Chris Wilson
2020-07-16  8:41         ` Tvrtko Ursulin
2020-07-16  8:47           ` Chris Wilson
2020-07-16  9:02             ` Tvrtko Ursulin
2020-07-15 12:12 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
2020-07-15 12:34 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-07-15 16:11 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.