Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Matthew Auld <matthew.auld@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
Date: Wed, 15 Jul 2020 13:06:41 +0100	[thread overview]
Message-ID: <2bbb4568-2f73-bedc-8f3b-726bb19a62f2@linux.intel.com> (raw)
In-Reply-To: <20200715105004.17973-1-chris@chris-wilson.co.uk>

On 15/07/2020 11:50, Chris Wilson wrote:
> Currently, we use i915_request_completed() directly in
> i915_request_wait() and follow up with a manual invocation of
> dma_fence_signal(). This appears to cause a large number of contentions
> on i915_request.lock as when the process is woken up after the fence is
> signaled by an interrupt, we will then try and call dma_fence_signal()
> ourselves while the signaler is still holding the lock.
> dma_fence_is_signaled() has the benefit of checking the
> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> avoids most of that contention.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b2fe55e6194..bb4eb1a8780e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>   	return this_cpu != cpu;
>   }
>   
> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
> +static bool __i915_spin_request(struct i915_request * const rq, int state)
>   {
>   	unsigned long timeout_ns;
>   	unsigned int cpu;
> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
>   	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>   	timeout_ns += local_clock_ns(&cpu);
>   	do {
> -		if (i915_request_completed(rq))
> +		if (dma_fence_is_signaled(&rq->fence))
>   			return true;
>   
>   		if (signal_pending_state(state, current))
> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>   	 * duration, which we currently lack.
>   	 */
>   	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> -	    __i915_spin_request(rq, state)) {
> -		dma_fence_signal(&rq->fence);
> +	    __i915_spin_request(rq, state))
>   		goto out;
> -	}
>   
>   	/*
>   	 * This client is about to stall waiting for the GPU. In many cases
> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>   	for (;;) {
>   		set_current_state(state);
>   
> -		if (i915_request_completed(rq)) {
> -			dma_fence_signal(&rq->fence);
> +		if (dma_fence_is_signaled(&rq->fence))
>   			break;
> -		}
>   
>   		intel_engine_flush_submission(rq->engine);
>   
> 

In other words putting some latency back into the waiters, which is 
probably okay, since sync waits is not our primary model.

I have a slight concern about the remaining value of busy spinning if 
i915_request_completed check is removed from there as well. Of course it 
doesn't make sense to have different completion criteria between the 
two.. We could wait a bit longer if real check in busyspin said request 
is actually completed, just not signal it but wait for the breadcrumbs 
to do it.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx