From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org,
	Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Subject: Re: [PATCH v4] drm/i915: Implement inter-engine read-read optimisations
Date: Fri, 27 Mar 2015 11:22:15 +0000	[thread overview]
Message-ID: <55153D67.6070501@linux.intel.com> (raw)
In-Reply-To: <20150326213140.GF18055@nuc-i3427.alporthouse.com>


On 03/26/2015 09:31 PM, Chris Wilson wrote:
> On Thu, Mar 26, 2015 at 05:43:33PM +0000, Tvrtko Ursulin wrote:
>>> -static int
>>> -i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
>>> -{
>>> -	if (!obj->active)
>>> -		return 0;
>>> -
>>> -	/* Manually manage the write flush as we may have not yet
>>> -	 * retired the buffer.
>>> -	 *
>>> -	 * Note that the last_write_req is always the earlier of
>>> -	 * the two (read/write) requests, so if we haved successfully waited,
>>> -	 * we know we have passed the last write.
>>> -	 */
>>> -	i915_gem_request_assign(&obj->last_write_req, NULL);
>>> -
>>> -	return 0;
>>> +	return __i915_wait_request(req, reset_counter,
>>> +				   interruptible, NULL, NULL);
>>>   }
>>
>> Net code comments -/+ for this patch needs improvement. :) Above you
>> deleted a chunk but below added nothing.
>
> I did. It's not added here as this code is buggy.

In a later version, or do you mean the few lines at i915_gem_object_sync?

>> I think something high level is needed to explain the active lists
>> and tracking etc.
>
> GEM object domain management and eviction.
>
>>>   /**
>>> @@ -1405,18 +1381,39 @@ static __must_check int
>>>   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>>>   			       bool readonly)
>>>   {
>>> -	struct drm_i915_gem_request *req;
>>> -	int ret;
>>> +	int ret, i;
>>>
>>> -	req = readonly ? obj->last_write_req : obj->last_read_req;
>>> -	if (!req)
>>> +	if (!obj->active)
>>>   		return 0;
>>>
>>> -	ret = i915_wait_request(req);
>>> -	if (ret)
>>> -		return ret;
>>> +	if (readonly) {
>>> +		if (obj->last_write_req != NULL) {
>>> +			ret = i915_wait_request(obj->last_write_req);
>>> +			if (ret)
>>> +				return ret;
>>> +
>>> +			i = obj->last_write_req->ring->id;
>>> +			if (obj->last_read_req[i] == obj->last_write_req)
>>> +				i915_gem_object_retire__read(obj, i);
>>> +			else
>>> +				i915_gem_object_retire__write(obj);
>>
>> Above mentioned comments would especially help understand the
>> ordering rules here.
>>
>>> +		}
>>> +	} else {
>>> +		for (i = 0; i < I915_NUM_RINGS; i++) {
>>> +			if (obj->last_read_req[i] == NULL)
>>> +				continue;
>>> +
>>> +			ret = i915_wait_request(obj->last_read_req[i]);
>>> +			if (ret)
>>> +				return ret;
>>> +
>>> +			i915_gem_object_retire__read(obj, i);
>>> +		}
>>
>> I think with optimistic spinning this could end up doing num_rings
>> spins etc in sequence. Would it be worth smarting it up somehow?
>
> Hmm. Good point. Obviously the goal of the optimisation is to have more
> opportunities where we never have to wait at all. Writing a new
> i915_wait_requests() will add a lot of code, definitely something I want
> to postpone. But given the sequential wait, later ones are more likely
> to already be complete.

Just because of elapsed time, I suppose, not because of any conceptual 
correlation between execution time and rings. If we have three batches 
on three rings with execution times like:

0: ==
1: ====
2: ========

it will end up doing "spin a bit, wait a bit" three times.

That is probably not a likely pattern, though, so I defer to your 
opinion that it is OK to postpone smarting this up.
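To make the worst case concrete, here is a toy model of waiting on the 
per-ring requests in sequence (not i915 code; the function and the 
completion times are made up for illustration). Each wait advances a 
clock to that request's completion time, and a fresh spin/wait episode 
starts whenever we arrive at a request that has not yet completed:

```c
#include <assert.h>

#define NUM_RINGS 3

/* Toy model: completion_time[i] is when ring i's last request finishes.
 * total_wait ends up as max(t_i); spin_episodes counts how many separate
 * spin/wait cycles the sequential loop performs. */
struct wait_result {
	int total_wait;
	int spin_episodes;
};

struct wait_result wait_all_sequential(const int completion_time[NUM_RINGS])
{
	struct wait_result r = { 0, 0 };

	for (int i = 0; i < NUM_RINGS; i++) {
		/* Request still outstanding: spin a bit, wait a bit. */
		if (completion_time[i] > r.total_wait) {
			r.spin_episodes++;
			r.total_wait = completion_time[i];
		}
	}
	return r;
}
```

With completion times {2, 4, 8}, as in the diagram above, this gives 
three separate episodes; visiting the longest-running request first 
would collapse it to one, since the shorter ones complete in the 
meantime.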

>>> +		BUG_ON(obj->active);
>>
>> Better WARN and halt the driver indirectly if unavoidable.
>
> It's a debug leftover, it's gone.
>
>>> +	}
>>> +
>>> +	return 0;
>>>
>>> -	return i915_gem_object_wait_rendering__tail(obj);
>>>   }
>>>
>>>   /* A nonblocking variant of the above wait. This is a highly dangerous routine
>>> @@ -1427,37 +1424,71 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>>>   					    struct drm_i915_file_private *file_priv,
>>>   					    bool readonly)
>>>   {
>>> -	struct drm_i915_gem_request *req;
>>>   	struct drm_device *dev = obj->base.dev;
>>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>> +	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
>>>   	unsigned reset_counter;
>>> -	int ret;
>>> +	int ret, i, n = 0;
>>>
>>>   	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>>>   	BUG_ON(!dev_priv->mm.interruptible);
>>>
>>> -	req = readonly ? obj->last_write_req : obj->last_read_req;
>>> -	if (!req)
>>> +	if (!obj->active)
>>>   		return 0;
>>>
>>>   	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
>>>   	if (ret)
>>>   		return ret;
>>>
>>> -	ret = i915_gem_check_olr(req);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>>   	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
>>> -	i915_gem_request_reference(req);
>>
>> Good place for a comment: Build an array of requests to be waited upon etc..
>
> Code comments need to explain why.

I suppose that means it is totally obvious what all this code does, 
since there are no comments? :D

Sometimes it helps to say what you are doing when a block of code is in 
question, and you can even add the why to the what. I think it helps the 
reviewer, and anyone trying to understand the code in the future.
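For example, a "what plus why" comment on the request-gathering loop 
from this patch might look something like this (a standalone sketch with 
made-up types, not the actual i915 structures):

```c
#include <stddef.h>

#define I915_NUM_RINGS 5

struct req { int id; };

/* Hypothetical stand-in for the object fields used in the patch. */
struct obj {
	struct req *last_read_req[I915_NUM_RINGS];
};

/*
 * Build a local array of the requests we must wait for (the "what"), so
 * that we can drop struct_mutex and wait without the object's request
 * pointers changing under us (the "why").
 */
int collect_requests(struct obj *o, struct req *requests[I915_NUM_RINGS])
{
	int n = 0;

	for (int i = 0; i < I915_NUM_RINGS; i++) {
		if (o->last_read_req[i] != NULL)
			requests[n++] = o->last_read_req[i];
	}
	return n;
}
```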

>>> +
>>> +	if (readonly) {
>>> +		struct drm_i915_gem_request *rq;
>>> +
>>> +		rq = obj->last_write_req;
>>> +		if (rq == NULL)
>>> +			return 0;
>>> +
>>> +		ret = i915_gem_check_olr(rq);
>>> +		if (ret)
>>> +			goto err;
>>> +
>>> +		requests[n++] = i915_gem_request_reference(rq);
>>> +	} else {
>>> +		for (i = 0; i < I915_NUM_RINGS; i++) {
>>> +			struct drm_i915_gem_request *rq;
>>> +
>>> +			rq = obj->last_read_req[i];
>>> +			if (rq == NULL)
>>> +				continue;
>>> +
>>> +			ret = i915_gem_check_olr(rq);
>>> +			if (ret)
>>> +				goto err;
>>> +
>>> +			requests[n++] = i915_gem_request_reference(rq);
>>> +		}
>>> +	}
>>
>> I wonder if merging the tracked requests to a single array, roughly
>> something like:
>>
>> 	obj->last_req[1 + NUM_RINGS]
>>
>> and:
>>
>> 	LAST_WRITE_REQ = 0
>> 	LAST_READ_REQ(ring) = 1 + ring
>>
>> Could make the above logic more compact and how would it look
>> elsewhere in the code? Definitely looks like it would work for the
>> above loop:
>>
>> 	for (i = LAST_WRITE_REQ; i <= LAST_TRACKED_REQUEST; i++) {
>> 		rq = obj->last_req[i];
>> 		if (rq == NULL && readonly)
>> 			return 0;
>> 		else if (rq == NULL)
>> 			continue;
>> 		...
>> 		requests[n++] = ...
>> 		if (readonly)
>> 			break;
>> 	}
>>
>> Not sure, might not be that great idea.
>
> Nah. I think it's only a win here. Elsewhere there is greater
> differentiation between read/write. Conceptually it is even
>
> 	struct {
> 		struct drm_i915_gem_request *request;
> 		struct list_head active_link;
> 	} read[I915_NUM_RINGS];
> 	struct drm_i915_gem_request *write_request, *fence_request;
>
>>>   	mutex_unlock(&dev->struct_mutex);
>>> -	ret = __i915_wait_request(req, reset_counter, true, NULL, file_priv);
>>> +	for (i = 0; ret == 0 && i < n; i++)
>>> +		ret = __i915_wait_request(requests[i], reset_counter, true,
>>> +					  NULL, file_priv);
>>
>> Another chance for more optimal "group" waiting.
>>
>>>   	mutex_lock(&dev->struct_mutex);
>>> -	i915_gem_request_unreference(req);
>>> -	if (ret)
>>> -		return ret;
>>>
>>> -	return i915_gem_object_wait_rendering__tail(obj);
>>> +err:
>>> +	for (i = 0; i < n; i++) {
>>> +		if (ret == 0) {
>>> +			int ring = requests[i]->ring->id;
>>> +			if (obj->last_read_req[ring] == requests[i])
>>> +				i915_gem_object_retire__read(obj, ring);
>>> +			if (obj->last_write_req == requests[i])
>>> +				i915_gem_object_retire__write(obj);
>>> +		}
>>
>> What if one wait succeeded and one failed, no need to update anything?
>
> Too much complication for an error path. This just aims to optimise the
> bookkeeping for an object after a wait.
>
>>> -static void
>>> -i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
>>> -			       struct intel_engine_cs *ring)
>>> +void i915_vma_move_to_active(struct i915_vma *vma,
>>> +			     struct intel_engine_cs *ring)
>>>   {
>>> -	struct drm_i915_gem_request *req;
>>> -	struct intel_engine_cs *old_ring;
>>> -
>>> -	BUG_ON(ring == NULL);
>>> -
>>> -	req = intel_ring_get_request(ring);
>>> -	old_ring = i915_gem_request_get_ring(obj->last_read_req);
>>> -
>>> -	if (old_ring != ring && obj->last_write_req) {
>>> -		/* Keep the request relative to the current ring */
>>> -		i915_gem_request_assign(&obj->last_write_req, req);
>>> -	}
>>> +	struct drm_i915_gem_object *obj = vma->obj;
>>>
>>>   	/* Add a reference if we're newly entering the active list. */
>>> -	if (!obj->active) {
>>> +	if (obj->last_read_req[ring->id] == NULL && obj->active++ == 0)
>>>   		drm_gem_object_reference(&obj->base);
>>> -		obj->active = 1;
>>> -	}
>>> +	BUG_ON(obj->active == 0 || obj->active > I915_NUM_RINGS);
>>
>> Maybe we need I915_WARN_AND_HALT() which would be a WARN() plus put
>> the driver in halted state?
>
> This just magically evaporates by moving over to an obj->active
> bitfield. You'll love v5, but hate v6 (which abuses a lots more internal
> knowledge of request management).
>
>>> +	if (to == NULL) {
>>> +		ret = i915_gem_object_wait_rendering(obj, readonly);
>>> +	} else if (readonly) {
>>> +		ret = __i915_gem_object_sync(obj, to,
>>> +					     obj->last_write_req);
>>> +	} else {
>>> +		for (i = 0; i < I915_NUM_RINGS; i++) {
>>> +			ret = __i915_gem_object_sync(obj, to,
>>> +						     obj->last_read_req[i]);
>>
>> Here I think is another opportunity to wait for all of them at once.
>> Via a __i915_gem_object_sync helper which would take an array, or a
>> ring/request mask. Not sure if it would be worth it though.
>
> No. This is more complicated still since we have the semaphores to also
> deal with. Definitely worth waiting for a testcase.
>
>>> -	args->busy = obj->active;
>>> -	if (obj->last_read_req) {
>>> -		struct intel_engine_cs *ring;
>>>   		BUILD_BUG_ON(I915_NUM_RINGS > 16);
>>> -		ring = i915_gem_request_get_ring(obj->last_read_req);
>>> -		args->busy |= intel_ring_flag(ring) << 16;
>>> +
>>> +		for (i = 0; i < I915_NUM_RINGS; i++) {
>>> +			if (obj->last_read_req[i] == NULL)
>>> +				continue;
>>> +
>>> +			args->busy |= 1 << (16 + i) | 1;
>>
>> Doesn't look like equivalent bits will be set? What is the "| 1" at
>> the end for?
>
> No. It's designed for. This is what userspaces expects and is required
> to help workaround #70764.
>
> I want to replace the (| 1) with
> (| intel_ring_flag(obj->last_write_request->ring)); but it exists because
> we couldn't be sure if any userspace depended on exactly (0, 1).

I don't get it: the value is exported to userspace and now it is 
different, so why is that OK?
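For reference, my reading of the encoding under discussion, as a 
standalone sketch (the function and array here are made up; the real 
code sets the bits inline): bit 0 is the historical 0/1 busy flag, and 
bits 16 and up form a per-ring read mask.

```c
#include <stdint.h>

#define I915_NUM_RINGS 5

/* Toy model of the busy-ioctl result: ring_busy[i] is nonzero when ring
 * i has an outstanding read on the object.  Bit 0 stays exactly 0 or 1
 * for old userspace; bits 16+ say which rings are reading. */
uint32_t encode_busy(const int ring_busy[I915_NUM_RINGS])
{
	uint32_t busy = 0;

	for (int i = 0; i < I915_NUM_RINGS; i++) {
		if (ring_busy[i])
			busy |= (1u << (16 + i)) | 1u;
	}
	return busy;
}
```

So a buffer with outstanding reads on rings 0 and 2 reports 0x50001, 
while userspace that only ever tests bit 0 still sees a plain busy flag.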

>>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>>> index 0efb19a9b9a5..1a3756e6bea4 100644
>>> --- a/drivers/gpu/drm/i915/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/intel_display.c
>>> @@ -9674,7 +9674,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
>>>   	else if (i915.enable_execlists)
>>>   		return true;
>>>   	else
>>> -		return ring != i915_gem_request_get_ring(obj->last_read_req);
>>> +		return ring != i915_gem_request_get_ring(obj->last_write_req);
>>>   }
>>
>> Why this?
>
> Because that is correct.

OK I'll think about it.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Thread overview: 11+ messages
2015-03-16 15:45 [PATCH] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
2015-03-17  9:54 ` [PATCH v2] " Chris Wilson
2015-03-17 10:11   ` [PATCH v3] " Chris Wilson
2015-03-19  8:59   ` [PATCH v4] " Chris Wilson
2015-03-26 16:09     ` Lionel Landwerlin
2015-03-26 16:09       ` [PATCH] drm/i915: Implement " Lionel Landwerlin
2015-03-26 16:12       ` Chris Wilson
2015-03-26 17:43     ` [PATCH v4] drm/i915: Implement " Tvrtko Ursulin
2015-03-26 21:31       ` Chris Wilson
2015-03-27 11:22         ` Tvrtko Ursulin [this message]
2015-03-27 11:38           ` Chris Wilson
