From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <John.C.Harrison@Intel.com>,
	Daniel Vetter <daniel@ffwll.ch>
Cc: Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [RFC] drm/i915: Add sync framework support to execbuff IOCTL
Date: Mon, 06 Jul 2015 15:46:49 +0100
Message-ID: <559A94D9.1030705@linux.intel.com>
In-Reply-To: <559A9004.50606@Intel.com>


On 07/06/2015 03:26 PM, John Harrison wrote:
> On 06/07/2015 14:59, Daniel Vetter wrote:
>> On Mon, Jul 06, 2015 at 01:58:25PM +0100, John Harrison wrote:
>>> On 06/07/2015 10:29, Daniel Vetter wrote:
>>>> On Fri, Jul 03, 2015 at 12:17:33PM +0100, Tvrtko Ursulin wrote:
>>>>> On 07/02/2015 04:55 PM, Chris Wilson wrote:
>>>>>> It would be nice if we could reuse one seqno both for
>>>>>> internal/external fences. If you need to expose a fence ordering
>>>>>> within a timeline that is based on the creation stamp rather than
>>>>>> execution stamp, it seems like we could just add such a stamp when
>>>>>> creating the sync_pt and not worry about its relationship to the
>>>>>> execution seqno.
>>>>>>
>>>>>> Doing so does expose that requests are reordered to userspace since
>>>>>> the signalling timeline is not the same as userspace's ordered
>>>>>> timeline. Not sure if that is a problem or not.
>>>>>>
>>>>>> Afaict the sync uapi is based on waiting for all of a set of fences
>>>>>> to retire. It doesn't seem to rely on fence ordering (that is knowing
>>>>>> that fence A will signal before fence B so it need only wait on
>>>>>> fence B).
>>>>>>
>>>>>> Here's hoping that we can have both simplicity and efficiency...
>>>>> Jumping in with not even perfect understanding of everything here -
>>>>> but timeline business has always been confusing me. There is nothing
>>>>> in the uapi which needs it afaics, and iirc there was some discussion
>>>>> at the time Jesse floated his patches that it can be removed. Based on
>>>>> that, when I squashed his patches and ported them on top of John's
>>>>> request-to-fence conversion it ended up something like the below
>>>>> (manually edited a bit to be less noisy and some prep patches
>>>>> omitted):
>>>>>
>>>>> This implements the ioctl based uapi and indeed seqnos are not
>>>>> actually used in waits. So is this insufficient for some reason?
>>>>> (Other than it does not implement the input fence side of things.)
>>>> Yeah, android syncpt on top of struct fence embedded in i915 request
>>>> is what I'd have expected.
>>> The thing I'm not happy with in that plan is that it leaves the kernel
>>> driver at the mercy of user land applications. If we return a fence
>>> object to user land via a file descriptor (or indeed any other
>>> mechanism) then that fence object must be locked until user land closes
>>> the file. If the fence object is the one embedded within our request
>>> structure then that means user land is effectively locking our request
>>> structure. Given that more and more stuff is being attached to the
>>> request, that could be a fair bit of memory tied up that we can do
>>> nothing about. E.g. if a rogue/buggy application requests a fence be
>>> returned for every batch buffer submitted but never closes them.
>>> Whereas, if we go the route of a separate fence object specifically for
>>> user land then they can leak them like a sieve and we won't really care
>>> so much.
>> Userspace can exhaust kernel allocations, that's nothing new. And if we
>> keep it, userspace simply needs to leak a few more fence fds than if
>> there's a bit more data attached to each one.
>>
>> The solution to this problem is to have a mem cgroup limit set. No need
>> to complicate our kernel code.
>
> There is still the extra complication that request unreferencing cannot
> require any kind of mutex lock if we are allowing it to happen from
> outside of the driver. That means the unreference callback must move the
> request to a 'please clean me later' list, schedule a worker thread to
> run, and thus do the clean up asynchronously.
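
As a very rough sketch only (all names below are made up for illustration,
not the actual i915 structures or helpers), that deferred unreference could
look something like:

struct request_free_list {
	struct drm_device *dev;
	spinlock_t lock;
	struct list_head list;
	struct work_struct work;	/* INIT_WORK(&work, request_free_worker) at init */
};

/* Final fence put can happen from any context, so only queue the request. */
static void request_defer_unreference(struct request_free_list *rfl,
				      struct drm_i915_gem_request *req)
{
	unsigned long flags;

	spin_lock_irqsave(&rfl->lock, flags);
	list_add_tail(&req->free_link, &rfl->list);	/* free_link is hypothetical */
	spin_unlock_irqrestore(&rfl->lock, flags);

	schedule_work(&rfl->work);
}

/* Worker runs in process context, so it may take the driver mutex. */
static void request_free_worker(struct work_struct *work)
{
	struct request_free_list *rfl =
		container_of(work, struct request_free_list, work);
	struct drm_i915_gem_request *req, *next;
	LIST_HEAD(tmp);

	spin_lock_irq(&rfl->lock);
	list_splice_init(&rfl->list, &tmp);
	spin_unlock_irq(&rfl->lock);

	mutex_lock(&rfl->dev->struct_mutex);
	list_for_each_entry_safe(req, next, &tmp, free_link)
		i915_gem_request_unreference(req);
	mutex_unlock(&rfl->dev->struct_mutex);
}

That keeps the "no mutex in the release path" constraint while still letting
the actual teardown run under struct_mutex.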

For this particular issue my solution was to extend the sync_fence 
constructor to take a mutex and store it inside the object. Then at 
destruction time, which happens at sync_fd->f_ops->release() time, it is 
just a matter of calling kref_put_mutex instead of kref_put.

Seemed to work under some quick testing but that is as much as I did 
back then.
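
A rough sketch of that idea, with made-up names rather than the real
sync_fence/i915 code, would be:

struct my_sync_fence {
	struct kref kref;
	struct mutex *lock;	/* passed in at construction, e.g. &dev->struct_mutex */
	struct drm_i915_gem_request *req;
};

/* Called by kref_put_mutex() with fence->lock already held; must drop it. */
static void my_sync_fence_release(struct kref *kref)
{
	struct my_sync_fence *fence =
		container_of(kref, struct my_sync_fence, kref);

	i915_gem_request_unreference(fence->req);
	mutex_unlock(fence->lock);
	kfree(fence);
}

/* Invoked from the fd's ->release(); the mutex is taken only on the last put. */
static void my_sync_fence_put(struct my_sync_fence *fence)
{
	kref_put_mutex(&fence->kref, my_sync_fence_release, fence->lock);
}

The point being that no lock is touched at all unless this really is the
final reference.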

Regards,

Tvrtko


Thread overview: 22+ messages
2015-07-02 11:09 [RFC] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
2015-07-02 11:54 ` Chris Wilson
2015-07-02 12:02   ` Chris Wilson
2015-07-02 13:01   ` John Harrison
2015-07-02 13:22     ` Chris Wilson
2015-07-02 15:43       ` John Harrison
2015-07-02 15:55         ` Chris Wilson
2015-07-03 11:17           ` Tvrtko Ursulin
2015-07-06  9:29             ` Daniel Vetter
2015-07-06 12:58               ` John Harrison
2015-07-06 13:59                 ` Daniel Vetter
2015-07-06 14:26                   ` John Harrison
2015-07-06 14:41                     ` Daniel Vetter
2015-07-06 14:46                     ` Tvrtko Ursulin [this message]
2015-07-06 15:12                       ` Daniel Vetter
2015-07-06 15:21                         ` Tvrtko Ursulin
2015-07-06 15:37                           ` Daniel Vetter
2015-07-06 16:34                             ` Tvrtko Ursulin
2015-07-06 17:58                               ` Daniel Vetter
2015-07-07  9:15                 ` Tvrtko Ursulin
2015-07-29 21:19                   ` Jesse Barnes
2015-07-30 11:36                     ` John Harrison
