dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
@ 2022-01-04 23:30 Matthew Brost
  2022-01-05  9:35 ` [Intel-gfx] " Tvrtko Ursulin
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Brost @ 2022-01-04 23:30 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Don't use the interruptable version of the timeline mutex lock in the
error path of eb_pin_timeline as the cleanup must always happen.

v2:
 (John Harrison)
  - Don't check for interrupt during mutex lock

Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index e9541244027a..e96e133cbb1f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
 				      timeout) < 0) {
 			i915_request_put(rq);
 
-			tl = intel_context_timeline_lock(ce);
+			mutex_lock(&ce->timeline->mutex);
 			intel_context_exit(ce);
-			intel_context_timeline_unlock(tl);
+			mutex_unlock(&ce->timeline->mutex);
 
 			if (nonblock)
 				return -EWOULDBLOCK;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
  2022-01-04 23:30 [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline Matthew Brost
@ 2022-01-05  9:35 ` Tvrtko Ursulin
  2022-01-05 16:24   ` Matthew Brost
  0 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2022-01-05  9:35 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 04/01/2022 23:30, Matthew Brost wrote:
> Don't use the interruptable version of the timeline mutex lock in the

interruptible

> error path of eb_pin_timeline as the cleanup must always happen.
> 
> v2:
>   (John Harrison)
>    - Don't check for interrupt during mutex lock
> 
> Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index e9541244027a..e96e133cbb1f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
>   				      timeout) < 0) {
>   			i915_request_put(rq);
>   
> -			tl = intel_context_timeline_lock(ce);
> +			mutex_lock(&ce->timeline->mutex);

On the other hand it is more user friendly to handle signals (which 
maybe does not matter in this case, not sure any longer how long hold 
time it can have) but there is also a question of consistency within the 
very function you are changing.

Apart from consistency, what about the parent-child magic 
intel_context_timeline_lock does and you wouldn't have here?

And what about the very existence of intel_context_timeline_lock as a 
component boundary separation API, if it is used inconsistently 
throughout i915_gem_execbuffer.c?

Regards,

Tvrtko

>   			intel_context_exit(ce);
> -			intel_context_timeline_unlock(tl);
> +			mutex_unlock(&ce->timeline->mutex);
>   
>   			if (nonblock)
>   				return -EWOULDBLOCK;
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
  2022-01-05  9:35 ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-01-05 16:24   ` Matthew Brost
  2022-01-06  9:56     ` Tvrtko Ursulin
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Brost @ 2022-01-05 16:24 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, dri-devel

On Wed, Jan 05, 2022 at 09:35:44AM +0000, Tvrtko Ursulin wrote:
> 
> On 04/01/2022 23:30, Matthew Brost wrote:
> > Don't use the interruptable version of the timeline mutex lock in the
> 
> interruptible
> 
> > error path of eb_pin_timeline as the cleanup must always happen.
> > 
> > v2:
> >   (John Harrison)
> >    - Don't check for interrupt during mutex lock
> > 
> > Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index e9541244027a..e96e133cbb1f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
> >   				      timeout) < 0) {
> >   			i915_request_put(rq);
> > -			tl = intel_context_timeline_lock(ce);
> > +			mutex_lock(&ce->timeline->mutex);
> 
> On the other hand it is more user friendly to handle signals (which maybe
> does not matter in this case, not sure any longer how long hold time it can
> have) but there is also a question of consistency within the very function
> you are changing.
> 
> Apart from consistency, what about the parent-child magic
> intel_context_timeline_lock does and you wouldn't have here?
> 
> And what about the very existence of intel_context_timeline_lock as a
> component boundary separation API, if it is used inconsistently throughout
> i915_gem_execbuffer.c?

intel_context_timeline_lock does 2 things:

1. Handles lockdep nesting of timeline locks for parent-child contexts
ensuring locks are acquired from parent to last child, then released
last child to parent
2. Allows the mutex lock to be interrupted

This helper should be used in setup steps where a user can signal abort
(context pinning time + request creation time), by 'should be' I mean
this was how it was done before I extended the execbuf IOCTL for
multiple BBs. Slightly confusing but this is what was in place so I
stuck with it.

This code here is an error path that only hold at most 1 timeline lock
(no nesting required) and is a path that must be executed as it is a
cleanup step (not allowed to be interrupted by user, intel_context_exit
must be called or we have dangling engine PM refs).

Make sense? I probably should update the comment message to explain this
a bit better as it did take me a bit to understand how this locking
worked.

Matt

> 
> Regards,
> 
> Tvrtko
> 
> >   			intel_context_exit(ce);
> > -			intel_context_timeline_unlock(tl);
> > +			mutex_unlock(&ce->timeline->mutex);
> >   			if (nonblock)
> >   				return -EWOULDBLOCK;
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
  2022-01-05 16:24   ` Matthew Brost
@ 2022-01-06  9:56     ` Tvrtko Ursulin
  2022-01-06 16:18       ` Matthew Brost
  0 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2022-01-06  9:56 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel


On 05/01/2022 16:24, Matthew Brost wrote:
> On Wed, Jan 05, 2022 at 09:35:44AM +0000, Tvrtko Ursulin wrote:
>>
>> On 04/01/2022 23:30, Matthew Brost wrote:
>>> Don't use the interruptable version of the timeline mutex lock in the
>>
>> interruptible
>>
>>> error path of eb_pin_timeline as the cleanup must always happen.
>>>
>>> v2:
>>>    (John Harrison)
>>>     - Don't check for interrupt during mutex lock
>>>
>>> Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index e9541244027a..e96e133cbb1f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
>>>    				      timeout) < 0) {
>>>    			i915_request_put(rq);
>>> -			tl = intel_context_timeline_lock(ce);
>>> +			mutex_lock(&ce->timeline->mutex);
>>
>> On the other hand it is more user friendly to handle signals (which maybe
>> does not matter in this case, not sure any longer how long hold time it can
>> have) but there is also a question of consistency within the very function
>> you are changing.
>>
>> Apart from consistency, what about the parent-child magic
>> intel_context_timeline_lock does and you wouldn't have here?
>>
>> And what about the very existence of intel_context_timeline_lock as a
>> component boundary separation API, if it is used inconsistently throughout
>> i915_gem_execbuffer.c?
> 
> intel_context_timeline_lock does 2 things:
> 
> 1. Handles lockdep nesting of timeline locks for parent-child contexts
> ensuring locks are acquired from parent to last child, then released
> last child to parent
> 2. Allows the mutex lock to be interrupted
> 
> This helper should be used in setup steps where a user can signal abort
> (context pinning time + request creation time), by 'should be' I mean
> this was how it was done before I extended the execbuf IOCTL for
> multiple BBs. Slightly confusing but this is what was in place so I
> stuck with it.
> 
> This code here is an error path that only hold at most 1 timeline lock
> (no nesting required) and is a path that must be executed as it is a
> cleanup step (not allowed to be interrupted by user, intel_context_exit
> must be called or we have dangling engine PM refs).
> 
> Make sense? I probably should update the comment message to explain this
> a bit better as it did take me a bit to understand how this locking
> worked.

The part which does not make sense is this:

eb_pin_timeline()
{
...
	tl = intel_context_timeline_lock(ce);
	if (IS_ERR(tl))
		return PTR_ERR(tl);

... do some throttling, and if it fail:
			mutex_lock(&ce->timeline->mutex);

Therefore argument that at most one timeline lock is held and the extra 
stuff is not needed does not hold for me. Why would the throttling 
failed path be different than the initial step in this respect?

Using two ways to lock the same mutex withing 10 lines of code is confusing.

In my mind we have this question of API usage consistency, and also the 
unanswered questions of whether reacting to signals during taking this 
mutex matters (what are the pessimistic lock hold times and what 
influences them?).

Note that first lock handles signals, throttling also handles signals, 
so why wouldn't the cleanup path? Just because then you don't have to 
bother with error unwind is to only reason I can see.

So I suggest you just do proper error unwind and be done with it.

  if (rq) {
	ret = i915_request_wait()
	i915_request_put(rq)
	if (ret)
		goto err;
  }

  return 0;

  err:

  tl = intel_context_timeline_lock()
  intel_context_exit()
  intel_context_timeline_unlock()

  return nonblock ? ... : ...;

Regards,

Tvrtko

> 
> Matt
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>    			intel_context_exit(ce);
>>> -			intel_context_timeline_unlock(tl);
>>> +			mutex_unlock(&ce->timeline->mutex);
>>>    			if (nonblock)
>>>    				return -EWOULDBLOCK;
>>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
  2022-01-06  9:56     ` Tvrtko Ursulin
@ 2022-01-06 16:18       ` Matthew Brost
  2022-01-06 16:51         ` Tvrtko Ursulin
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Brost @ 2022-01-06 16:18 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, dri-devel

On Thu, Jan 06, 2022 at 09:56:03AM +0000, Tvrtko Ursulin wrote:
> 
> On 05/01/2022 16:24, Matthew Brost wrote:
> > On Wed, Jan 05, 2022 at 09:35:44AM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 04/01/2022 23:30, Matthew Brost wrote:
> > > > Don't use the interruptable version of the timeline mutex lock in the
> > > 
> > > interruptible
> > > 
> > > > error path of eb_pin_timeline as the cleanup must always happen.
> > > > 
> > > > v2:
> > > >    (John Harrison)
> > > >     - Don't check for interrupt during mutex lock
> > > > 
> > > > Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
> > > >    1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > index e9541244027a..e96e133cbb1f 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
> > > >    				      timeout) < 0) {
> > > >    			i915_request_put(rq);
> > > > -			tl = intel_context_timeline_lock(ce);
> > > > +			mutex_lock(&ce->timeline->mutex);
> > > 
> > > On the other hand it is more user friendly to handle signals (which maybe
> > > does not matter in this case, not sure any longer how long hold time it can
> > > have) but there is also a question of consistency within the very function
> > > you are changing.
> > > 
> > > Apart from consistency, what about the parent-child magic
> > > intel_context_timeline_lock does and you wouldn't have here?
> > > 
> > > And what about the very existence of intel_context_timeline_lock as a
> > > component boundary separation API, if it is used inconsistently throughout
> > > i915_gem_execbuffer.c?
> > 
> > intel_context_timeline_lock does 2 things:
> > 
> > 1. Handles lockdep nesting of timeline locks for parent-child contexts
> > ensuring locks are acquired from parent to last child, then released
> > last child to parent
> > 2. Allows the mutex lock to be interrupted
> > 
> > This helper should be used in setup steps where a user can signal abort
> > (context pinning time + request creation time), by 'should be' I mean
> > this was how it was done before I extended the execbuf IOCTL for
> > multiple BBs. Slightly confusing but this is what was in place so I
> > stuck with it.
> > 
> > This code here is an error path that only hold at most 1 timeline lock
> > (no nesting required) and is a path that must be executed as it is a
> > cleanup step (not allowed to be interrupted by user, intel_context_exit
> > must be called or we have dangling engine PM refs).
> > 
> > Make sense? I probably should update the comment message to explain this
> > a bit better as it did take me a bit to understand how this locking
> > worked.
> 
> The part which does not make sense is this:
> 

I'll try to explain this again...

> eb_pin_timeline()
> {
> ...
> 	tl = intel_context_timeline_lock(ce);
> 	if (IS_ERR(tl))
> 		return PTR_ERR(tl);

This part is allowed to be aborted by the user.

> 
> ... do some throttling, and if it fail:
> 			mutex_lock(&ce->timeline->mutex);

This part is not.

> 
> Therefore argument that at most one timeline lock is held and the extra
> stuff is not needed does not hold for me. Why would the throttling failed
> path be different than the initial step in this respect?
> 
> Using two ways to lock the same mutex withing 10 lines of code is confusing.
> 
> In my mind we have this question of API usage consistency, and also the
> unanswered questions of whether reacting to signals during taking this mutex
> matters (what are the pessimistic lock hold times and what influences
> them?).
> 
> Note that first lock handles signals, throttling also handles signals, so
> why wouldn't the cleanup path? Just because then you don't have to bother
> with error unwind is to only reason I can see.
> 
> So I suggest you just do proper error unwind and be done with it.
> 
>  if (rq) {
> 	ret = i915_request_wait()
> 	i915_request_put(rq)
> 	if (ret)
> 		goto err;
>  }
> 
>  return 0;
> 
>  err:
> 
>  tl = intel_context_timeline_lock()

We've already successfully called intel_context_enter,
intel_context_timeline_lock can be aborted by the user and return NULL.
If NULL lockdep blows up in intel_context_exit and we can NULL ptr dref
in intel_context_timeline_unlock. If we bail on tl returning NULL, now
we don't call intel_context_exit and be have dangling PM ref and GPU
will never power down.

Again all of the same code was in place before any of updated to the
execbuf IOCTL for multi-BB, just fixing a mistake I made when doing this
update. 

Matt

>  intel_context_exit()
>  intel_context_timeline_unlock()
> 
>  return nonblock ? ... : ...;
> 
> Regards,
> 
> Tvrtko
> 
> > 
> > Matt
> > 
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > >    			intel_context_exit(ce);
> > > > -			intel_context_timeline_unlock(tl);
> > > > +			mutex_unlock(&ce->timeline->mutex);
> > > >    			if (nonblock)
> > > >    				return -EWOULDBLOCK;
> > > > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
  2022-01-06 16:18       ` Matthew Brost
@ 2022-01-06 16:51         ` Tvrtko Ursulin
  2022-01-06 16:52           ` Matthew Brost
  0 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2022-01-06 16:51 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel


On 06/01/2022 16:18, Matthew Brost wrote:
> On Thu, Jan 06, 2022 at 09:56:03AM +0000, Tvrtko Ursulin wrote:
>>
>> On 05/01/2022 16:24, Matthew Brost wrote:
>>> On Wed, Jan 05, 2022 at 09:35:44AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 04/01/2022 23:30, Matthew Brost wrote:
>>>>> Don't use the interruptable version of the timeline mutex lock in the
>>>>
>>>> interruptible
>>>>
>>>>> error path of eb_pin_timeline as the cleanup must always happen.
>>>>>
>>>>> v2:
>>>>>     (John Harrison)
>>>>>      - Don't check for interrupt during mutex lock
>>>>>
>>>>> Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
>>>>>     1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> index e9541244027a..e96e133cbb1f 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
>>>>>     				      timeout) < 0) {
>>>>>     			i915_request_put(rq);
>>>>> -			tl = intel_context_timeline_lock(ce);
>>>>> +			mutex_lock(&ce->timeline->mutex);
>>>>
>>>> On the other hand it is more user friendly to handle signals (which maybe
>>>> does not matter in this case, not sure any longer how long hold time it can
>>>> have) but there is also a question of consistency within the very function
>>>> you are changing.
>>>>
>>>> Apart from consistency, what about the parent-child magic
>>>> intel_context_timeline_lock does and you wouldn't have here?
>>>>
>>>> And what about the very existence of intel_context_timeline_lock as a
>>>> component boundary separation API, if it is used inconsistently throughout
>>>> i915_gem_execbuffer.c?
>>>
>>> intel_context_timeline_lock does 2 things:
>>>
>>> 1. Handles lockdep nesting of timeline locks for parent-child contexts
>>> ensuring locks are acquired from parent to last child, then released
>>> last child to parent
>>> 2. Allows the mutex lock to be interrupted
>>>
>>> This helper should be used in setup steps where a user can signal abort
>>> (context pinning time + request creation time), by 'should be' I mean
>>> this was how it was done before I extended the execbuf IOCTL for
>>> multiple BBs. Slightly confusing but this is what was in place so I
>>> stuck with it.
>>>
>>> This code here is an error path that only hold at most 1 timeline lock
>>> (no nesting required) and is a path that must be executed as it is a
>>> cleanup step (not allowed to be interrupted by user, intel_context_exit
>>> must be called or we have dangling engine PM refs).
>>>
>>> Make sense? I probably should update the comment message to explain this
>>> a bit better as it did take me a bit to understand how this locking
>>> worked.
>>
>> The part which does not make sense is this:
>>
> 
> I'll try to explain this again...
> 
>> eb_pin_timeline()
>> {
>> ...
>> 	tl = intel_context_timeline_lock(ce);
>> 	if (IS_ERR(tl))
>> 		return PTR_ERR(tl);
> 
> This part is allowed to be aborted by the user.
> 
>>
>> ... do some throttling, and if it fail:
>> 			mutex_lock(&ce->timeline->mutex);
> 
> This part is not.

Pfft got it this time round.

I suggest a comment above the mutex to this effect. And maybe still 
consider the explicit error path which may be more readable due single 
request put, but it's up to you.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline
  2022-01-06 16:51         ` Tvrtko Ursulin
@ 2022-01-06 16:52           ` Matthew Brost
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Brost @ 2022-01-06 16:52 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, dri-devel

On Thu, Jan 06, 2022 at 04:51:21PM +0000, Tvrtko Ursulin wrote:
> 
> On 06/01/2022 16:18, Matthew Brost wrote:
> > On Thu, Jan 06, 2022 at 09:56:03AM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 05/01/2022 16:24, Matthew Brost wrote:
> > > > On Wed, Jan 05, 2022 at 09:35:44AM +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 04/01/2022 23:30, Matthew Brost wrote:
> > > > > > Don't use the interruptable version of the timeline mutex lock in the
> > > > > 
> > > > > interruptible
> > > > > 
> > > > > > error path of eb_pin_timeline as the cleanup must always happen.
> > > > > > 
> > > > > > v2:
> > > > > >     (John Harrison)
> > > > > >      - Don't check for interrupt during mutex lock
> > > > > > 
> > > > > > Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > >     drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
> > > > > >     1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > index e9541244027a..e96e133cbb1f 100644
> > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
> > > > > >     				      timeout) < 0) {
> > > > > >     			i915_request_put(rq);
> > > > > > -			tl = intel_context_timeline_lock(ce);
> > > > > > +			mutex_lock(&ce->timeline->mutex);
> > > > > 
> > > > > On the other hand it is more user friendly to handle signals (which maybe
> > > > > does not matter in this case, not sure any longer how long hold time it can
> > > > > have) but there is also a question of consistency within the very function
> > > > > you are changing.
> > > > > 
> > > > > Apart from consistency, what about the parent-child magic
> > > > > intel_context_timeline_lock does and you wouldn't have here?
> > > > > 
> > > > > And what about the very existence of intel_context_timeline_lock as a
> > > > > component boundary separation API, if it is used inconsistently throughout
> > > > > i915_gem_execbuffer.c?
> > > > 
> > > > intel_context_timeline_lock does 2 things:
> > > > 
> > > > 1. Handles lockdep nesting of timeline locks for parent-child contexts
> > > > ensuring locks are acquired from parent to last child, then released
> > > > last child to parent
> > > > 2. Allows the mutex lock to be interrupted
> > > > 
> > > > This helper should be used in setup steps where a user can signal abort
> > > > (context pinning time + request creation time), by 'should be' I mean
> > > > this was how it was done before I extended the execbuf IOCTL for
> > > > multiple BBs. Slightly confusing but this is what was in place so I
> > > > stuck with it.
> > > > 
> > > > This code here is an error path that only hold at most 1 timeline lock
> > > > (no nesting required) and is a path that must be executed as it is a
> > > > cleanup step (not allowed to be interrupted by user, intel_context_exit
> > > > must be called or we have dangling engine PM refs).
> > > > 
> > > > Make sense? I probably should update the comment message to explain this
> > > > a bit better as it did take me a bit to understand how this locking
> > > > worked.
> > > 
> > > The part which does not make sense is this:
> > > 
> > 
> > I'll try to explain this again...
> > 
> > > eb_pin_timeline()
> > > {
> > > ...
> > > 	tl = intel_context_timeline_lock(ce);
> > > 	if (IS_ERR(tl))
> > > 		return PTR_ERR(tl);
> > 
> > This part is allowed to be aborted by the user.
> > 
> > > 
> > > ... do some throttling, and if it fail:
> > > 			mutex_lock(&ce->timeline->mutex);
> > 
> > This part is not.
> 
> Pfft got it this time round.
> 
> I suggest a comment above the mutex to this effect. And maybe still consider
> the explicit error path which may be more readable due single request put,
> but it's up to you.
> 

I'll add comment and see how adding an explict error path looks in the code.

Matt

> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-06 16:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-04 23:30 [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline Matthew Brost
2022-01-05  9:35 ` [Intel-gfx] " Tvrtko Ursulin
2022-01-05 16:24   ` Matthew Brost
2022-01-06  9:56     ` Tvrtko Ursulin
2022-01-06 16:18       ` Matthew Brost
2022-01-06 16:51         ` Tvrtko Ursulin
2022-01-06 16:52           ` Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).