dri-devel.lists.freedesktop.org archive mirror
* [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
@ 2021-10-05 11:31 Tvrtko Ursulin
  2021-10-05 13:05 ` Thomas Hellström
  0 siblings, 1 reply; 6+ messages in thread
From: Tvrtko Ursulin @ 2021-10-05 11:31 UTC (permalink / raw)
  To: Intel-gfx
  Cc: dri-devel, Tvrtko Ursulin, Daniel Vetter, Matthew Auld,
	Thomas Hellström

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
when rendering is done on Intel dgfx and scanout/composition on Intel
igfx.

Before this patch the driver was not quite ready for that setup, mainly
because it was able to emit a semaphore wait between the two GPUs, which
results in deadlocks because semaphore target location in HWSP is neither
shared between the two, nor mapped in both GGTT spaces.

To fix it the patch adds an additional check to a couple of relevant code
paths in order to prevent using semaphores for inter-engine
synchronisation when relevant objects are not in the same GGTT space.

v2:
 * Avoid adding rq->i915. (Chris)

v3:
 * Use GGTT which describes the limit more precisely.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af..4f189982f67e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
 	return 0;
 }
 
+static bool
+can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
+{
+	return to->engine->gt->ggtt == from->engine->gt->ggtt;
+}
+
 static int
 emit_semaphore_wait(struct i915_request *to,
 		    struct i915_request *from,
@@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (!can_use_semaphore_wait(to, from))
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (can_use_semaphore_wait(to, from) &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)
-- 
2.30.2



* Re: [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
  2021-10-05 11:31 [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup Tvrtko Ursulin
@ 2021-10-05 13:05 ` Thomas Hellström
  2021-10-05 14:55   ` Tvrtko Ursulin
  2021-10-13 12:06   ` Daniel Vetter
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Hellström @ 2021-10-05 13:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx
  Cc: dri-devel, Tvrtko Ursulin, Daniel Vetter, Matthew Auld

Hi, Tvrtko,

On 10/5/21 13:31, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
> when rendering is done on Intel dgfx and scanout/composition on Intel
> igfx.
>
> Before this patch the driver was not quite ready for that setup, mainly
> because it was able to emit a semaphore wait between the two GPUs, which
> results in deadlocks because semaphore target location in HWSP is neither
> shared between the two, nor mapped in both GGTT spaces.
>
> To fix it the patch adds an additional check to a couple of relevant code
> paths in order to prevent using semaphores for inter-engine
> synchronisation when relevant objects are not in the same GGTT space.
>
> v2:
>   * Avoid adding rq->i915. (Chris)
>
> v3:
>   * Use GGTT which describes the limit more precisely.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>

An IMO pretty important bugfix. I read up a bit on the previous 
discussion on this, and from what I understand the other two options were

1) Ripping out the semaphore code,
2) Consider dma-fences from other instances of the same driver as foreign.

For imported dma-bufs we do 2), but particularly with lmem and p2p 
that's a more straightforward decision.

I don't think 1) is a reasonable approach to fix this bug, (but perhaps 
as a general cleanup?), and for 2) yes I guess we might end up doing 
that, unless we find some real benefits in treating 
same-driver-separate-device dma-fences as local, but for this particular 
bug, IMO this is a reasonable fix.

So,

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>





> ---
>   drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 79da5eca60af..4f189982f67e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
>   	return 0;
>   }
>   
> +static bool
> +can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
> +{
> +	return to->engine->gt->ggtt == from->engine->gt->ggtt;
> +}
> +
>   static int
>   emit_semaphore_wait(struct i915_request *to,
>   		    struct i915_request *from,
> @@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
>   	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
>   	struct i915_sw_fence *wait = &to->submit;
>   
> +	if (!can_use_semaphore_wait(to, from))
> +		goto await_fence;
> +
>   	if (!intel_context_use_semaphores(to->context))
>   		goto await_fence;
>   
> @@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
>   	 * immediate execution, and so we must wait until it reaches the
>   	 * active slot.
>   	 */
> -	if (intel_engine_has_semaphores(to->engine) &&
> +	if (can_use_semaphore_wait(to, from) &&
> +	    intel_engine_has_semaphores(to->engine) &&
>   	    !i915_request_has_initial_breadcrumb(to)) {
>   		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
>   		if (err < 0)


* Re: [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
  2021-10-05 13:05 ` Thomas Hellström
@ 2021-10-05 14:55   ` Tvrtko Ursulin
  2021-10-13 12:06   ` Daniel Vetter
  1 sibling, 0 replies; 6+ messages in thread
From: Tvrtko Ursulin @ 2021-10-05 14:55 UTC (permalink / raw)
  To: Thomas Hellström, Intel-gfx
  Cc: dri-devel, Tvrtko Ursulin, Daniel Vetter, Matthew Auld


On 05/10/2021 14:05, Thomas Hellström wrote:
> Hi, Tvrtko,
> 
> On 10/5/21 13:31, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
>> when rendering is done on Intel dgfx and scanout/composition on Intel
>> igfx.
>>
>> Before this patch the driver was not quite ready for that setup, mainly
>> because it was able to emit a semaphore wait between the two GPUs, which
>> results in deadlocks because semaphore target location in HWSP is neither
>> shared between the two, nor mapped in both GGTT spaces.
>>
>> To fix it the patch adds an additional check to a couple of relevant code
>> paths in order to prevent using semaphores for inter-engine
>> synchronisation when relevant objects are not in the same GGTT space.
>>
>> v2:
>>   * Avoid adding rq->i915. (Chris)
>>
>> v3:
>>   * Use GGTT which describes the limit more precisely.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> An IMO pretty important bugfix. I read up a bit on the previous 
> discussion on this, and from what I understand the other two options were
> 
> 1) Ripping out the semaphore code,
> 2) Consider dma-fences from other instances of the same driver as foreign.

Yes, with the caveat on the second point that there is a multi-tile 
scenario, granted of limited consequence because it only applies if 
someone tries to run that wo/ GuC, where the "same driver" check is not 
enough. This patch handles that case as well. And of course it is 
hypothetical that someone would be able to create an inter-tile 
dependency there; probably nothing in the current code does it.

> For imported dma-bufs we do 2), but particularly with lmem and p2p 
> that's a more straightforward decision.

I am not immediately familiar with p2p considerations.

> I don't think 1) is a reasonable approach to fix this bug, (but perhaps 
> as a general cleanup?), and for 2) yes I guess we might end up doing 
> that, unless we find some real benefits in treating 
> same-driver-separate-device dma-fences as local, but for this particular 
> bug, IMO this is a reasonable fix.

On the option of removing the inter-engine semaphore optimisation, I 
would not call that a cleanup since it had clear performance benefits. I 
personally don't have those benchmark results saved though, so I'd 
proceed with caution there if the code can harmlessly remain in the 
confines of the execlists backend.

Second topic, the whole same-driver fence upcast issue, I suppose, can be 
discussed along the lines of whether priority inheritance across drivers 
is useful. Like for instance the page flip prio boost, which currently 
does safely work between i915 instances, and is relevant to hybrid 
graphics. It was safe when I looked at it, courtesy of the global 
scheduler lock. The question is whether we want to keep that and 
formalise it via a more explicit/generic cross-driver API. So unless it 
is not safe after all, I wouldn't rip it out before the discussion on 
the big picture happens.

> So,
> 
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Thanks, I'll push it once again cleared by CI.

Regards,

Tvrtko

> 
> 
> 
> 
>> ---
>>   drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>> b/drivers/gpu/drm/i915/i915_request.c
>> index 79da5eca60af..4f189982f67e 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
>>       return 0;
>>   }
>> +static bool
>> +can_use_semaphore_wait(struct i915_request *to, struct i915_request 
>> *from)
>> +{
>> +    return to->engine->gt->ggtt == from->engine->gt->ggtt;
>> +}
>> +
>>   static int
>>   emit_semaphore_wait(struct i915_request *to,
>>               struct i915_request *from,
>> @@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
>>       const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
>>       struct i915_sw_fence *wait = &to->submit;
>> +    if (!can_use_semaphore_wait(to, from))
>> +        goto await_fence;
>> +
>>       if (!intel_context_use_semaphores(to->context))
>>           goto await_fence;
>> @@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct 
>> i915_request *to,
>>        * immediate execution, and so we must wait until it reaches the
>>        * active slot.
>>        */
>> -    if (intel_engine_has_semaphores(to->engine) &&
>> +    if (can_use_semaphore_wait(to, from) &&
>> +        intel_engine_has_semaphores(to->engine) &&
>>           !i915_request_has_initial_breadcrumb(to)) {
>>           err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
>>           if (err < 0)


* Re: [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
  2021-10-05 13:05 ` Thomas Hellström
  2021-10-05 14:55   ` Tvrtko Ursulin
@ 2021-10-13 12:06   ` Daniel Vetter
  2021-10-13 16:02     ` Tvrtko Ursulin
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2021-10-13 12:06 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Tvrtko Ursulin, Intel-gfx, dri-devel, Tvrtko Ursulin,
	Daniel Vetter, Matthew Auld

On Tue, Oct 05, 2021 at 03:05:25PM +0200, Thomas Hellström wrote:
> Hi, Tvrtko,
> 
> On 10/5/21 13:31, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
> > when rendering is done on Intel dgfx and scanout/composition on Intel
> > igfx.
> > 
> > Before this patch the driver was not quite ready for that setup, mainly
> > because it was able to emit a semaphore wait between the two GPUs, which
> > results in deadlocks because semaphore target location in HWSP is neither
> > shared between the two, nor mapped in both GGTT spaces.
> > 
> > To fix it the patch adds an additional check to a couple of relevant code
> > paths in order to prevent using semaphores for inter-engine
> > synchronisation when relevant objects are not in the same GGTT space.
> > 
> > v2:
> >   * Avoid adding rq->i915. (Chris)
> > 
> > v3:
> >   * Use GGTT which describes the limit more precisely.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> An IMO pretty important bugfix. I read up a bit on the previous discussion
> on this, and from what I understand the other two options were
> 
> 1) Ripping out the semaphore code,
> 2) Consider dma-fences from other instances of the same driver as foreign.
> 
> For imported dma-bufs we do 2), but particularly with lmem and p2p that's a
> more straightforward decision.
> 
> I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a
> general cleanup?), and for 2) yes I guess we might end up doing that, unless
> we find some real benefits in treating same-driver-separate-device
> dma-fences as local, but for this particular bug, IMO this is a reasonable
> fix.

The foreign dma-fences have uapi impact, which Tvrtko shrugged off as
"it's a good idea", and no, it really just is not. So we still need to
sort this out properly.

> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

But I'm also ok with just merging this as-is so the situation doesn't
become too entertaining.
-Daniel

> 
> 
> 
> 
> 
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
> >   1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 79da5eca60af..4f189982f67e 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
> >   	return 0;
> >   }
> > +static bool
> > +can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
> > +{
> > +	return to->engine->gt->ggtt == from->engine->gt->ggtt;
> > +}
> > +
> >   static int
> >   emit_semaphore_wait(struct i915_request *to,
> >   		    struct i915_request *from,
> > @@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
> >   	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
> >   	struct i915_sw_fence *wait = &to->submit;
> > +	if (!can_use_semaphore_wait(to, from))
> > +		goto await_fence;
> > +
> >   	if (!intel_context_use_semaphores(to->context))
> >   		goto await_fence;
> > @@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
> >   	 * immediate execution, and so we must wait until it reaches the
> >   	 * active slot.
> >   	 */
> > -	if (intel_engine_has_semaphores(to->engine) &&
> > +	if (can_use_semaphore_wait(to, from) &&
> > +	    intel_engine_has_semaphores(to->engine) &&
> >   	    !i915_request_has_initial_breadcrumb(to)) {
> >   		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
> >   		if (err < 0)

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
  2021-10-13 12:06   ` Daniel Vetter
@ 2021-10-13 16:02     ` Tvrtko Ursulin
  0 siblings, 0 replies; 6+ messages in thread
From: Tvrtko Ursulin @ 2021-10-13 16:02 UTC (permalink / raw)
  To: Daniel Vetter, Thomas Hellström
  Cc: Intel-gfx, dri-devel, Tvrtko Ursulin, Daniel Vetter, Matthew Auld


On 13/10/2021 13:06, Daniel Vetter wrote:
> On Tue, Oct 05, 2021 at 03:05:25PM +0200, Thomas Hellström wrote:
>> Hi, Tvrtko,
>>
>> On 10/5/21 13:31, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
>>> when rendering is done on Intel dgfx and scanout/composition on Intel
>>> igfx.
>>>
>>> Before this patch the driver was not quite ready for that setup, mainly
>>> because it was able to emit a semaphore wait between the two GPUs, which
>>> results in deadlocks because semaphore target location in HWSP is neither
>>> shared between the two, nor mapped in both GGTT spaces.
>>>
>>> To fix it the patch adds an additional check to a couple of relevant code
>>> paths in order to prevent using semaphores for inter-engine
>>> synchronisation when relevant objects are not in the same GGTT space.
>>>
>>> v2:
>>>    * Avoid adding rq->i915. (Chris)
>>>
>>> v3:
>>>    * Use GGTT which describes the limit more precisely.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>
>> An IMO pretty important bugfix. I read up a bit on the previous discussion
>> on this, and from what I understand the other two options were
>>
>> 1) Ripping out the semaphore code,
>> 2) Consider dma-fences from other instances of the same driver as foreign.
>>
>> For imported dma-bufs we do 2), but particularly with lmem and p2p that's a
>> more straightforward decision.
>>
>> I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a
>> general cleanup?), and for 2) yes I guess we might end up doing that, unless
>> we find some real benefits in treating same-driver-separate-device
>> dma-fences as local, but for this particular bug, IMO this is a reasonable
>> fix.
> 
> The foreign dma-fences have uapi impact, which Tvrtko shrugged off as
> "it's a good idea", and no, it really just is not. So we still need to
> sort this out properly.

I always said let's merge the fix and discuss it. The fix only improved 
one failure and did not introduce any of the new issues you are worried 
about; they were all already there.

So let's start the discussion: why is it not a good idea to extend the 
concept of priority inheritance to the hybrid case?

Today we can have a high-priority compositor waiting for client 
rendering, or even I915_PRIORITY_DISPLAY, which I _think_ somehow ties 
into page flips with full-screen stuff, and with igpu we do priority 
inheritance in those cases. Why is it a bad idea to do the same in the 
hybrid setup?

Regards,

Tvrtko

> 
>> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> But I'm also ok with just merging this as-is so the situation doesn't
> become too entertaining.
> -Daniel
> 
>>
>>
>>
>>
>>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
>>>    1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 79da5eca60af..4f189982f67e 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
>>>    	return 0;
>>>    }
>>> +static bool
>>> +can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
>>> +{
>>> +	return to->engine->gt->ggtt == from->engine->gt->ggtt;
>>> +}
>>> +
>>>    static int
>>>    emit_semaphore_wait(struct i915_request *to,
>>>    		    struct i915_request *from,
>>> @@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
>>>    	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
>>>    	struct i915_sw_fence *wait = &to->submit;
>>> +	if (!can_use_semaphore_wait(to, from))
>>> +		goto await_fence;
>>> +
>>>    	if (!intel_context_use_semaphores(to->context))
>>>    		goto await_fence;
>>> @@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
>>>    	 * immediate execution, and so we must wait until it reaches the
>>>    	 * active slot.
>>>    	 */
>>> -	if (intel_engine_has_semaphores(to->engine) &&
>>> +	if (can_use_semaphore_wait(to, from) &&
>>> +	    intel_engine_has_semaphores(to->engine) &&
>>>    	    !i915_request_has_initial_breadcrumb(to)) {
>>>    		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
>>>    		if (err < 0)
> 


* [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
@ 2021-08-27 13:30 Tvrtko Ursulin
  0 siblings, 0 replies; 6+ messages in thread
From: Tvrtko Ursulin @ 2021-08-27 13:30 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
when rendering is done on Intel dgfx and scanout/composition on Intel
igfx.

Before this patch the driver was not quite ready for that setup, mainly
because it was able to emit a semaphore wait between the two GPUs, which
results in deadlocks because semaphore target location in HWSP is neither
shared between the two, nor mapped in both GGTT spaces.

To fix it the patch adds an additional check to a couple of relevant code
paths in order to prevent using semaphores for inter-engine
synchronisation between different driver instances.

Patch also moves singly used i915_gem_object_last_write_engine to be
private in its only calling unit (debugfs), while modifying it to only
show activity belonging to the respective driver instance.

What remains in this problem space is the question of the GEM busy ioctl.
We have a somewhat ambiguous comment there saying only the status of
native fences will be reported, which could be interpreted as either
i915, or native to the drm fd. For now I have decided to leave that as
is, meaning any i915 instance activity continues to be reported.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 17 ----------------
 drivers/gpu/drm/i915/i915_debugfs.c        | 23 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.c        |  7 ++++++-
 drivers/gpu/drm/i915/i915_request.h        |  1 +
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 48112b9d76df..3043fcbd31bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -503,23 +503,6 @@ i915_gem_object_finish_access(struct drm_i915_gem_object *obj)
 	i915_gem_object_unpin_pages(obj);
 }
 
-static inline struct intel_engine_cs *
-i915_gem_object_last_write_engine(struct drm_i915_gem_object *obj)
-{
-	struct intel_engine_cs *engine = NULL;
-	struct dma_fence *fence;
-
-	rcu_read_lock();
-	fence = dma_resv_get_excl_unlocked(obj->base.resv);
-	rcu_read_unlock();
-
-	if (fence && dma_fence_is_i915(fence) && !dma_fence_is_signaled(fence))
-		engine = to_request(fence)->engine;
-	dma_fence_put(fence);
-
-	return engine;
-}
-
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
 void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 04351a851586..2f49ff0e8c21 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -135,6 +135,27 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
 	return "ppgtt";
 }
 
+static struct intel_engine_cs *
+last_write_engine(struct drm_i915_private *i915,
+		  struct drm_i915_gem_object *obj)
+{
+	struct intel_engine_cs *engine = NULL;
+	struct dma_fence *fence;
+
+	rcu_read_lock();
+	fence = dma_resv_get_excl_unlocked(obj->base.resv);
+	rcu_read_unlock();
+
+	if (fence &&
+	    !dma_fence_is_signaled(fence) &&
+	    dma_fence_is_i915(fence) &&
+	    to_request(fence)->i915 == i915)
+		engine = to_request(fence)->engine;
+	dma_fence_put(fence);
+
+	return engine;
+}
+
 void
 i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
@@ -230,7 +251,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	if (i915_gem_object_is_framebuffer(obj))
 		seq_printf(m, " (fb)");
 
-	engine = i915_gem_object_last_write_engine(obj);
+	engine = last_write_engine(dev_priv, obj);
 	if (engine)
 		seq_printf(m, " (%s)", engine->name);
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ce446716d092..d2dec669d262 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -900,6 +900,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	 * hold the intel_context reference. In execlist mode the request always
 	 * eventually points to a physical engine so this isn't an issue.
 	 */
+	rq->i915 = tl->gt->i915;
 	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
@@ -1160,6 +1161,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (to->i915 != from->i915)
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1263,7 +1267,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (to->i915 == from->i915 &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 1bc1349ba3c2..61a2ad6f1f1c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -163,6 +163,7 @@ enum {
  */
 struct i915_request {
 	struct dma_fence fence;
+	struct drm_i915_private *i915;
 	spinlock_t lock;
 
 	/**
-- 
2.30.2



end of thread, other threads:[~2021-10-13 16:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-05 11:31 [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup Tvrtko Ursulin
2021-10-05 13:05 ` Thomas Hellström
2021-10-05 14:55   ` Tvrtko Ursulin
2021-10-13 12:06   ` Daniel Vetter
2021-10-13 16:02     ` Tvrtko Ursulin
  -- strict thread matches above, loose matches on Subject: below --
2021-08-27 13:30 Tvrtko Ursulin
