[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
@ 2016-09-08  0:14 Mario Kleiner
  2016-09-08  6:30 ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Mario Kleiner @ 2016-09-08  0:14 UTC (permalink / raw)
  To: dri-devel; +Cc: Daniel Vetter, Michel Dänzer

amdgpu-kms uses shared fences for its prime exported dmabufs,
instead of an exclusive fence. Therefore we need to wait for
all fences of the dmabuf reservation object to prevent
unsynchronized rendering and flipping.

This patch was tested to behave properly with intel-kms +
radeon/amdgpu/nouveau-kms for correct prime sync during
pageflipping under DRI3/Present.

Should fix https://bugs.freedesktop.org/show_bug.cgi?id=95472
at least for page-flipped presentation.

Suggested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
---
 drivers/gpu/drm/i915/intel_display.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 922709b..4b74b96 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12043,7 +12043,7 @@ static void intel_mmio_flip_work_func(struct work_struct *w)
 	/* For framebuffer backed by dmabuf, wait for fence */
 	resv = i915_gem_object_get_dmabuf_resv(obj);
 	if (resv)
-		WARN_ON(reservation_object_wait_timeout_rcu(resv, false, false,
+		WARN_ON(reservation_object_wait_timeout_rcu(resv, true, false,
 							    MAX_SCHEDULE_TIMEOUT) < 0);
 
 	intel_pipe_update_start(crtc);
@@ -14700,7 +14700,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	if (resv) {
 		long lret;
 
-		lret = reservation_object_wait_timeout_rcu(resv, false, true,
+		lret = reservation_object_wait_timeout_rcu(resv, true, true,
 							   MAX_SCHEDULE_TIMEOUT);
 		if (lret == -ERESTARTSYS)
 			return lret;
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-08  0:14 [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences Mario Kleiner
@ 2016-09-08  6:30 ` Chris Wilson
  2016-09-08 15:21   ` Mario Kleiner
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2016-09-08  6:30 UTC (permalink / raw)
  To: Mario Kleiner; +Cc: Daniel Vetter, Michel Dänzer, dri-devel

On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
> amdgpu-kms uses shared fences for its prime exported dmabufs,
> instead of an exclusive fence. Therefore we need to wait for
> all fences of the dmabuf reservation object to prevent
> unsynchronized rendering and flipping.

No. Fix the root cause as this affects not just flips but copies -
this implies that everybody using the resv object must wait for all
fences. The resv object is not just used for prime, but all fencing, so
this breaks the ability to schedule parallel operations across engine.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-08  6:30 ` Chris Wilson
@ 2016-09-08 15:21   ` Mario Kleiner
  2016-09-08 16:23     ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Mario Kleiner @ 2016-09-08 15:21 UTC (permalink / raw)
  To: Chris Wilson, dri-devel, Michel Dänzer, Daniel Vetter, David Airlie

On 09/08/2016 08:30 AM, Chris Wilson wrote:
> On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
>> amdgpu-kms uses shared fences for its prime exported dmabufs,
>> instead of an exclusive fence. Therefore we need to wait for
>> all fences of the dmabuf reservation object to prevent
>> unsynchronized rendering and flipping.
>
> No. Fix the root cause as this affects not just flips but copies -
> this implies that everybody using the resv object must wait for all
> fences. The resv object is not just used for prime, but all fencing, so
> this breaks the ability to schedule parallel operations across engine.
> -Chris
>

Ok. I think i now understand the difference, but let's check: The 
exclusive fence is essentially acting a bit like a write-lock, and the 
shared fences as readers-locks? So you can have multiple readers but 
only one writer at a time?

Ie.:

Writer must wait for all fences before starting write access to a 
buffer, then attach the exclusive fence and signal it on end of write 
access. E.g., write to renderbuffer, write to texture etc.

Readers must wait for exclusive fence, then attach a shared fence per 
reader and signal it on end of read access? E.g., read from texture, fb, 
scanout?

Is that correct? In that case we'd have a missing exclusive fence in 
amdgpu for the linear target dmabuf? Probably beyond my level of 
knowledge to fix this?

thanks,
-mario
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-08 15:21   ` Mario Kleiner
@ 2016-09-08 16:23     ` Chris Wilson
       [not found]       ` <20160908162346.GA5479-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2016-09-08 16:23 UTC (permalink / raw)
  To: Mario Kleiner; +Cc: Daniel Vetter, Michel Dänzer, dri-devel

On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
> On 09/08/2016 08:30 AM, Chris Wilson wrote:
> >On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
> >>amdgpu-kms uses shared fences for its prime exported dmabufs,
> >>instead of an exclusive fence. Therefore we need to wait for
> >>all fences of the dmabuf reservation object to prevent
> >>unsynchronized rendering and flipping.
> >
> >No. Fix the root cause as this affects not just flips but copies -
> >this implies that everybody using the resv object must wait for all
> >fences. The resv object is not just used for prime, but all fencing, so
> >this breaks the ability to schedule parallel operations across engine.
> >-Chris
> >
> 
> Ok. I think i now understand the difference, but let's check: The
> exclusive fence is essentially acting a bit like a write-lock, and
> the shared fences as readers-locks? So you can have multiple readers
> but only one writer at a time?

That's how we (i915.ko and I hope the rest of the world) are using them.
In the model where here is just one reservation object on the GEM
object, that reservation object is then shared between internal driver
scheduling and external. We are reliant on being able to use buffers on
multiple engines through the virtue of the shared fences, and to serialise
after writes by waiting on the exclusive fence. (So we can have concurrent
reads on the display engine, render engines and on the CPU - or
alternatively an exclusive writer.)

In the near future, i915 flips will wait on the common reservation object
not only for dma-bufs, but also its own GEM objects.
 
> Ie.:
> 
> Writer must wait for all fences before starting write access to a
> buffer, then attach the exclusive fence and signal it on end of
> write access. E.g., write to renderbuffer, write to texture etc.

Yes.
 
> Readers must wait for exclusive fence, then attach a shared fence
> per reader and signal it on end of read access? E.g., read from
> texture, fb, scanout?

Yes.
 
> Is that correct? In that case we'd have a missing exclusive fence in
> amdgpu for the linear target dmabuf? Probably beyond my level of
> knowledge to fix this?

i915.ko requires the client to mark which buffers are written to.

In ttm, there are ttm_validate_buffer objects which mark whether they
should be using shared or exclusive fences. Afaict, in amdgpu they are
all set to shared, the relevant user interface seems to be
amdgpu_bo_list_set().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]       ` <20160908162346.GA5479-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
@ 2016-09-09  1:15         ` Michel Dänzer
       [not found]           ` <abccc8ac-10c6-ab22-c59d-f43ee48ba78d-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-09  1:15 UTC (permalink / raw)
  To: Chris Wilson, Mario Kleiner, Christian König
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 09/09/16 01:23 AM, Chris Wilson wrote:
> On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
>> On 09/08/2016 08:30 AM, Chris Wilson wrote:
>>> On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
>>>> amdgpu-kms uses shared fences for its prime exported dmabufs,
>>>> instead of an exclusive fence. Therefore we need to wait for
>>>> all fences of the dmabuf reservation object to prevent
>>>> unsynchronized rendering and flipping.
>>>
>>> No. Fix the root cause as this affects not just flips but copies -
>>> this implies that everybody using the resv object must wait for all
>>> fences. The resv object is not just used for prime, but all fencing, so
>>> this breaks the ability to schedule parallel operations across engine.
>>> -Chris
>>>
>>
>> Ok. I think i now understand the difference, but let's check: The
>> exclusive fence is essentially acting a bit like a write-lock, and
>> the shared fences as readers-locks? So you can have multiple readers
>> but only one writer at a time?
> 
> That's how we (i915.ko and I hope the rest of the world) are using them.
> In the model where here is just one reservation object on the GEM
> object, that reservation object is then shared between internal driver
> scheduling and external. We are reliant on being able to use buffers on
> multiple engines through the virtue of the shared fences, and to serialise
> after writes by waiting on the exclusive fence. (So we can have concurrent
> reads on the display engine, render engines and on the CPU - or
> alternatively an exclusive writer.)
> 
> In the near future, i915 flips will wait on the common reservation object
> not only for dma-bufs, but also its own GEM objects.
>  
>> Ie.:
>>
>> Writer must wait for all fences before starting write access to a
>> buffer, then attach the exclusive fence and signal it on end of
>> write access. E.g., write to renderbuffer, write to texture etc.
> 
> Yes.
>  
>> Readers must wait for exclusive fence, then attach a shared fence
>> per reader and signal it on end of read access? E.g., read from
>> texture, fb, scanout?
> 
> Yes.
>  
>> Is that correct? In that case we'd have a missing exclusive fence in
>> amdgpu for the linear target dmabuf? Probably beyond my level of
>> knowledge to fix this?
> 
> i915.ko requires the client to mark which buffers are written to.
> 
> In ttm, there are ttm_validate_buffer objects which mark whether they
> should be using shared or exclusive fences. Afaict, in amdgpu they are
> all set to shared, the relevant user interface seems to be
> amdgpu_bo_list_set().

This all makes sense to me.

Christian, why is amdgpu setting only shared fences? Can we fix that?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]           ` <abccc8ac-10c6-ab22-c59d-f43ee48ba78d-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-09-13  8:44             ` Christian König
  2016-09-13  9:39               ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-13  8:44 UTC (permalink / raw)
  To: Michel Dänzer, Chris Wilson, Mario Kleiner
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
> On 09/09/16 01:23 AM, Chris Wilson wrote:
>> On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
>>> On 09/08/2016 08:30 AM, Chris Wilson wrote:
>>>> On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
>>>>> amdgpu-kms uses shared fences for its prime exported dmabufs,
>>>>> instead of an exclusive fence. Therefore we need to wait for
>>>>> all fences of the dmabuf reservation object to prevent
>>>>> unsynchronized rendering and flipping.
>>>> No. Fix the root cause as this affects not just flips but copies -
>>>> this implies that everybody using the resv object must wait for all
>>>> fences. The resv object is not just used for prime, but all fencing, so
>>>> this breaks the ability to schedule parallel operations across engine.
>>>> -Chris
>>>>
>>> Ok. I think i now understand the difference, but let's check: The
>>> exclusive fence is essentially acting a bit like a write-lock, and
>>> the shared fences as readers-locks? So you can have multiple readers
>>> but only one writer at a time?
>> That's how we (i915.ko and I hope the rest of the world) are using them.
>> In the model where here is just one reservation object on the GEM
>> object, that reservation object is then shared between internal driver
>> scheduling and external. We are reliant on being able to use buffers on
>> multiple engines through the virtue of the shared fences, and to serialise
>> after writes by waiting on the exclusive fence. (So we can have concurrent
>> reads on the display engine, render engines and on the CPU - or
>> alternatively an exclusive writer.)
>>
>> In the near future, i915 flips will wait on the common reservation object
>> not only for dma-bufs, but also its own GEM objects.
>>   
>>> Ie.:
>>>
>>> Writer must wait for all fences before starting write access to a
>>> buffer, then attach the exclusive fence and signal it on end of
>>> write access. E.g., write to renderbuffer, write to texture etc.
>> Yes.
>>   
>>> Readers must wait for exclusive fence, then attach a shared fence
>>> per reader and signal it on end of read access? E.g., read from
>>> texture, fb, scanout?
>> Yes.
>>   
>>> Is that correct? In that case we'd have a missing exclusive fence in
>>> amdgpu for the linear target dmabuf? Probably beyond my level of
>>> knowledge to fix this?
>> i915.ko requires the client to mark which buffers are written to.
>>
>> In ttm, there are ttm_validate_buffer objects which mark whether they
>> should be using shared or exclusive fences. Afaict, in amdgpu they are
>> all set to shared, the relevant user interface seems to be
>> amdgpu_bo_list_set().
> This all makes sense to me.
>
> Christian, why is amdgpu setting only shared fences? Can we fix that?

No, amdgpu relies on the fact that we even allow concurrent write 
accesses by userspace.

E.g. one part of the buffer can be rendered by one engine while another 
part could be rendered by another engine.

Just imagine X which is composing a buffer with both the 3D engine as 
well as the DMA engine.

All engines need to run in parallel and you need to wait for all of them 
to finish before scanout.

Everybody which needs exclusive access to the reservation object (like 
scanouts do) needs to wait for all fences, not just the exclusive one.

The Intel driver clearly needs to be fixed here.

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-13  8:44             ` Christian König
@ 2016-09-13  9:39               ` Chris Wilson
       [not found]                 ` <20160913093945.GA25204-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2016-09-13  9:39 UTC (permalink / raw)
  To: Christian König; +Cc: Michel Dänzer, amd-gfx, dri-devel

On Tue, Sep 13, 2016 at 10:44:11AM +0200, Christian König wrote:
> Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
> >On 09/09/16 01:23 AM, Chris Wilson wrote:
> >>On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
> >>>On 09/08/2016 08:30 AM, Chris Wilson wrote:
> >>>>On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
> >>>>>amdgpu-kms uses shared fences for its prime exported dmabufs,
> >>>>>instead of an exclusive fence. Therefore we need to wait for
> >>>>>all fences of the dmabuf reservation object to prevent
> >>>>>unsynchronized rendering and flipping.
> >>>>No. Fix the root cause as this affects not just flips but copies -
> >>>>this implies that everybody using the resv object must wait for all
> >>>>fences. The resv object is not just used for prime, but all fencing, so
> >>>>this breaks the ability to schedule parallel operations across engine.
> >>>>-Chris
> >>>>
> >>>Ok. I think i now understand the difference, but let's check: The
> >>>exclusive fence is essentially acting a bit like a write-lock, and
> >>>the shared fences as readers-locks? So you can have multiple readers
> >>>but only one writer at a time?
> >>That's how we (i915.ko and I hope the rest of the world) are using them.
> >>In the model where here is just one reservation object on the GEM
> >>object, that reservation object is then shared between internal driver
> >>scheduling and external. We are reliant on being able to use buffers on
> >>multiple engines through the virtue of the shared fences, and to serialise
> >>after writes by waiting on the exclusive fence. (So we can have concurrent
> >>reads on the display engine, render engines and on the CPU - or
> >>alternatively an exclusive writer.)
> >>
> >>In the near future, i915 flips will wait on the common reservation object
> >>not only for dma-bufs, but also its own GEM objects.
> >>>Ie.:
> >>>
> >>>Writer must wait for all fences before starting write access to a
> >>>buffer, then attach the exclusive fence and signal it on end of
> >>>write access. E.g., write to renderbuffer, write to texture etc.
> >>Yes.
> >>>Readers must wait for exclusive fence, then attach a shared fence
> >>>per reader and signal it on end of read access? E.g., read from
> >>>texture, fb, scanout?
> >>Yes.
> >>>Is that correct? In that case we'd have a missing exclusive fence in
> >>>amdgpu for the linear target dmabuf? Probably beyond my level of
> >>>knowledge to fix this?
> >>i915.ko requires the client to mark which buffers are written to.
> >>
> >>In ttm, there are ttm_validate_buffer objects which mark whether they
> >>should be using shared or exclusive fences. Afaict, in amdgpu they are
> >>all set to shared, the relevant user interface seems to be
> >>amdgpu_bo_list_set().
> >This all makes sense to me.
> >
> >Christian, why is amdgpu setting only shared fences? Can we fix that?
> 
> No, amdgpu relies on the fact that we even allow concurrent write
> accesses by userspace.
> 
> E.g. one part of the buffer can be rendered by one engine while
> another part could be rendered by another engine.
> 
> Just imagine X which is composing a buffer with both the 3D engine
> as well as the DMA engine.
> 
> All engines need to run in parallel and you need to wait for all of
> them to finish before scanout.
> 
> Everybody which needs exclusive access to the reservation object
> (like scanouts do) needs to wait for all fences, not just the
> exclusive one.
> 
> The Intel driver clearly needs to be fixed here.

If you are not using implicit fencing, you have to pass explicit fences
instead.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                 ` <20160913093945.GA25204-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
@ 2016-09-13 12:52                   ` Christian König
  2016-09-21  9:56                     ` Michel Dänzer
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-13 12:52 UTC (permalink / raw)
  To: Chris Wilson, Michel Dänzer, Mario Kleiner,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 13.09.2016 um 11:39 schrieb Chris Wilson:
> On Tue, Sep 13, 2016 at 10:44:11AM +0200, Christian König wrote:
>> Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
>>> On 09/09/16 01:23 AM, Chris Wilson wrote:
>>>> On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
>>>>> On 09/08/2016 08:30 AM, Chris Wilson wrote:
>>>>>> On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
>>>>>>> amdgpu-kms uses shared fences for its prime exported dmabufs,
>>>>>>> instead of an exclusive fence. Therefore we need to wait for
>>>>>>> all fences of the dmabuf reservation object to prevent
>>>>>>> unsynchronized rendering and flipping.
>>>>>> No. Fix the root cause as this affects not just flips but copies -
>>>>>> this implies that everybody using the resv object must wait for all
>>>>>> fences. The resv object is not just used for prime, but all fencing, so
>>>>>> this breaks the ability to schedule parallel operations across engine.
>>>>>> -Chris
>>>>>>
>>>>> Ok. I think i now understand the difference, but let's check: The
>>>>> exclusive fence is essentially acting a bit like a write-lock, and
>>>>> the shared fences as readers-locks? So you can have multiple readers
>>>>> but only one writer at a time?
>>>> That's how we (i915.ko and I hope the rest of the world) are using them.
>>>> In the model where here is just one reservation object on the GEM
>>>> object, that reservation object is then shared between internal driver
>>>> scheduling and external. We are reliant on being able to use buffers on
>>>> multiple engines through the virtue of the shared fences, and to serialise
>>>> after writes by waiting on the exclusive fence. (So we can have concurrent
>>>> reads on the display engine, render engines and on the CPU - or
>>>> alternatively an exclusive writer.)
>>>>
>>>> In the near future, i915 flips will wait on the common reservation object
>>>> not only for dma-bufs, but also its own GEM objects.
>>>>> Ie.:
>>>>>
>>>>> Writer must wait for all fences before starting write access to a
>>>>> buffer, then attach the exclusive fence and signal it on end of
>>>>> write access. E.g., write to renderbuffer, write to texture etc.
>>>> Yes.
>>>>> Readers must wait for exclusive fence, then attach a shared fence
>>>>> per reader and signal it on end of read access? E.g., read from
>>>>> texture, fb, scanout?
>>>> Yes.
>>>>> Is that correct? In that case we'd have a missing exclusive fence in
>>>>> amdgpu for the linear target dmabuf? Probably beyond my level of
>>>>> knowledge to fix this?
>>>> i915.ko requires the client to mark which buffers are written to.
>>>>
>>>> In ttm, there are ttm_validate_buffer objects which mark whether they
>>>> should be using shared or exclusive fences. Afaict, in amdgpu they are
>>>> all set to shared, the relevant user interface seems to be
>>>> amdgpu_bo_list_set().
>>> This all makes sense to me.
>>>
>>> Christian, why is amdgpu setting only shared fences? Can we fix that?
>> No, amdgpu relies on the fact that we even allow concurrent write
>> accesses by userspace.
>>
>> E.g. one part of the buffer can be rendered by one engine while
>> another part could be rendered by another engine.
>>
>> Just imagine X which is composing a buffer with both the 3D engine
>> as well as the DMA engine.
>>
>> All engines need to run in parallel and you need to wait for all of
>> them to finish before scanout.
>>
>> Everybody which needs exclusive access to the reservation object
>> (like scanouts do) needs to wait for all fences, not just the
>> exclusive one.
>>
>> The Intel driver clearly needs to be fixed here.
> If you are not using implicit fencing, you have to pass explicit fences
> instead.

Which is exactly what we do, but only for the driver internally command 
submissions.

All command submissions from the same process can run concurrently with 
amdgpu, only when we see a fence from another driver or process we wait 
for it to complete before starting to run a command submission.

Other drivers can't make any assumption on what a shared access is 
actually doing (e.g. writing or reading) with a buffer.

So the model i915.ko is using the reservation object and it's shared 
fences is certainly not correct and needs to be fixed.

Regards,
Christian.

> -Chris
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-13 12:52                   ` Christian König
@ 2016-09-21  9:56                     ` Michel Dänzer
       [not found]                       ` <7aafce92-8bcf-1c5c-45de-9e8ecda85239-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21  9:56 UTC (permalink / raw)
  To: Christian König, Chris Wilson, Mario Kleiner; +Cc: amd-gfx, dri-devel

On 13/09/16 09:52 PM, Christian König wrote:
> Am 13.09.2016 um 11:39 schrieb Chris Wilson:
>> On Tue, Sep 13, 2016 at 10:44:11AM +0200, Christian König wrote:
>>> Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
>>>> On 09/09/16 01:23 AM, Chris Wilson wrote:
>>>>> On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
>>>>>> On 09/08/2016 08:30 AM, Chris Wilson wrote:
>>>>>>> On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
>>>>>>>> amdgpu-kms uses shared fences for its prime exported dmabufs,
>>>>>>>> instead of an exclusive fence. Therefore we need to wait for
>>>>>>>> all fences of the dmabuf reservation object to prevent
>>>>>>>> unsynchronized rendering and flipping.
>>>>>>> No. Fix the root cause as this affects not just flips but copies -
>>>>>>> this implies that everybody using the resv object must wait for all
>>>>>>> fences. The resv object is not just used for prime, but all
>>>>>>> fencing, so
>>>>>>> this breaks the ability to schedule parallel operations across
>>>>>>> engine.
>>>>>>> -Chris
>>>>>>>
>>>>>> Ok. I think i now understand the difference, but let's check: The
>>>>>> exclusive fence is essentially acting a bit like a write-lock, and
>>>>>> the shared fences as readers-locks? So you can have multiple readers
>>>>>> but only one writer at a time?
>>>>> That's how we (i915.ko and I hope the rest of the world) are using
>>>>> them.
>>>>> In the model where here is just one reservation object on the GEM
>>>>> object, that reservation object is then shared between internal driver
>>>>> scheduling and external. We are reliant on being able to use
>>>>> buffers on
>>>>> multiple engines through the virtue of the shared fences, and to
>>>>> serialise
>>>>> after writes by waiting on the exclusive fence. (So we can have
>>>>> concurrent
>>>>> reads on the display engine, render engines and on the CPU - or
>>>>> alternatively an exclusive writer.)
>>>>>
>>>>> In the near future, i915 flips will wait on the common reservation
>>>>> object
>>>>> not only for dma-bufs, but also its own GEM objects.
>>>>>> Ie.:
>>>>>>
>>>>>> Writer must wait for all fences before starting write access to a
>>>>>> buffer, then attach the exclusive fence and signal it on end of
>>>>>> write access. E.g., write to renderbuffer, write to texture etc.
>>>>> Yes.
>>>>>> Readers must wait for exclusive fence, then attach a shared fence
>>>>>> per reader and signal it on end of read access? E.g., read from
>>>>>> texture, fb, scanout?
>>>>> Yes.
>>>>>> Is that correct? In that case we'd have a missing exclusive fence in
>>>>>> amdgpu for the linear target dmabuf? Probably beyond my level of
>>>>>> knowledge to fix this?
>>>>> i915.ko requires the client to mark which buffers are written to.
>>>>>
>>>>> In ttm, there are ttm_validate_buffer objects which mark whether they
>>>>> should be using shared or exclusive fences. Afaict, in amdgpu they are
>>>>> all set to shared, the relevant user interface seems to be
>>>>> amdgpu_bo_list_set().
>>>> This all makes sense to me.
>>>>
>>>> Christian, why is amdgpu setting only shared fences? Can we fix that?
>>> No, amdgpu relies on the fact that we even allow concurrent write
>>> accesses by userspace.
>>>
>>> E.g. one part of the buffer can be rendered by one engine while
>>> another part could be rendered by another engine.
>>>
>>> Just imagine X which is composing a buffer with both the 3D engine
>>> as well as the DMA engine.
>>>
>>> All engines need to run in parallel and you need to wait for all of
>>> them to finish before scanout.
>>>
>>> Everybody which needs exclusive access to the reservation object
>>> (like scanouts do) needs to wait for all fences, not just the
>>> exclusive one.
>>>
>>> The Intel driver clearly needs to be fixed here.
>> If you are not using implicit fencing, you have to pass explicit fences
>> instead.
> 
> Which is exactly what we do, but only for the driver internally command
> submissions.
> 
> All command submissions from the same process can run concurrently with
> amdgpu, only when we see a fence from another driver or process we wait
> for it to complete before starting to run a command submission.
> 
> Other drivers can't make any assumption on what a shared access is
> actually doing (e.g. writing or reading) with a buffer.
> 
> So the model i915.ko is using the reservation object and it's shared
> fences is certainly not correct and needs to be fixed.

Looks like there are different interpretations of the semantics of
exclusive vs. shared fences. Where are these semantics documented?


FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
using PRIME slave scanout on radeon leaves artifacts.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                       ` <7aafce92-8bcf-1c5c-45de-9e8ecda85239-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-09-21 10:30                         ` Christian König
  2016-09-21 11:04                           ` Daniel Vetter
  2016-09-21 15:13                           ` Michel Dänzer
  0 siblings, 2 replies; 46+ messages in thread
From: Christian König @ 2016-09-21 10:30 UTC (permalink / raw)
  To: Michel Dänzer, Chris Wilson, Mario Kleiner
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>
> Looks like there are different interpretations of the semantics of
> exclusive vs. shared fences. Where are these semantics documented?

Yeah, I think as well that this is the primary question here.

IIRC the fences were explicitly called exclusive/shared instead of 
writing/reading on purpose.

I absolutely don't mind switching to them to writing/reading semantics, 
but amdgpu really needs multiple writers at the same time.

So in this case the writing side of a reservation object needs to be a 
collection of fences as well.

> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
> seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
> using PRIME slave scanout on radeon leaves artifacts.

Yeah, I know. See radeon_display.c radeon_flip_work_func().

We pretty much need the same patch here I've done for amdgpu as well.

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 10:30                         ` Christian König
@ 2016-09-21 11:04                           ` Daniel Vetter
       [not found]                             ` <CAKMK7uG3j54NzwjxmWuSmP787r+QN-Cu5T8R-naX6S9RvvKemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-09-21 15:13                           ` Michel Dänzer
  1 sibling, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-21 11:04 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list

On Wed, Sep 21, 2016 at 12:30 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>
>>
>> Looks like there are different interpretations of the semantics of
>> exclusive vs. shared fences. Where are these semantics documented?
>
>
> Yeah, I think as well that this is the primary question here.
>
> IIRC the fences were explicitly called exclusive/shared instead of
> writing/reading on purpose.
>
> I absolutely don't mind switching to them to writing/reading semantics, but
> amdgpu really needs multiple writers at the same time.
>
> So in this case the writing side of a reservation object needs to be a
> collection of fences as well.

You can't have multiple writers with implicit syncing. That confusion
is exactly why we called them shared/exclusive. Multiple writers
generally means that you do some form of fencing in userspace
(unsync'ed gl buffer access is the common one). What you do for
private buffers doesn't matter, but when you render into a
shared/winsys buffer you really need to set the exclusive fence (and
there can only ever be one). So probably needs some userspace
adjustments to make sure you don't accidentally set an exclusive write
hazard when you don't really want that implicit sync.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                             ` <CAKMK7uG3j54NzwjxmWuSmP787r+QN-Cu5T8R-naX6S9RvvKemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-09-21 11:19                               ` Christian König
  2016-09-21 12:56                                 ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-21 11:19 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, Michel Dänzer, Mario Kleiner, amd-gfx list, Chris Wilson

Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>
>>> Looks like there are different interpretations of the semantics of
>>> exclusive vs. shared fences. Where are these semantics documented?
>>
>> Yeah, I think as well that this is the primary question here.
>>
>> IIRC the fences were explicitly called exclusive/shared instead of
>> writing/reading on purpose.
>>
>> I absolutely don't mind switching to them to writing/reading semantics, but
>> amdgpu really needs multiple writers at the same time.
>>
>> So in this case the writing side of a reservation object needs to be a
>> collection of fences as well.
> You can't have multiple writers with implicit syncing. That confusion
> is exactly why we called them shared/exclusive. Multiple writers
> generally means that you do some form of fencing in userspace
> (unsync'ed gl buffer access is the common one). What you do for
> private buffers doesn't matter, but when you render into a
> shared/winsys buffer you really need to set the exclusive fence (and
> there can only ever be one). So probably needs some userspace
> adjustments to make sure you don't accidentally set an exclusive write
> hazard when you don't really want that implicit sync.

Nope, that isn't true.

We use multiple writers without implicit syncing between processes in 
the amdgpu stack perfectly fine.

See amdgpu_sync.c for the implementation. What we do there is taking a 
look at all the fences associated with a reservation object and only 
sync to those who are from another process.

Then we use implicit syncing for command submissions in the form of 
"dependencies". E.g. for each CS we report back an identifier of that 
submission to user space and on the next submission you can give this 
identifier as dependency which needs to be satisfied before the command 
submission can start running.

This was done to allow multiple engines (3D, DMA, Compute) to compose a 
buffer while still allow compatibility with protocols like DRI2/DRI3.

Regards,
Christian.

> -Daniel


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 11:19                               ` Christian König
@ 2016-09-21 12:56                                 ` Daniel Vetter
       [not found]                                   ` <CAKMK7uH6N2Kgwkf-11iwdqDAUrFmreYKLLeTGXmEh+N0DQ4tJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-21 12:56 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list

On Wed, Sep 21, 2016 at 1:19 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
>>
>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>
>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>>
>>>>
>>>> Looks like there are different interpretations of the semantics of
>>>> exclusive vs. shared fences. Where are these semantics documented?
>>>
>>>
>>> Yeah, I think as well that this is the primary question here.
>>>
>>> IIRC the fences were explicitly called exclusive/shared instead of
>>> writing/reading on purpose.
>>>
>>> I absolutely don't mind switching to them to writing/reading semantics,
>>> but
>>> amdgpu really needs multiple writers at the same time.
>>>
>>> So in this case the writing side of a reservation object needs to be a
>>> collection of fences as well.
>>
>> You can't have multiple writers with implicit syncing. That confusion
>> is exactly why we called them shared/exclusive. Multiple writers
>> generally means that you do some form of fencing in userspace
>> (unsync'ed gl buffer access is the common one). What you do for
>> private buffers doesn't matter, but when you render into a
>> shared/winsys buffer you really need to set the exclusive fence (and
>> there can only ever be one). So probably needs some userspace
>> adjustments to make sure you don't accidentally set an exclusive write
>> hazard when you don't really want that implicit sync.
>
>
> Nope, that isn't true.
>
> We use multiple writers without implicit syncing between processes in the
> amdgpu stack perfectly fine.
>
> See amdgpu_sync.c for the implementation. What we do there is taking a look
> at all the fences associated with a reservation object and only sync to
> those who are from another process.
>
> Then we use implicit syncing for command submissions in the form of
> "dependencies". E.g. for each CS we report back an identifier of that
> submission to user space and on the next submission you can give this
> identifier as dependency which needs to be satisfied before the command
> submission can start running.

This is called explicit fencing. Implemented with a driver-private
primitive (and not sync_file fds like on android), but still
conceptually explicit fencing. Implicit fencing really only can handle
one writer, at least as currently implemented by struct
reservation_object.

> This was done to allow multiple engines (3D, DMA, Compute) to compose a
> buffer while still allow compatibility with protocols like DRI2/DRI3.

Instead of the current solution you need to stop attaching exclusive
fences to non-shared buffers (which are coordinated using the
driver-private explicit fencing you're describing), and only attach
exclusive fences to shared buffers (DRI2/3, PRIME, whatever). Since
you're doing explicit syncing for internal stuff anyway you can still
opt to ignore the exclusive fences if you want to (driven by a flag or
something similar).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                   ` <CAKMK7uH6N2Kgwkf-11iwdqDAUrFmreYKLLeTGXmEh+N0DQ4tJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-09-21 15:07                                     ` Michel Dänzer
       [not found]                                       ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21 15:07 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson

On 21/09/16 09:56 PM, Daniel Vetter wrote:
> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
>>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
>>> <deathsimple@vodafone.de> wrote:
>>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>>>
>>>>>
>>>>> Looks like there are different interpretations of the semantics of
>>>>> exclusive vs. shared fences. Where are these semantics documented?
>>>>
>>>>
>>>> Yeah, I think as well that this is the primary question here.
>>>>
>>>> IIRC the fences were explicitly called exclusive/shared instead of
>>>> writing/reading on purpose.
>>>>
>>>> I absolutely don't mind switching to them to writing/reading semantics,
>>>> but
>>>> amdgpu really needs multiple writers at the same time.
>>>>
>>>> So in this case the writing side of a reservation object needs to be a
>>>> collection of fences as well.
>>>
>>> You can't have multiple writers with implicit syncing. That confusion
>>> is exactly why we called them shared/exclusive. Multiple writers
>>> generally means that you do some form of fencing in userspace
>>> (unsync'ed gl buffer access is the common one). What you do for
>>> private buffers doesn't matter, but when you render into a
>>> shared/winsys buffer you really need to set the exclusive fence (and
>>> there can only ever be one). So probably needs some userspace
>>> adjustments to make sure you don't accidentally set an exclusive write
>>> hazard when you don't really want that implicit sync.
>>
>>
>> Nope, that isn't true.
>>
>> We use multiple writers without implicit syncing between processes in the
>> amdgpu stack perfectly fine.
>>
>> See amdgpu_sync.c for the implementation. What we do there is taking a look
>> at all the fences associated with a reservation object and only sync to
>> those who are from another process.
>>
>> Then we use implicit syncing for command submissions in the form of
>> "dependencies". E.g. for each CS we report back an identifier of that
>> submission to user space and on the next submission you can give this
>> identifier as dependency which needs to be satisfied before the command
>> submission can start running.
> 
> This is called explicit fencing. Implemented with a driver-private
> primitive (and not sync_file fds like on android), but still
> conceptually explicit fencing. Implicit fencing really only can handle
> one writer, at least as currently implemented by struct
> reservation_object.
> 
>> This was done to allow multiple engines (3D, DMA, Compute) to compose a
>> buffer while still allow compatibility with protocols like DRI2/DRI3.
> 
> Instead of the current solution you need to stop attaching exclusive
> fences to non-shared buffers (which are coordinated using the
> driver-private explicit fencing you're describing),

Err, the current issue is actually that amdgpu never sets an exclusive
fence, only ever shared ones. :)

> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
> whatever).

Still, it occurred to me in the meantime that amdgpu setting the
exclusive fence for buffers shared via PRIME (no matter if it's a write
or read operation) might be a solution. Christian, what do you think?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 10:30                         ` Christian König
  2016-09-21 11:04                           ` Daniel Vetter
@ 2016-09-21 15:13                           ` Michel Dänzer
       [not found]                             ` <f0e034f9-db22-6577-97c7-dd8d3e851226-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21 15:13 UTC (permalink / raw)
  To: Christian König, Chris Wilson, Mario Kleiner; +Cc: amd-gfx, dri-devel

On 21/09/16 07:30 PM, Christian König wrote:
> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>
>> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
>> seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
>> using PRIME slave scanout on radeon leaves artifacts.
> 
> Yeah, I know. See radeon_display.c radeon_flip_work_func().
> 
> We pretty much need the same patch here I've done for amdgpu as well.

Actually, the PRIME slave can't scan out from the shared BOs directly
(recall the recent discussion we had about that with Mario), but has to
copy from the shared BO to a dedicated scanout BO. These copies need to
be synchronized with the primary GPU's copies to the shared BO.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                       ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-09-21 15:15                                         ` Christian König
       [not found]                                           ` <5c2048ff-0e20-ddf3-2d73-9a3acb38e7ff-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  2016-09-22  6:33                                         ` Daniel Vetter
  1 sibling, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-21 15:15 UTC (permalink / raw)
  To: Michel Dänzer, Daniel Vetter
  Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson

Am 21.09.2016 um 17:07 schrieb Michel Dänzer:
> On 21/09/16 09:56 PM, Daniel Vetter wrote:
>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>> Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
>>>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>>>>
>>>>>> Looks like there are different interpretations of the semantics of
>>>>>> exclusive vs. shared fences. Where are these semantics documented?
>>>>>
>>>>> Yeah, I think as well that this is the primary question here.
>>>>>
>>>>> IIRC the fences were explicitly called exclusive/shared instead of
>>>>> writing/reading on purpose.
>>>>>
>>>>> I absolutely don't mind switching to them to writing/reading semantics,
>>>>> but
>>>>> amdgpu really needs multiple writers at the same time.
>>>>>
>>>>> So in this case the writing side of a reservation object needs to be a
>>>>> collection of fences as well.
>>>> You can't have multiple writers with implicit syncing. That confusion
>>>> is exactly why we called them shared/exclusive. Multiple writers
>>>> generally means that you do some form of fencing in userspace
>>>> (unsync'ed gl buffer access is the common one). What you do for
>>>> private buffers doesn't matter, but when you render into a
>>>> shared/winsys buffer you really need to set the exclusive fence (and
>>>> there can only ever be one). So probably needs some userspace
>>>> adjustments to make sure you don't accidentally set an exclusive write
>>>> hazard when you don't really want that implicit sync.
>>>
>>> Nope, that isn't true.
>>>
>>> We use multiple writers without implicit syncing between processes in the
>>> amdgpu stack perfectly fine.
>>>
>>> See amdgpu_sync.c for the implementation. What we do there is taking a look
>>> at all the fences associated with a reservation object and only sync to
>>> those who are from another process.
>>>
>>> Then we use implicit syncing for command submissions in the form of
>>> "dependencies". E.g. for each CS we report back an identifier of that
>>> submission to user space and on the next submission you can give this
>>> identifier as dependency which needs to be satisfied before the command
>>> submission can start running.
>> This is called explicit fencing. Implemented with a driver-private
>> primitive (and not sync_file fds like on android), but still
>> conceptually explicit fencing. Implicit fencing really only can handle
>> one writer, at least as currently implemented by struct
>> reservation_object.
>>
>>> This was done to allow multiple engines (3D, DMA, Compute) to compose a
>>> buffer while still allow compatibility with protocols like DRI2/DRI3.
>> Instead of the current solution you need to stop attaching exclusive
>> fences to non-shared buffers (which are coordinated using the
>> driver-private explicit fencing you're describing),
> Err, the current issue is actually that amdgpu never sets an exclusive
> fence, only ever shared ones. :)

Actually amdgpu does set the exclusive fence for buffer migrations, 
cause that is an operation user space doesn't know about and so it needs 
to be "exclusive" access to the buffer.


>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
>> whatever).
> Still, it occurred to me in the meantime that amdgpu setting the
> exclusive fence for buffers shared via PRIME (no matter if it's a write
> or read operation) might be a solution. Christian, what do you think?

The problem is that we don't have one fence, but many.

E.g. there can be many operation on a buffer at the same time and we 
need to wait for all of them to complete before it can be displayed.

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                             ` <f0e034f9-db22-6577-97c7-dd8d3e851226-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-09-21 15:21                               ` Christian König
  2016-09-21 15:28                                 ` Michel Dänzer
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-21 15:21 UTC (permalink / raw)
  To: Michel Dänzer, Chris Wilson, Mario Kleiner
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 21.09.2016 um 17:13 schrieb Michel Dänzer:
> On 21/09/16 07:30 PM, Christian König wrote:
>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
>>> seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
>>> using PRIME slave scanout on radeon leaves artifacts.
>> Yeah, I know. See radeon_display.c radeon_flip_work_func().
>>
>> We pretty much need the same patch here I've done for amdgpu as well.
> Actually, the PRIME slave can't scan out from the shared BOs directly
> (recall the recent discussion we had about that with Mario), but has to
> copy from the shared BO to a dedicated scanout BO. These copies need to
> be synchronized with the primary GPU's copies to the shared BO.

Yeah, that thought came to my mind before as well.

Buffer migrations by the kernel caused by a prime export actually set 
the exclusive fence.

So this shouldn't be an issue in practice when the displaying GPU needs 
to copy from the BO again anyway.

The only case I can see when this can happen is when the BO is composed 
directly in system memory by the engines and not migrated there.

Could be that we run into this issue more often in the future, because 
that is pretty much what we want to have for 4K UVD decode.

Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 15:21                               ` Christian König
@ 2016-09-21 15:28                                 ` Michel Dänzer
  0 siblings, 0 replies; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21 15:28 UTC (permalink / raw)
  To: Christian König, Chris Wilson, Mario Kleiner; +Cc: amd-gfx, dri-devel

On 22/09/16 12:21 AM, Christian König wrote:
> Am 21.09.2016 um 17:13 schrieb Michel Dänzer:
>> On 21/09/16 07:30 PM, Christian König wrote:
>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon
>>>> only
>>>> seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
>>>> using PRIME slave scanout on radeon leaves artifacts.
>>> Yeah, I know. See radeon_display.c radeon_flip_work_func().
>>>
>>> We pretty much need the same patch here I've done for amdgpu as well.
>> Actually, the PRIME slave can't scan out from the shared BOs directly
>> (recall the recent discussion we had about that with Mario), but has to
>> copy from the shared BO to a dedicated scanout BO. These copies need to
>> be synchronized with the primary GPU's copies to the shared BO.
> 
> Yeah, that thought came to my mind before as well.
> 
> Buffer migrations by the kernel caused by a prime export actually set
> the exclusive fence.
> 
> So this shouldn't be an issue in practice when the displaying GPU needs
> to copy from the BO again anyway.
> 
> The only case I can see when this can happen is when the BO is composed
> directly in system memory by the engines and not migrated there.

There is no migration going on in the steady state, just the primary GPU
writing to the shared BO and the slave GPU reading from it. If those
operations aren't properly synchronized, there is at least intermittent
tearing, but there can even be artifacts which stay until that area is
updated again.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                           ` <5c2048ff-0e20-ddf3-2d73-9a3acb38e7ff-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2016-09-21 15:29                                             ` Michel Dänzer
  2016-09-21 16:23                                               ` Christian König
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21 15:29 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson

On 22/09/16 12:15 AM, Christian König wrote:
> Am 21.09.2016 um 17:07 schrieb Michel Dänzer:
>> On 21/09/16 09:56 PM, Daniel Vetter wrote:
>>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
>>> <deathsimple@vodafone.de> wrote:
>>>>
>>>> We use multiple writers without implicit syncing between processes
>>>> in the
>>>> amdgpu stack perfectly fine.
>>>>
>>>> See amdgpu_sync.c for the implementation. What we do there is taking
>>>> a look
>>>> at all the fences associated with a reservation object and only sync to
>>>> those who are from another process.
>>>>
>>>> Then we use implicit syncing for command submissions in the form of
>>>> "dependencies". E.g. for each CS we report back an identifier of that
>>>> submission to user space and on the next submission you can give this
>>>> identifier as dependency which needs to be satisfied before the command
>>>> submission can start running.
>>> This is called explicit fencing. Implemented with a driver-private
>>> primitive (and not sync_file fds like on android), but still
>>> conceptually explicit fencing. Implicit fencing really only can handle
>>> one writer, at least as currently implemented by struct
>>> reservation_object.
>>>
>>>> This was done to allow multiple engines (3D, DMA, Compute) to compose a
>>>> buffer while still allow compatibility with protocols like DRI2/DRI3.
>>> Instead of the current solution you need to stop attaching exclusive
>>> fences to non-shared buffers (which are coordinated using the
>>> driver-private explicit fencing you're describing),
>> Err, the current issue is actually that amdgpu never sets an exclusive
>> fence, only ever shared ones. :)
> 
> Actually amdgpu does set the exclusive fence for buffer migrations,
> cause that is an operation user space doesn't know about and so it needs
> to be "exclusive" access to the buffer.
> 
> 
>>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
>>> whatever).
>> Still, it occurred to me in the meantime that amdgpu setting the
>> exclusive fence for buffers shared via PRIME (no matter if it's a write
>> or read operation) might be a solution. Christian, what do you think?
> 
> The problem is that we don't have one fence, but many.
> 
> E.g. there can be many operation on a buffer at the same time and we
> need to wait for all of them to complete before it can be displayed.

Maybe in theory, but with the problem we have in practice right now, the
amdgpu GPU should only ever access the shared BO with the same engine.

Anyway, this should be solvable by setting some kind of meta-fence as
the exclusive fence, which can internally be mapped to multiple fences,
maybe up to one for each ring which can access the BO?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 15:29                                             ` Michel Dänzer
@ 2016-09-21 16:23                                               ` Christian König
  2016-09-22  6:36                                                 ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-21 16:23 UTC (permalink / raw)
  To: Michel Dänzer, Daniel Vetter; +Cc: dri-devel, amd-gfx list

Am 21.09.2016 um 17:29 schrieb Michel Dänzer:
> On 22/09/16 12:15 AM, Christian König wrote:
>> Am 21.09.2016 um 17:07 schrieb Michel Dänzer:
>>> On 21/09/16 09:56 PM, Daniel Vetter wrote:
>>>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>> We use multiple writers without implicit syncing between processes
>>>>> in the
>>>>> amdgpu stack perfectly fine.
>>>>>
>>>>> See amdgpu_sync.c for the implementation. What we do there is taking
>>>>> a look
>>>>> at all the fences associated with a reservation object and only sync to
>>>>> those who are from another process.
>>>>>
>>>>> Then we use implicit syncing for command submissions in the form of
>>>>> "dependencies". E.g. for each CS we report back an identifier of that
>>>>> submission to user space and on the next submission you can give this
>>>>> identifier as dependency which needs to be satisfied before the command
>>>>> submission can start running.
>>>> This is called explicit fencing. Implemented with a driver-private
>>>> primitive (and not sync_file fds like on android), but still
>>>> conceptually explicit fencing. Implicit fencing really only can handle
>>>> one writer, at least as currently implemented by struct
>>>> reservation_object.
>>>>
>>>>> This was done to allow multiple engines (3D, DMA, Compute) to compose a
>>>>> buffer while still allow compatibility with protocols like DRI2/DRI3.
>>>> Instead of the current solution you need to stop attaching exclusive
>>>> fences to non-shared buffers (which are coordinated using the
>>>> driver-private explicit fencing you're describing),
>>> Err, the current issue is actually that amdgpu never sets an exclusive
>>> fence, only ever shared ones. :)
>> Actually amdgpu does set the exclusive fence for buffer migrations,
>> cause that is an operation user space doesn't know about and so it needs
>> to be "exclusive" access to the buffer.
>>
>>
>>>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
>>>> whatever).
>>> Still, it occurred to me in the meantime that amdgpu setting the
>>> exclusive fence for buffers shared via PRIME (no matter if it's a write
>>> or read operation) might be a solution. Christian, what do you think?
>> The problem is that we don't have one fence, but many.
>>
>> E.g. there can be many operation on a buffer at the same time and we
>> need to wait for all of them to complete before it can be displayed.
> Maybe in theory, but with the problem we have in practice right now, the
> amdgpu GPU should only ever access the shared BO with the same engine.

That clearly won't work. Take a look at what both Mesa and the pro stack 
do with the BO before it is displayed makes it mandatory to execute 
things in parallel (at least for the not shared case).

> Anyway, this should be solvable by setting some kind of meta-fence as
> the exclusive fence, which can internally be mapped to multiple fences,
> maybe up to one for each ring which can access the BO?

I've thought about that as well, but this approach would also only work 
when we keep a collection of fences and not just an array because of the 
scheduler.

For a quick workaround I suggest to just serialize all accesses to BO 
shared with different drivers, but essentially I think it is a perfectly 
valid requirement to have multiple writers to one BO.

Christian.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                       ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org>
  2016-09-21 15:15                                         ` Christian König
@ 2016-09-22  6:33                                         ` Daniel Vetter
  1 sibling, 0 replies; 46+ messages in thread
From: Daniel Vetter @ 2016-09-22  6:33 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Mario Kleiner, dri-devel, Chris Wilson, Christian König,
	amd-gfx list, Daniel Vetter

On Thu, Sep 22, 2016 at 12:07:24AM +0900, Michel Dänzer wrote:
> On 21/09/16 09:56 PM, Daniel Vetter wrote:
> > On Wed, Sep 21, 2016 at 1:19 PM, Christian König
> > <deathsimple@vodafone.de> wrote:
> >> Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
> >>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
> >>> <deathsimple@vodafone.de> wrote:
> >>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
> >>>>>
> >>>>>
> >>>>> Looks like there are different interpretations of the semantics of
> >>>>> exclusive vs. shared fences. Where are these semantics documented?
> >>>>
> >>>>
> >>>> Yeah, I think as well that this is the primary question here.
> >>>>
> >>>> IIRC the fences were explicitly called exclusive/shared instead of
> >>>> writing/reading on purpose.
> >>>>
> >>>> I absolutely don't mind switching to them to writing/reading semantics,
> >>>> but
> >>>> amdgpu really needs multiple writers at the same time.
> >>>>
> >>>> So in this case the writing side of a reservation object needs to be a
> >>>> collection of fences as well.
> >>>
> >>> You can't have multiple writers with implicit syncing. That confusion
> >>> is exactly why we called them shared/exclusive. Multiple writers
> >>> generally means that you do some form of fencing in userspace
> >>> (unsync'ed gl buffer access is the common one). What you do for
> >>> private buffers doesn't matter, but when you render into a
> >>> shared/winsys buffer you really need to set the exclusive fence (and
> >>> there can only ever be one). So probably needs some userspace
> >>> adjustments to make sure you don't accidentally set an exclusive write
> >>> hazard when you don't really want that implicit sync.
> >>
> >>
> >> Nope, that isn't true.
> >>
> >> We use multiple writers without implicit syncing between processes in the
> >> amdgpu stack perfectly fine.
> >>
> >> See amdgpu_sync.c for the implementation. What we do there is taking a look
> >> at all the fences associated with a reservation object and only sync to
> >> those who are from another process.
> >>
> >> Then we use implicit syncing for command submissions in the form of
> >> "dependencies". E.g. for each CS we report back an identifier of that
> >> submission to user space and on the next submission you can give this
> >> identifier as dependency which needs to be satisfied before the command
> >> submission can start running.
> > 
> > This is called explicit fencing. Implemented with a driver-private
> > primitive (and not sync_file fds like on android), but still
> > conceptually explicit fencing. Implicit fencing really only can handle
> > one writer, at least as currently implemented by struct
> > reservation_object.
> > 
> >> This was done to allow multiple engines (3D, DMA, Compute) to compose a
> >> buffer while still allow compatibility with protocols like DRI2/DRI3.
> > 
> > Instead of the current solution you need to stop attaching exclusive
> > fences to non-shared buffers (which are coordinated using the
> > driver-private explicit fencing you're describing),
> 
> Err, the current issue is actually that amdgpu never sets an exclusive
> fence, only ever shared ones. :)

Well since you sometimes sync and sometimes not sync it is kinda a special
case of semi-exclusive fence (even if attached to the shared slots).
> 
> > and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
> > whatever).
> 
> Still, it occurred to me in the meantime that amdgpu setting the
> exclusive fence for buffers shared via PRIME (no matter if it's a write
> or read operation) might be a solution. Christian, what do you think?

Yup, that's what I mean. And it shouldn't cause a problem since for shared
buffers (at least for protocols where implicit fencing is required), since
for those you really can't have multiple concurrent writers. And with the
special checks in amdgpu_sync.c that's what's happening in reality, only
difference is that the filtering/selection of what is considered and
exclusive fences happens when you sync, and not when you attach them. And
that breaks reservation_object assumptions.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 16:23                                               ` Christian König
@ 2016-09-22  6:36                                                 ` Daniel Vetter
       [not found]                                                   ` <20160922063625.GD22164-XQyZGdhdUcTMwUGJfOwWj/ooFf0ArEBIu+b9c/7xato@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-22  6:36 UTC (permalink / raw)
  To: Christian König; +Cc: Michel Dänzer, dri-devel, amd-gfx list

On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
> For a quick workaround I suggest to just serialize all accesses to BO shared
> with different drivers, but essentially I think it is a perfectly valid
> requirement to have multiple writers to one BO.

It is, but it's not possible with implicit sync. If you want parallel
write access to the same shared buffer, you _must_ carry around some
explicit fences. Within amdgpu you can use driver-specific cookies, for
shared buffer we now have sync_file. But multiple writers with implicit
sync simply fundamentally doesn't work. Because you have no idea with which
writer, touching the same subrange you want to touch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                   ` <20160922063625.GD22164-XQyZGdhdUcTMwUGJfOwWj/ooFf0ArEBIu+b9c/7xato@public.gmane.org>
@ 2016-09-22 10:55                                                     ` Christian König
  2016-09-22 12:26                                                       ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-22 10:55 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, Michel Dänzer, Mario Kleiner, amd-gfx list, Chris Wilson

Am 22.09.2016 um 08:36 schrieb Daniel Vetter:
> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
>> For a quick workaround I suggest to just serialize all accesses to BO shared
>> with different drivers, but essentially I think it is a perfectly valid
>> requirement to have multiple writers to one BO.
> It is, but it's not possible with implicit sync. If you want parallel
> write access to the same shared buffer, you _must_ carry around some
> explicit fences. Within amdgpu you can use driver-specific cookies, for
> shared buffer we now have sync_file. But multiple writers with implicit
> sync simply fundamentally doesn't work. Because you have no idea with which
> writer, touching the same subrange you want to touch.

You don't need to separate the BO into subranges which are touched by 
different engines for allowing multiple writers.

AMD hardware and I'm pretty sure others as well are perfectly capable of 
writing to the same memory from multiple engines and even multiple GPUs 
at the same time.

For a good hint of what is possible see the public AMD ISA documentation 
about atomic operations, but that is only the start of it.

The crux here is that we need to assume that we will have implicit and 
explicit sync mixed for backward compatibility.

This implies that we need some mechanism like amdgpu uses in it's sync 
implementation where every fence is associated with an owner which 
denotes the domain in which implicit sync happens. If you leave this 
domain you will automatically run into explicit sync.

Currently we define the borders of this domain in amdgpu on process 
boundary to keep things like DRI2/DRI3 working as expected.

I really don't see how you want to solve this with a single explicit 
fence for each reservation object. As long as you have multiple 
concurrently running operations accessing the same buffer you need to 
keep one fence for each operation no matter what.

Regards,
Christian.

> -Daniel

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-22 10:55                                                     ` Christian König
@ 2016-09-22 12:26                                                       ` Daniel Vetter
  2016-09-22 12:44                                                         ` Christian König
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-22 12:26 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list

On Thu, Sep 22, 2016 at 12:55 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 22.09.2016 um 08:36 schrieb Daniel Vetter:
>>
>> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
>>>
>>> For a quick workaround I suggest to just serialize all accesses to BO
>>> shared
>>> with different drivers, but essentially I think it is a perfectly valid
>>> requirement to have multiple writers to one BO.
>>
>> It is, but it's not possible with implicit sync. If you want parallel
>> write access to the same shared buffer, you _must_ carry around some
>> explicit fences. Within amdgpu you can use driver-specific cookies, for
>> shared buffer we now have sync_file. But multiple writers with implicit
>> sync simply fundamentally doesn't work. Because you have no idea with
>> which
>> writer, touching the same subrange you want to touch.
>
>
> You don't need to separate the BO into subranges which are touched by
> different engines for allowing multiple writers.
>
> AMD hardware and I'm pretty sure others as well are perfectly capable of
> writing to the same memory from multiple engines and even multiple GPUs at
> the same time.
>
> For a good hint of what is possible see the public AMD ISA documentation
> about atomic operations, but that is only the start of it.
>
>
> The crux here is that we need to assume that we will have implicit and
> explicit sync mixed for backward compatibility.
>
> This implies that we need some mechanism like amdgpu uses in it's sync
> implementation where every fence is associated with an owner which denotes
> the domain in which implicit sync happens. If you leave this domain you will
> automatically run into explicit sync.
>
> Currently we define the borders of this domain in amdgpu on process boundary
> to keep things like DRI2/DRI3 working as expected.
>
> I really don't see how you want to solve this with a single explicit fence
> for each reservation object. As long as you have multiple concurrently
> running operations accessing the same buffer you need to keep one fence for
> each operation no matter what.

I can't make sense of what you're saying, and I suspect we put
different meaning to different words. So let me define here:

- implicit fencing: Userspace does not track read/writes to buffers,
but only the kernel does that. This is the assumption DRI2/3 has.
Since synchronization is by necessity on a per-buffer level you can
only have 1 writer. In the kernel the cross-driver interface for this
is struct reservation_object attached to dma-bufs. If you don't fill
out/wait for the exclusive fence in there, you're driver is _not_
doing (cross-device) implicit fencing.

- explicit fencing: Userspace passes around distinct fence objects for
any work going on on the gpu. The kernel doesn't insert any stall of
it's own (except for moving buffer objects around ofc). This is what
Android. This also seems to be what amdgpu is doing within one
process/owner.

Given that I'm not sure what you mean with "one explicit fence per
reservation_object", since explicit fencing should not attach anything
(at least not any exclusive fences) to a reservation_object. It does
sound a bit like you have the meaning the other way round (as in
explicit fencing = the kernel explicitly takes care of fencing, but
it's explicit as in explicit fences visible to userspace).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-22 12:26                                                       ` Daniel Vetter
@ 2016-09-22 12:44                                                         ` Christian König
  2016-09-22 13:05                                                           ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-22 12:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel, Michel Dänzer, amd-gfx list

Am 22.09.2016 um 14:26 schrieb Daniel Vetter:
> On Thu, Sep 22, 2016 at 12:55 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 22.09.2016 um 08:36 schrieb Daniel Vetter:
>>> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
>>>> For a quick workaround I suggest to just serialize all accesses to BO
>>>> shared
>>>> with different drivers, but essentially I think it is a perfectly valid
>>>> requirement to have multiple writers to one BO.
>>> It is, but it's not possible with implicit sync. If you want parallel
>>> write access to the same shared buffer, you _must_ carry around some
>>> explicit fences. Within amdgpu you can use driver-specific cookies, for
>>> shared buffer we now have sync_file. But multiple writers with implicit
>>> sync simply fundamentally doesn't work. Because you have no idea with
>>> which
>>> writer, touching the same subrange you want to touch.
>>
>> You don't need to separate the BO into subranges which are touched by
>> different engines for allowing multiple writers.
>>
>> AMD hardware and I'm pretty sure others as well are perfectly capable of
>> writing to the same memory from multiple engines and even multiple GPUs at
>> the same time.
>>
>> For a good hint of what is possible see the public AMD ISA documentation
>> about atomic operations, but that is only the start of it.
>>
>>
>> The crux here is that we need to assume that we will have implicit and
>> explicit sync mixed for backward compatibility.
>>
>> This implies that we need some mechanism like amdgpu uses in it's sync
>> implementation where every fence is associated with an owner which denotes
>> the domain in which implicit sync happens. If you leave this domain you will
>> automatically run into explicit sync.
>>
>> Currently we define the borders of this domain in amdgpu on process boundary
>> to keep things like DRI2/DRI3 working as expected.
>>
>> I really don't see how you want to solve this with a single explicit fence
>> for each reservation object. As long as you have multiple concurrently
>> running operations accessing the same buffer you need to keep one fence for
>> each operation no matter what.
> I can't make sense of what you're saying, and I suspect we put
> different meaning to different words. So let me define here:
>
> - implicit fencing: Userspace does not track read/writes to buffers,
> but only the kernel does that. This is the assumption DRI2/3 has.
> Since synchronization is by necessity on a per-buffer level you can
> only have 1 writer. In the kernel the cross-driver interface for this
> is struct reservation_object attached to dma-bufs. If you don't fill
> out/wait for the exclusive fence in there, you're driver is _not_
> doing (cross-device) implicit fencing.

I can confirm that my understanding of implicit fencing is exactly the 
same as yours.

> - explicit fencing: Userspace passes around distinct fence objects for
> any work going on on the gpu. The kernel doesn't insert any stall of
> it's own (except for moving buffer objects around ofc). This is what
> Android. This also seems to be what amdgpu is doing within one
> process/owner.

No, that is clearly not my understanding of explicit fencing.

Userspace doesn't necessarily need to pass around distinct fence objects 
with all of it's protocols and the kernel is still responsible for 
inserting stalls whenever an userspace protocol or application requires 
this semantics.

Otherwise you will never be able to use explicit fencing on the Linux 
desktop with protocols like DRI2/DRI3.

I would expect that every driver in the system waits for all fences of a 
reservation object as long as it isn't told otherwise by providing a 
distinct fence object with the IOCTL in question.

Regards,
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-22 12:44                                                         ` Christian König
@ 2016-09-22 13:05                                                           ` Daniel Vetter
  2016-09-22 13:22                                                             ` Christian König
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-22 13:05 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list

On Thu, Sep 22, 2016 at 2:44 PM, Christian König
<deathsimple@vodafone.de> wrote:
>> - explicit fencing: Userspace passes around distinct fence objects for
>> any work going on on the gpu. The kernel doesn't insert any stall of
>> it's own (except for moving buffer objects around ofc). This is what
>> Android. This also seems to be what amdgpu is doing within one
>> process/owner.
>
>
> No, that is clearly not my understanding of explicit fencing.
>
> Userspace doesn't necessarily need to pass around distinct fence objects
> with all of it's protocols and the kernel is still responsible for inserting
> stalls whenever an userspace protocol or application requires this
> semantics.
>
> Otherwise you will never be able to use explicit fencing on the Linux
> desktop with protocols like DRI2/DRI3.

This is about mixing them. Explicit fencing still means userspace has
an explicit piece, separate from buffers, (either sync_file fd, or a
driver-specific cookie, or similar).

> I would expect that every driver in the system waits for all fences of a
> reservation object as long as it isn't told otherwise by providing a
> distinct fence object with the IOCTL in question.

Yup agreed. This way if your explicitly-fencing driver reads a shared
buffer passed over a protocol that does implicit fencing (like
DRI2/3), then it will work.

The other interop direction is explicitly-fencing driver passes a
buffer to a consumer which expects implicit fencing. In that case you
must attach the right fence to the exclusive slot, but _only_ in that
case. Otherwise you end up stalling your explicitly-fencing userspace,
since implicit fencing doesn't allow more than 1 writer. For amdgpu
one possible way to implement this might be to count how many users a
dma-buf has, and if it's more than just the current context set the
exclusive fence. Or do an uabi revision and let userspace decide (or
at least overwrite it).

But the current approach in amdgpu_sync.c of declaring a fence as
exclusive after the fact (if owners don't match) just isn't how
reservation_object works. You can of course change that, but that
means you must change all drivers implementing support for implicit
fencing of dma-buf. Fixing amdgpu will be easier ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-22 13:05                                                           ` Daniel Vetter
@ 2016-09-22 13:22                                                             ` Christian König
       [not found]                                                               ` <d2430ff8-43bd-bff2-9b02-847cabfd56c0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-22 13:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel, Michel Dänzer, amd-gfx list

Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
> On Thu, Sep 22, 2016 at 2:44 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>>> - explicit fencing: Userspace passes around distinct fence objects for
>>> any work going on on the gpu. The kernel doesn't insert any stall of
>>> it's own (except for moving buffer objects around ofc). This is what
>>> Android. This also seems to be what amdgpu is doing within one
>>> process/owner.
>>
>> No, that is clearly not my understanding of explicit fencing.
>>
>> Userspace doesn't necessarily need to pass around distinct fence objects
>> with all of it's protocols and the kernel is still responsible for inserting
>> stalls whenever an userspace protocol or application requires this
>> semantics.
>>
>> Otherwise you will never be able to use explicit fencing on the Linux
>> desktop with protocols like DRI2/DRI3.
> This is about mixing them. Explicit fencing still means userspace has
> an explicit piece, separate from buffers, (either sync_file fd, or a
> driver-specific cookie, or similar).
>
>> I would expect that every driver in the system waits for all fences of a
>> reservation object as long as it isn't told otherwise by providing a
>> distinct fence object with the IOCTL in question.
> Yup agreed. This way if your explicitly-fencing driver reads a shared
> buffer passed over a protocol that does implicit fencing (like
> DRI2/3), then it will work.
>
> The other interop direction is explicitly-fencing driver passes a
> buffer to a consumer which expects implicit fencing. In that case you
> must attach the right fence to the exclusive slot, but _only_ in that
> case.

Ok well sounds like you are close to understand why I can't do exactly 
this: There simply is no right fence I could attach.

When amdgpu makes the command submissions it doesn't necessarily know 
that the buffer will be exported and shared with another device later on.

So when the buffer is exported and given to the other device you might 
have a whole bunch of fences which run concurrently and not in any 
serial order.

> Otherwise you end up stalling your explicitly-fencing userspace,
> since implicit fencing doesn't allow more than 1 writer. For amdgpu
> one possible way to implement this might be to count how many users a
> dma-buf has, and if it's more than just the current context set the
> exclusive fence. Or do an uabi revision and let userspace decide (or
> at least overwrite it).

I mean I can pick one fence and wait for the rest to finish manually, 
but that would certainly defeat the whole effort, doesn't it?

I completely agree that you have only 1 writer with implicit fencing, 
but when you switch from explicit fencing back to implicit fencing you 
can have multiple ones.

> But the current approach in amdgpu_sync.c of declaring a fence as
> exclusive after the fact (if owners don't match) just isn't how
> reservation_object works. You can of course change that, but that
> means you must change all drivers implementing support for implicit
> fencing of dma-buf. Fixing amdgpu will be easier ;-)

Well as far as I can see there is no way I can fix amdgpu in this case.

The handling clearly needs to be changed on the receiving side of the 
reservation objects if I don't completely want to disable concurrent 
access to BOs in amdgpu.

Regards,
Christian.

> -Daniel


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                               ` <d2430ff8-43bd-bff2-9b02-847cabfd56c0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2016-09-23 10:00                                                                 ` Michel Dänzer
  2016-09-23 12:09                                                                   ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-23 10:00 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson

On 22/09/16 10:22 PM, Christian König wrote:
> Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
>> On Thu, Sep 22, 2016 at 2:44 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>> - explicit fencing: Userspace passes around distinct fence objects for
>>>> any work going on on the gpu. The kernel doesn't insert any stall of
>>>> it's own (except for moving buffer objects around ofc). This is what
>>>> Android. This also seems to be what amdgpu is doing within one
>>>> process/owner.
>>>
>>> No, that is clearly not my understanding of explicit fencing.
>>>
>>> Userspace doesn't necessarily need to pass around distinct fence objects
>>> with all of it's protocols and the kernel is still responsible for
>>> inserting
>>> stalls whenever an userspace protocol or application requires this
>>> semantics.
>>>
>>> Otherwise you will never be able to use explicit fencing on the Linux
>>> desktop with protocols like DRI2/DRI3.
>> This is about mixing them. Explicit fencing still means userspace has
>> an explicit piece, separate from buffers, (either sync_file fd, or a
>> driver-specific cookie, or similar).
>>
>>> I would expect that every driver in the system waits for all fences of a
>>> reservation object as long as it isn't told otherwise by providing a
>>> distinct fence object with the IOCTL in question.
>> Yup agreed. This way if your explicitly-fencing driver reads a shared
>> buffer passed over a protocol that does implicit fencing (like
>> DRI2/3), then it will work.
>>
>> The other interop direction is explicitly-fencing driver passes a
>> buffer to a consumer which expects implicit fencing. In that case you
>> must attach the right fence to the exclusive slot, but _only_ in that
>> case.
> 
> Ok well sounds like you are close to understand why I can't do exactly
> this: There simply is no right fence I could attach.
> 
> When amdgpu makes the command submissions it doesn't necessarily know
> that the buffer will be exported and shared with another device later on.
> 
> So when the buffer is exported and given to the other device you might
> have a whole bunch of fences which run concurrently and not in any
> serial order.

I feel like you're thinking too much of buffers shared between GPUs as
being short-lived and only shared late. In the use-cases I know about,
shared buffers are created separately and shared ahead of time, the
actual rendering work is done to non-shared buffers and then just copied
to the shared buffers for transfer between GPUs. These copies are always
performed by the same context in such a way that they should always be
performed by the same HW engine and thus implicitly serialized.

Do you have any specific use-cases in mind where buffers are only shared
between GPUs after the rendering operations creating the buffer contents
to be shared have already been submitted?


>> Otherwise you end up stalling your explicitly-fencing userspace,
>> since implicit fencing doesn't allow more than 1 writer. For amdgpu
>> one possible way to implement this might be to count how many users a
>> dma-buf has, and if it's more than just the current context set the
>> exclusive fence. Or do an uabi revision and let userspace decide (or
>> at least overwrite it).
> 
> I mean I can pick one fence and wait for the rest to finish manually,
> but that would certainly defeat the whole effort, doesn't it?

I'm afraid it's not clear to me why it would. Can you elaborate?


>> But the current approach in amdgpu_sync.c of declaring a fence as
>> exclusive after the fact (if owners don't match) just isn't how
>> reservation_object works. You can of course change that, but that
>> means you must change all drivers implementing support for implicit
>> fencing of dma-buf. Fixing amdgpu will be easier ;-)
> 
> Well as far as I can see there is no way I can fix amdgpu in this case.
> 
> The handling clearly needs to be changed on the receiving side of the
> reservation objects if I don't completely want to disable concurrent
> access to BOs in amdgpu.

Anyway, we need a solution for this between radeon and amdgpu, and I
don't think a solution which involves those drivers using reservation
object semantics between them which are different from all other drivers
is a good idea.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-23 10:00                                                                 ` Michel Dänzer
@ 2016-09-23 12:09                                                                   ` Daniel Vetter
  2016-09-26  0:48                                                                     ` Michel Dänzer
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:09 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel, amd-gfx list

On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote:
> On 22/09/16 10:22 PM, Christian König wrote:
> > Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
> >> On Thu, Sep 22, 2016 at 2:44 PM, Christian König
> >> <deathsimple@vodafone.de> wrote:
> >>>> - explicit fencing: Userspace passes around distinct fence objects for
> >>>> any work going on on the gpu. The kernel doesn't insert any stall of
> >>>> it's own (except for moving buffer objects around ofc). This is what
> >>>> Android. This also seems to be what amdgpu is doing within one
> >>>> process/owner.
> >>>
> >>> No, that is clearly not my understanding of explicit fencing.
> >>>
> >>> Userspace doesn't necessarily need to pass around distinct fence objects
> >>> with all of it's protocols and the kernel is still responsible for
> >>> inserting
> >>> stalls whenever an userspace protocol or application requires this
> >>> semantics.
> >>>
> >>> Otherwise you will never be able to use explicit fencing on the Linux
> >>> desktop with protocols like DRI2/DRI3.
> >> This is about mixing them. Explicit fencing still means userspace has
> >> an explicit piece, separate from buffers, (either sync_file fd, or a
> >> driver-specific cookie, or similar).
> >>
> >>> I would expect that every driver in the system waits for all fences of a
> >>> reservation object as long as it isn't told otherwise by providing a
> >>> distinct fence object with the IOCTL in question.
> >> Yup agreed. This way if your explicitly-fencing driver reads a shared
> >> buffer passed over a protocol that does implicit fencing (like
> >> DRI2/3), then it will work.
> >>
> >> The other interop direction is explicitly-fencing driver passes a
> >> buffer to a consumer which expects implicit fencing. In that case you
> >> must attach the right fence to the exclusive slot, but _only_ in that
> >> case.
> > 
> > Ok well sounds like you are close to understand why I can't do exactly
> > this: There simply is no right fence I could attach.
> > 
> > When amdgpu makes the command submissions it doesn't necessarily know
> > that the buffer will be exported and shared with another device later on.
> > 
> > So when the buffer is exported and given to the other device you might
> > have a whole bunch of fences which run concurrently and not in any
> > serial order.
> 
> I feel like you're thinking too much of buffers shared between GPUs as
> being short-lived and only shared late. In the use-cases I know about,
> shared buffers are created separately and shared ahead of time, the
> actual rendering work is done to non-shared buffers and then just copied
> to the shared buffers for transfer between GPUs. These copies are always
> performed by the same context in such a way that they should always be
> performed by the same HW engine and thus implicitly serialized.
> 
> Do you have any specific use-cases in mind where buffers are only shared
> between GPUs after the rendering operations creating the buffer contents
> to be shared have already been submitted?

Yeah, it should be known which buffer you use (at least in userspace,
maybe not in the kernel) for which you need implicit fencing. At least
with DRI2/3 it's really obvious which buffers are shared. Same holds for
external images and other imported buffers.

Yes that means you need to keep track of a few things in userspace, and
you need to add a special flag to CS to make sure the kernel does set the
exclusive fence.

> >> Otherwise you end up stalling your explicitly-fencing userspace,
> >> since implicit fencing doesn't allow more than 1 writer. For amdgpu
> >> one possible way to implement this might be to count how many users a
> >> dma-buf has, and if it's more than just the current context set the
> >> exclusive fence. Or do an uabi revision and let userspace decide (or
> >> at least overwrite it).
> > 
> > I mean I can pick one fence and wait for the rest to finish manually,
> > but that would certainly defeat the whole effort, doesn't it?
> 
> I'm afraid it's not clear to me why it would. Can you elaborate?
> 
> 
> >> But the current approach in amdgpu_sync.c of declaring a fence as
> >> exclusive after the fact (if owners don't match) just isn't how
> >> reservation_object works. You can of course change that, but that
> >> means you must change all drivers implementing support for implicit
> >> fencing of dma-buf. Fixing amdgpu will be easier ;-)
> > 
> > Well as far as I can see there is no way I can fix amdgpu in this case.
> > 
> > The handling clearly needs to be changed on the receiving side of the
> > reservation objects if I don't completely want to disable concurrent
> > access to BOs in amdgpu.
> 
> Anyway, we need a solution for this between radeon and amdgpu, and I
> don't think a solution which involves those drivers using reservation
> object semantics between them which are different from all other drivers
> is a good idea.

Afaik there's also amd+intel machines out there, so really the only option
is to either fix amdgpu to correctly set exclusive fences on shared
buffers (with the help of userspace hints). Or change all the existing
drivers. No idea what's simpler to do, since I don't know about amdgpu
userspace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-23 12:09                                                                   ` Daniel Vetter
@ 2016-09-26  0:48                                                                     ` Michel Dänzer
  2016-09-26  8:04                                                                       ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-26  0:48 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel, amd-gfx list

On 23/09/16 09:09 PM, Daniel Vetter wrote:
> On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote:
>> On 22/09/16 10:22 PM, Christian König wrote:
>>> Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
>>>> 
>>>> But the current approach in amdgpu_sync.c of declaring a fence as
>>>> exclusive after the fact (if owners don't match) just isn't how
>>>> reservation_object works. You can of course change that, but that
>>>> means you must change all drivers implementing support for implicit
>>>> fencing of dma-buf. Fixing amdgpu will be easier ;-)
>>>
>>> Well as far as I can see there is no way I can fix amdgpu in this case.
>>>
>>> The handling clearly needs to be changed on the receiving side of the
>>> reservation objects if I don't completely want to disable concurrent
>>> access to BOs in amdgpu.
>>
>> Anyway, we need a solution for this between radeon and amdgpu, and I
>> don't think a solution which involves those drivers using reservation
>> object semantics between them which are different from all other drivers
>> is a good idea.
> 
> Afaik there's also amd+intel machines out there,

Sure, what I meant was that even if we didn't care about those (which we
do), we'd still need a solution between our own drivers.


> so really the only option is to either fix amdgpu to correctly set
> exclusive fences on shared buffers (with the help of userspace hints).
> Or change all the existing drivers.

I got some fresh perspective on the problem while taking a walk, and I'm
now fairly convinced that neither amdgpu userspace nor other drivers
need to be modified:

It occurred to me that all the information we need is already there in
the exclusive and shared fences set by amdgpu. We just need to use it
differently to match the expectations of other drivers.

We should be able to set some sort of "pseudo" fence as the exclusive
fence of the reservation object. When we are asked by another driver to
wait for this fence to signal, we take the current "real" exclusive
fence (which we can keep track of e.g. in our BO struct) and shared
fences, and wait for all of those to signal; then we mark the "pseudo"
exclusive fence as signalled.

Am I missing anything which would prevent this from working?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-26  0:48                                                                     ` Michel Dänzer
@ 2016-09-26  8:04                                                                       ` Daniel Vetter
       [not found]                                                                         ` <20160926080419.GV20761-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2016-09-26  8:04 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel, amd-gfx list

On Mon, Sep 26, 2016 at 09:48:37AM +0900, Michel Dänzer wrote:
> On 23/09/16 09:09 PM, Daniel Vetter wrote:
> > On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote:
> >> On 22/09/16 10:22 PM, Christian König wrote:
> >>> Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
> >>>> 
> >>>> But the current approach in amdgpu_sync.c of declaring a fence as
> >>>> exclusive after the fact (if owners don't match) just isn't how
> >>>> reservation_object works. You can of course change that, but that
> >>>> means you must change all drivers implementing support for implicit
> >>>> fencing of dma-buf. Fixing amdgpu will be easier ;-)
> >>>
> >>> Well as far as I can see there is no way I can fix amdgpu in this case.
> >>>
> >>> The handling clearly needs to be changed on the receiving side of the
> >>> reservation objects if I don't completely want to disable concurrent
> >>> access to BOs in amdgpu.
> >>
> >> Anyway, we need a solution for this between radeon and amdgpu, and I
> >> don't think a solution which involves those drivers using reservation
> >> object semantics between them which are different from all other drivers
> >> is a good idea.
> > 
> > Afaik there's also amd+intel machines out there,
> 
> Sure, what I meant was that even if we didn't care about those (which we
> do), we'd still need a solution between our own drivers.
> 
> 
> > so really the only option is to either fix amdgpu to correctly set
> > exclusive fences on shared buffers (with the help of userspace hints).
> > Or change all the existing drivers.
> 
> I got some fresh perspective on the problem while taking a walk, and I'm
> now fairly convinced that neither amdgpu userspace nor other drivers
> need to be modified:
> 
> It occurred to me that all the information we need is already there in
> the exclusive and shared fences set by amdgpu. We just need to use it
> differently to match the expectations of other drivers.
> 
> We should be able to set some sort of "pseudo" fence as the exclusive
> fence of the reservation object. When we are asked by another driver to
> wait for this fence to signal, we take the current "real" exclusive
> fence (which we can keep track of e.g. in our BO struct) and shared
> fences, and wait for all of those to signal; then we mark the "pseudo"
> exclusive fence as signalled.
> 
> Am I missing anything which would prevent this from working?

One thing to make sure is that you don't change the (real, private stored)
fences you're waiting on over the lifetime of the public exclusive fence.
That might lead to some hilarity wrt potentially creating fence depency
loops. But I think as long as you guarantee that the private internal
fences are always amdgpu ones, and never anything imported from a
different driver even that should be fine. Not because this would break
the loops, but since amgpud has a hangcheck it can still gurantee that the
fence eventually fires even if there is a loop.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                         ` <20160926080419.GV20761-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2016-10-07 12:34                                                                           ` Mike Lothian
       [not found]                                                                             ` <CAHbf0-HZ6EotqwgvkxRTdRF97xB3qBA=DRKAzaAXguV_PR_P8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Lothian @ 2016-10-07 12:34 UTC (permalink / raw)
  To: Daniel Vetter, Michel Dänzer; +Cc: amd-gfx list, dri-devel

Hi

This has discussion has gone a little quiet

Was there any agreement about what needed doing to get this working
for i965/amdgpu?

Regards

Mike

On Mon, 26 Sep 2016 at 09:04 Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Sep 26, 2016 at 09:48:37AM +0900, Michel Dänzer wrote:
> > On 23/09/16 09:09 PM, Daniel Vetter wrote:
> > > On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote:
> > >> On 22/09/16 10:22 PM, Christian König wrote:
> > >>> Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
> > >>>>
> > >>>> But the current approach in amdgpu_sync.c of declaring a fence as
> > >>>> exclusive after the fact (if owners don't match) just isn't how
> > >>>> reservation_object works. You can of course change that, but that
> > >>>> means you must change all drivers implementing support for implicit
> > >>>> fencing of dma-buf. Fixing amdgpu will be easier ;-)
> > >>>
> > >>> Well as far as I can see there is no way I can fix amdgpu in this case.
> > >>>
> > >>> The handling clearly needs to be changed on the receiving side of the
> > >>> reservation objects if I don't completely want to disable concurrent
> > >>> access to BOs in amdgpu.
> > >>
> > >> Anyway, we need a solution for this between radeon and amdgpu, and I
> > >> don't think a solution which involves those drivers using reservation
> > >> object semantics between them which are different from all other drivers
> > >> is a good idea.
> > >
> > > Afaik there's also amd+intel machines out there,
> >
> > Sure, what I meant was that even if we didn't care about those (which we
> > do), we'd still need a solution between our own drivers.
> >
> >
> > > so really the only option is to either fix amdgpu to correctly set
> > > exclusive fences on shared buffers (with the help of userspace hints).
> > > Or change all the existing drivers.
> >
> > I got some fresh perspective on the problem while taking a walk, and I'm
> > now fairly convinced that neither amdgpu userspace nor other drivers
> > need to be modified:
> >
> > It occurred to me that all the information we need is already there in
> > the exclusive and shared fences set by amdgpu. We just need to use it
> > differently to match the expectations of other drivers.
> >
> > We should be able to set some sort of "pseudo" fence as the exclusive
> > fence of the reservation object. When we are asked by another driver to
> > wait for this fence to signal, we take the current "real" exclusive
> > fence (which we can keep track of e.g. in our BO struct) and shared
> > fences, and wait for all of those to signal; then we mark the "pseudo"
> > exclusive fence as signalled.
> >
> > Am I missing anything which would prevent this from working?
>
> One thing to make sure is that you don't change the (real, private stored)
> fences you're waiting on over the lifetime of the public exclusive fence.
> That might lead to some hilarity wrt potentially creating fence depency
> loops. But I think as long as you guarantee that the private internal
> fences are always amdgpu ones, and never anything imported from a
> different driver even that should be fine. Not because this would break
> the loops, but since amgpud has a hangcheck it can still gurantee that the
> fence eventually fires even if there is a loop.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                             ` <CAHbf0-HZ6EotqwgvkxRTdRF97xB3qBA=DRKAzaAXguV_PR_P8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-10-11  3:58                                                                               ` Michel Dänzer
       [not found]                                                                                 ` <d74d34a7-5221-d282-d9d1-b0e1007fc0c7-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-10-11  3:58 UTC (permalink / raw)
  To: Mike Lothian, Daniel Vetter, Christian König; +Cc: dri-devel, amd-gfx list

On 07/10/16 09:34 PM, Mike Lothian wrote:
> 
> This has discussion has gone a little quiet
> 
> Was there any agreement about what needed doing to get this working
> for i965/amdgpu?

Christian, do you see anything which could prevent the solution I
outlined from working?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                 ` <d74d34a7-5221-d282-d9d1-b0e1007fc0c7-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-10-11 12:04                                                                                   ` Christian König
       [not found]                                                                                     ` <c77a2cb9-1f0e-f1a3-aedd-a111cd6ba8e8-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-10-11 12:04 UTC (permalink / raw)
  To: Michel Dänzer, Mike Lothian, Daniel Vetter
  Cc: Nayan Deshmukh, dri-devel, amd-gfx list

Am 11.10.2016 um 05:58 schrieb Michel Dänzer:
> On 07/10/16 09:34 PM, Mike Lothian wrote:
>> This has discussion has gone a little quiet
>>
>> Was there any agreement about what needed doing to get this working
>> for i965/amdgpu?
> Christian, do you see anything which could prevent the solution I
> outlined from working?

I thought about that approach as well, but unfortunately it also has a 
couple of downsides. Especially keeping the exclusive fence set while we 
actually don't need it isn't really clean either.

I'm currently a bit busy with other tasks and so put Nayan on a road to 
get a bit into the kernel driver (he asked for that anyway).

Implementing the simple workaround to sync when we export the DMA-buf 
should be something like 20 lines of code and fortunately Nayan has an 
I+A system and so can actually test it.

If it turns out to be more problematic or somebody really starts to need 
it I'm going to hack on that myself a bit more.

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                     ` <c77a2cb9-1f0e-f1a3-aedd-a111cd6ba8e8-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2016-10-12  0:40                                                                                       ` Michel Dänzer
  2016-10-27 13:33                                                                                         ` Mike Lothian
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-10-12  0:40 UTC (permalink / raw)
  To: Christian König, Mike Lothian, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel

On 11/10/16 09:04 PM, Christian König wrote:
> Am 11.10.2016 um 05:58 schrieb Michel Dänzer:
>> On 07/10/16 09:34 PM, Mike Lothian wrote:
>>> This has discussion has gone a little quiet
>>>
>>> Was there any agreement about what needed doing to get this working
>>> for i965/amdgpu?
>> Christian, do you see anything which could prevent the solution I
>> outlined from working?
> 
> I thought about that approach as well, but unfortunately it also has a
> couple of downsides. Especially keeping the exclusive fence set while we
> actually don't need it isn't really clean either.

I was wondering if it's possible to have a singleton pseudo exclusive
fence which is used for all BOs. That might keep the overhead acceptably
low.


> I'm currently a bit busy with other tasks and so put Nayan on a road to
> get a bit into the kernel driver (he asked for that anyway).
> 
> Implementing the simple workaround to sync when we export the DMA-buf
> should be something like 20 lines of code and fortunately Nayan has an
> I+A system and so can actually test it.
> 
> If it turns out to be more problematic or somebody really starts to need
> it I'm going to hack on that myself a bit more.

If you mean only syncing when a DMA-buf is exported, that's not enough,
as I explained before. The BOs being shared are long-lived, and
synchronization between GPUs is required for every command stream
submission.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-12  0:40                                                                                       ` Michel Dänzer
@ 2016-10-27 13:33                                                                                         ` Mike Lothian
       [not found]                                                                                           ` <CAHbf0-GGMWZrhB+PKpc-QbD__6fqB4pQVFfN+gzLWNhi+DuG3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Lothian @ 2016-10-27 13:33 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1823 bytes --]

Hi

Just another gentle ping to see where you are with this?

Cheers

Mike

On Wed, 12 Oct 2016 at 01:40 Michel Dänzer <michel@daenzer.net> wrote:

> On 11/10/16 09:04 PM, Christian König wrote:
> > Am 11.10.2016 um 05:58 schrieb Michel Dänzer:
> >> On 07/10/16 09:34 PM, Mike Lothian wrote:
> >>> This has discussion has gone a little quiet
> >>>
> >>> Was there any agreement about what needed doing to get this working
> >>> for i965/amdgpu?
> >> Christian, do you see anything which could prevent the solution I
> >> outlined from working?
> >
> > I thought about that approach as well, but unfortunately it also has a
> > couple of downsides. Especially keeping the exclusive fence set while we
> > actually don't need it isn't really clean either.
>
> I was wondering if it's possible to have a singleton pseudo exclusive
> fence which is used for all BOs. That might keep the overhead acceptably
> low.
>
>
> > I'm currently a bit busy with other tasks and so put Nayan on a road to
> > get a bit into the kernel driver (he asked for that anyway).
> >
> > Implementing the simple workaround to sync when we export the DMA-buf
> > should be something like 20 lines of code and fortunately Nayan has an
> > I+A system and so can actually test it.
> >
> > If it turns out to be more problematic or somebody really starts to need
> > it I'm going to hack on that myself a bit more.
>
> If you mean only syncing when a DMA-buf is exported, that's not enough,
> as I explained before. The BOs being shared are long-lived, and
> synchronization between GPUs is required for every command stream
> submission.
>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
>

[-- Attachment #1.2: Type: text/html, Size: 3122 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                           ` <CAHbf0-GGMWZrhB+PKpc-QbD__6fqB4pQVFfN+gzLWNhi+DuG3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-10-28  1:34                                                                                             ` Michel Dänzer
  2016-10-28 17:37                                                                                               ` Mario Kleiner
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-10-28  1:34 UTC (permalink / raw)
  To: Mike Lothian, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, dri-devel, amd-gfx list

On 27/10/16 10:33 PM, Mike Lothian wrote:
> 
> Just another gentle ping to see where you are with this?

I haven't got a chance to look into this any further.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-28  1:34                                                                                             ` Michel Dänzer
@ 2016-10-28 17:37                                                                                               ` Mario Kleiner
       [not found]                                                                                                 ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2016-10-28 18:37                                                                                                 ` Mike Lothian
  0 siblings, 2 replies; 46+ messages in thread
From: Mario Kleiner @ 2016-10-28 17:37 UTC (permalink / raw)
  To: Michel Dänzer, Mike Lothian, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]



On 10/28/2016 03:34 AM, Michel Dänzer wrote:
> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>
>> Just another gentle ping to see where you are with this?
>
> I haven't got a chance to look into this any further.
>
>

Fwiw., as a proof of concept, the attached experimental patch does work 
as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and 
DRI3/Present when applied to drm-next (updated from a few days ago). 
With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone 
under all loads. The tearing with "windowed" windows now looks as 
expected for regular tearing not related to Prime.

ftrace confirms the i915 driver's pageflip function is waiting on the 
fence in reservation_object_wait_timeout_rcu() as it should.

That entry->tv.shared needs to be set false for such buffers in 
amdgpu_bo_list_set() makes sense to me, as that is part of the buffer 
validation for command stream submission. There are other places in the 
driver where tv.shared is set, which i didn't check so far.

I don't know which of these would need to be updated with a "exported 
bo" check as well, e.g., for video decoding or maybe gpu compute? Adding 
or removing the check to amdgpu_gem_va_update_vm(), e.g., made no 
difference. I assume that makes sense because that functions seems to 
deal with amdgpu internal vm page tables or page table entries for such 
a bo, not with something visible to external clients?

All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping 
without causing any obvious new problems.

-mario

[-- Attachment #2: 0001-drm-amdgpu-Attach-exclusive-fence-to-prime-exported-.patch --]
[-- Type: text/x-patch, Size: 3024 bytes --]

>From 2a8d7fcd36da30305fa675df311c697162792597 Mon Sep 17 00:00:00 2001
From: Mario Kleiner <mario.kleiner.de@gmail.com>
Date: Wed, 26 Oct 2016 10:58:00 +0200
Subject: [PATCH] drm/amdgpu: Attach exclusive fence to prime exported bo's.

External clients which import our bo's wait only
for exclusive dmabuf-fences, not on shared ones,
so attach fences on such exported buffers as
exclusive ones, not shared ones.

-> Backup commit. Work in progress.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c   | 3 +++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 039b57e..a337d56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -459,6 +459,7 @@ struct amdgpu_bo {
 	u64				metadata_flags;
 	void				*metadata;
 	u32				metadata_size;
+	bool				prime_exported;
 	/* list of all virtual address to which this bo
 	 * is associated to
 	 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 651115d..6e1d7b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -132,7 +132,10 @@ static int amdgpu_bo_list_set(struct amdgpu_device *adev,
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
 		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = true;
+		entry->tv.shared = !entry->robj->prime_exported;
+
+		if (entry->robj->prime_exported)
+		    DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", entry->robj);
 
 		if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS)
 			gds_obj = entry->robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a7ea9a3..730a68e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -494,6 +494,12 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 
 	tv.bo = &bo_va->bo->tbo;
 	tv.shared = true;
+
+	if (bo_va->bo->prime_exported) {
+	    DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo);
+	    /* tv.shared = false; */
+	}
+
 	list_add(&tv.head, &list);
 
 	amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7700dc2..bfbfeb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
 		return ERR_PTR(-EPERM);
 
+	bo->prime_exported = true;
+	DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo);
+
 	return drm_gem_prime_export(dev, gobj, flags);
 }
-- 
2.7.4


[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                                 ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-10-28 17:48                                                                                                   ` Christian König
  2016-11-05  1:17                                                                                                     ` Mario Kleiner
  2016-10-31  6:41                                                                                                   ` Michel Dänzer
  1 sibling, 1 reply; 46+ messages in thread
From: Christian König @ 2016-10-28 17:48 UTC (permalink / raw)
  To: Mario Kleiner, Michel Dänzer, Mike Lothian, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel

Am 28.10.2016 um 19:37 schrieb Mario Kleiner:
>
>
> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
>> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>>
>>> Just another gentle ping to see where you are with this?
>>
>> I haven't got a chance to look into this any further.
>>
>>
>
> Fwiw., as a proof of concept, the attached experimental patch does 
> work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and 
> DRI3/Present when applied to drm-next (updated from a few days ago). 
> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone 
> under all loads. The tearing with "windowed" windows now looks as 
> expected for regular tearing not related to Prime.

Yeah, that's pretty much what I had in mind as well. You additionally 
need to wait for the shared fences when you export the BO for the first 
time, but that's only a nitpick.

>
> ftrace confirms the i915 driver's pageflip function is waiting on the 
> fence in reservation_object_wait_timeout_rcu() as it should.
>
> That entry->tv.shared needs to be set false for such buffers in 
> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer 
> validation for command stream submission. There are other places in 
> the driver where tv.shared is set, which i didn't check so far.
>
> I don't know which of these would need to be updated with a "exported 
> bo" check as well, e.g., for video decoding or maybe gpu compute? 
> Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made 
> no difference. I assume that makes sense because that functions seems 
> to deal with amdgpu internal vm page tables or page table entries for 
> such a bo, not with something visible to external clients?

Yes, exactly. VM updates doesn't matter for anybody else than amdgpu. 
Additional to that we don't even add a fence to the shared slot we 
reserve (could probably drop that for optimization).

Please remove the debugging stuff and the extra code on the VM updates 
and add a reservation_object_wait_timeout_rcu() to the export path and 
we should be good to go.

Regards,
Christian.

>
> All i can say is it fixes 3D rendering under DRI3 + Prime + 
> pageflipping without causing any obvious new problems.
>
> -mario


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-28 17:37                                                                                               ` Mario Kleiner
       [not found]                                                                                                 ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-10-28 18:37                                                                                                 ` Mike Lothian
  2016-10-29 13:58                                                                                                   ` Mike Lothian
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Lothian @ 2016-10-28 18:37 UTC (permalink / raw)
  To: Mario Kleiner, Michel Dänzer, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2072 bytes --]

Hi Mario

That fixes the tearing, it's been replaced with a strange stutter, like
it's only showing half the number of frames being reported - it's really
noticeable in tomb raider

Thanks for your work on this, the stutter is much more manageable than the
tearing was

I've attached the patch that applies cleanly to 4.10-wip



On Fri, 28 Oct 2016 at 18:37 Mario Kleiner <mario.kleiner.de@gmail.com>
wrote:

>
>
> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
> > On 27/10/16 10:33 PM, Mike Lothian wrote:
> >>
> >> Just another gentle ping to see where you are with this?
> >
> > I haven't got a chance to look into this any further.
> >
> >
>
> Fwiw., as a proof of concept, the attached experimental patch does work
> as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and
> DRI3/Present when applied to drm-next (updated from a few days ago).
> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone
> under all loads. The tearing with "windowed" windows now looks as
> expected for regular tearing not related to Prime.
>
> ftrace confirms the i915 driver's pageflip function is waiting on the
> fence in reservation_object_wait_timeout_rcu() as it should.
>
> That entry->tv.shared needs to be set false for such buffers in
> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer
> validation for command stream submission. There are other places in the
> driver where tv.shared is set, which i didn't check so far.
>
> I don't know which of these would need to be updated with a "exported
> bo" check as well, e.g., for video decoding or maybe gpu compute? Adding
> or removing the check to amdgpu_gem_va_update_vm(), e.g., made no
> difference. I assume that makes sense because that functions seems to
> deal with amdgpu internal vm page tables or page table entries for such
> a bo, not with something visible to external clients?
>
> All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping
> without causing any obvious new problems.
>
> -mario
>

[-- Attachment #1.2: Type: text/html, Size: 3233 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-Attach-exclusive-fence-to-prime-exported-.patch --]
[-- Type: text/x-patch, Size: 2235 bytes --]

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 217df24..6757b99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -391,6 +391,7 @@ struct amdgpu_bo {
 	u64				metadata_flags;
 	void				*metadata;
 	u32				metadata_size;
+	bool				prime_exported;
 	/* list of all virtual address to which this bo
 	 * is associated to
 	 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 651115d..6e1d7b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -132,7 +132,10 @@ static int amdgpu_bo_list_set(struct amdgpu_device *adev,
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
 		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = true;
+		entry->tv.shared = !entry->robj->prime_exported;
+
+		if (entry->robj->prime_exported)
+		    DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", entry->robj);
 
 		if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS)
 			gds_obj = entry->robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index cd62f6f..54099a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -504,6 +504,12 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 
 	tv.bo = &bo_va->bo->tbo;
 	tv.shared = true;
+
+	if (bo_va->bo->prime_exported) {
+	    DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo);
+	    /* tv.shared = false; */
+	}
+
 	list_add(&tv.head, &list);
 
 	amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7700dc2..bfbfeb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
 		return ERR_PTR(-EPERM);
 
+	bo->prime_exported = true;
+	DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo);
+
 	return drm_gem_prime_export(dev, gobj, flags);
 }

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-28 18:37                                                                                                 ` Mike Lothian
@ 2016-10-29 13:58                                                                                                   ` Mike Lothian
       [not found]                                                                                                     ` <CAHbf0-EY2OM_HgxTjmMi4-f5TQ8fkqf5XYBxHZtJVsnSpxPyyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Lothian @ 2016-10-29 13:58 UTC (permalink / raw)
  To: Mario Kleiner, Michel Dänzer, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2349 bytes --]

I turned on vsync and everything works great in tomb raider

:D

Thanks again to everyone who made this possible

On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike-4+n8WJKc9ve9FHfhHBbuYA@public.gmane.org> wrote:

> Hi Mario
>
> That fixes the tearing, it's been replaced with a strange stutter, like
> it's only showing half the number of frames being reported - it's really
> noticeable in tomb raider
>
> Thanks for your work on this, the stutter is much more manageable than the
> tearing was
>
> I've attached the patch that applies cleanly to 4.10-wip
>
>
>
> On Fri, 28 Oct 2016 at 18:37 Mario Kleiner <mario.kleiner.de-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> wrote:
>
>
>
> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
> > On 27/10/16 10:33 PM, Mike Lothian wrote:
> >>
> >> Just another gentle ping to see where you are with this?
> >
> > I haven't got a chance to look into this any further.
> >
> >
>
> Fwiw., as a proof of concept, the attached experimental patch does work
> as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and
> DRI3/Present when applied to drm-next (updated from a few days ago).
> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone
> under all loads. The tearing with "windowed" windows now looks as
> expected for regular tearing not related to Prime.
>
> ftrace confirms the i915 driver's pageflip function is waiting on the
> fence in reservation_object_wait_timeout_rcu() as it should.
>
> That entry->tv.shared needs to be set false for such buffers in
> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer
> validation for command stream submission. There are other places in the
> driver where tv.shared is set, which i didn't check so far.
>
> I don't know which of these would need to be updated with a "exported
> bo" check as well, e.g., for video decoding or maybe gpu compute? Adding
> or removing the check to amdgpu_gem_va_update_vm(), e.g., made no
> difference. I assume that makes sense because that functions seems to
> deal with amdgpu internal vm page tables or page table entries for such
> a bo, not with something visible to external clients?
>
> All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping
> without causing any obvious new problems.
>
> -mario
>
>

[-- Attachment #1.2: Type: text/html, Size: 4160 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                                 ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2016-10-28 17:48                                                                                                   ` Christian König
@ 2016-10-31  6:41                                                                                                   ` Michel Dänzer
  1 sibling, 0 replies; 46+ messages in thread
From: Michel Dänzer @ 2016-10-31  6:41 UTC (permalink / raw)
  To: Mario Kleiner, Mike Lothian, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, dri-devel, amd-gfx list

On 29/10/16 02:37 AM, Mario Kleiner wrote:
> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
>> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>>
>>> Just another gentle ping to see where you are with this?
>>
>> I haven't got a chance to look into this any further.
> 
> Fwiw., as a proof of concept, the attached experimental patch does work
> as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and
> DRI3/Present when applied to drm-next (updated from a few days ago).
> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone
> under all loads. The tearing with "windowed" windows now looks as
> expected for regular tearing not related to Prime.
> 
> ftrace confirms the i915 driver's pageflip function is waiting on the
> fence in reservation_object_wait_timeout_rcu() as it should.
> 
> That entry->tv.shared needs to be set false for such buffers in
> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer
> validation for command stream submission. There are other places in the
> driver where tv.shared is set, which i didn't check so far.
> 
> I don't know which of these would need to be updated with a "exported
> bo" check as well, e.g., for video decoding or maybe gpu compute? Adding
> or removing the check to amdgpu_gem_va_update_vm(), e.g., made no
> difference. I assume that makes sense because that functions seems to
> deal with amdgpu internal vm page tables or page table entries for such
> a bo, not with something visible to external clients?
> 
> All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping
> without causing any obvious new problems.

[...]

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 7700dc2..bfbfeb9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
>  	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
>  		return ERR_PTR(-EPERM);
>  
> +	bo->prime_exported = true;
> +	DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo);
> +
>  	return drm_gem_prime_export(dev, gobj, flags);
>  }
> 

This will take effect in non-PRIME cases as well, at least DRI3 and
GL<->[other API] interop off the top of my head. Is that okay?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                                     ` <CAHbf0-EY2OM_HgxTjmMi4-f5TQ8fkqf5XYBxHZtJVsnSpxPyyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-10-31  6:44                                                                                                       ` Michel Dänzer
       [not found]                                                                                                         ` <c45e2f4c-c075-47b6-7e02-3bd98748c83a-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-10-31  6:44 UTC (permalink / raw)
  To: Mike Lothian, Mario Kleiner, Christian König, Daniel Vetter
  Cc: Nayan Deshmukh, dri-devel, amd-gfx list

On 29/10/16 10:58 PM, Mike Lothian wrote:
> I turned on vsync and everything works great in tomb raider
> 
> :D
> 
> Thanks again to everyone who made this possible
> 
> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk
> <mailto:mike@fireburn.co.uk>> wrote:
> 
>     Hi Mario
> 
>     That fixes the tearing, it's been replaced with a strange stutter,
>     like it's only showing half the number of frames being reported -
>     it's really noticeable in tomb raider

I wonder if the stutter might be due to the dGPU writing another frame
before the iGPU is done processing the previous one. Christian, does the
amdgpu scheduler wait for shared fences of shared BOs to signal before
submitting jobs using them to the GPU?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
       [not found]                                                                                                         ` <c45e2f4c-c075-47b6-7e02-3bd98748c83a-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-10-31  8:00                                                                                                           ` Christian König
  2016-10-31  8:06                                                                                                             ` Michel Dänzer
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-10-31  8:00 UTC (permalink / raw)
  To: Michel Dänzer, Mike Lothian, Mario Kleiner, Daniel Vetter
  Cc: Nayan Deshmukh, dri-devel, amd-gfx list

Am 31.10.2016 um 07:44 schrieb Michel Dänzer:
> On 29/10/16 10:58 PM, Mike Lothian wrote:
>> I turned on vsync and everything works great in tomb raider
>>
>> :D
>>
>> Thanks again to everyone who made this possible
>>
>> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk
>> <mailto:mike@fireburn.co.uk>> wrote:
>>
>>      Hi Mario
>>
>>      That fixes the tearing, it's been replaced with a strange stutter,
>>      like it's only showing half the number of frames being reported -
>>      it's really noticeable in tomb raider
> I wonder if the stutter might be due to the dGPU writing another frame
> before the iGPU is done processing the previous one. Christian, does the
> amdgpu scheduler wait for shared fences of shared BOs to signal before
> submitting jobs using them to the GPU?

Yeah, that should work. We wait for both the exclusive as well as all 
shared fences before CS or pageflip.

Only on CS we filter the shared fences so that we don't necessary wait 
for submissions from the same process.

Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-31  8:00                                                                                                           ` Christian König
@ 2016-10-31  8:06                                                                                                             ` Michel Dänzer
  0 siblings, 0 replies; 46+ messages in thread
From: Michel Dänzer @ 2016-10-31  8:06 UTC (permalink / raw)
  To: Christian König, Mike Lothian, Mario Kleiner, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel

On 31/10/16 05:00 PM, Christian König wrote:
> Am 31.10.2016 um 07:44 schrieb Michel Dänzer:
>> On 29/10/16 10:58 PM, Mike Lothian wrote:
>>> I turned on vsync and everything works great in tomb raider
>>>
>>> :D
>>>
>>> Thanks again to everyone who made this possible
>>>
>>> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk
>>> <mailto:mike@fireburn.co.uk>> wrote:
>>>
>>>      Hi Mario
>>>
>>>      That fixes the tearing, it's been replaced with a strange stutter,
>>>      like it's only showing half the number of frames being reported -
>>>      it's really noticeable in tomb raider
>> I wonder if the stutter might be due to the dGPU writing another frame
>> before the iGPU is done processing the previous one. Christian, does the
>> amdgpu scheduler wait for shared fences of shared BOs to signal before
>> submitting jobs using them to the GPU?
> 
> Yeah, that should work. We wait for both the exclusive as well as all
> shared fences before CS or pageflip.
> 
> Only on CS we filter the shared fences so that we don't necessary wait
> for submissions from the same process.

Note that it can be the same process (Xorg) in the RandR 1.4 multihead
case. But it sounds like Mike's stutter can't be due to the issue I was
thinking of.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-28 17:48                                                                                                   ` Christian König
@ 2016-11-05  1:17                                                                                                     ` Mario Kleiner
  0 siblings, 0 replies; 46+ messages in thread
From: Mario Kleiner @ 2016-11-05  1:17 UTC (permalink / raw)
  To: Christian König, Michel Dänzer, Mike Lothian, Daniel Vetter
  Cc: Nayan Deshmukh, amd-gfx list, dri-devel



On 10/28/2016 07:48 PM, Christian König wrote:
> Am 28.10.2016 um 19:37 schrieb Mario Kleiner:
>>
>>
>> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
>>> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>>>
>>>> Just another gentle ping to see where you are with this?
>>>
>>> I haven't got a chance to look into this any further.
>>>
>>>
>>
>> Fwiw., as a proof of concept, the attached experimental patch does
>> work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and
>> DRI3/Present when applied to drm-next (updated from a few days ago).
>> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone
>> under all loads. The tearing with "windowed" windows now looks as
>> expected for regular tearing not related to Prime.
>
> Yeah, that's pretty much what I had in mind as well. You additionally
> need to wait for the shared fences when you export the BO for the first
> time, but that's only a nitpick.
>
>>
>> ftrace confirms the i915 driver's pageflip function is waiting on the
>> fence in reservation_object_wait_timeout_rcu() as it should.
>>
>> That entry->tv.shared needs to be set false for such buffers in
>> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer
>> validation for command stream submission. There are other places in
>> the driver where tv.shared is set, which i didn't check so far.
>>
>> I don't know which of these would need to be updated with a "exported
>> bo" check as well, e.g., for video decoding or maybe gpu compute?
>> Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made
>> no difference. I assume that makes sense because that functions seems
>> to deal with amdgpu internal vm page tables or page table entries for
>> such a bo, not with something visible to external clients?
>
> Yes, exactly. VM updates doesn't matter for anybody else than amdgpu.
> Additional to that we don't even add a fence to the shared slot we
> reserve (could probably drop that for optimization).
>
> Please remove the debugging stuff and the extra code on the VM updates
> and add a reservation_object_wait_timeout_rcu() to the export path and
> we should be good to go.
>

Done. Patch v2 just sent out after retesting with Tonga only and Intel + 
Tonga renderoffload. Would be cool if we could get this into 4.9-rc.

Ideally also backported to 4.8, given it is a simple change, as that 
would be the next official kernel for *buntu 16.04-LTS and derivatives, 
but that's probably breaking the rules as it doesn't fix a regression?

thanks,
-mario


> Regards,
> Christian.
>
>>
>> All i can say is it fixes 3D rendering under DRI3 + Prime +
>> pageflipping without causing any obvious new problems.
>>
>> -mario
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-11-05  1:17 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-08  0:14 [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences Mario Kleiner
2016-09-08  6:30 ` Chris Wilson
2016-09-08 15:21   ` Mario Kleiner
2016-09-08 16:23     ` Chris Wilson
     [not found]       ` <20160908162346.GA5479-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
2016-09-09  1:15         ` Michel Dänzer
     [not found]           ` <abccc8ac-10c6-ab22-c59d-f43ee48ba78d-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-13  8:44             ` Christian König
2016-09-13  9:39               ` Chris Wilson
     [not found]                 ` <20160913093945.GA25204-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
2016-09-13 12:52                   ` Christian König
2016-09-21  9:56                     ` Michel Dänzer
     [not found]                       ` <7aafce92-8bcf-1c5c-45de-9e8ecda85239-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-21 10:30                         ` Christian König
2016-09-21 11:04                           ` Daniel Vetter
     [not found]                             ` <CAKMK7uG3j54NzwjxmWuSmP787r+QN-Cu5T8R-naX6S9RvvKemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-09-21 11:19                               ` Christian König
2016-09-21 12:56                                 ` Daniel Vetter
     [not found]                                   ` <CAKMK7uH6N2Kgwkf-11iwdqDAUrFmreYKLLeTGXmEh+N0DQ4tJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-09-21 15:07                                     ` Michel Dänzer
     [not found]                                       ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-21 15:15                                         ` Christian König
     [not found]                                           ` <5c2048ff-0e20-ddf3-2d73-9a3acb38e7ff-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-09-21 15:29                                             ` Michel Dänzer
2016-09-21 16:23                                               ` Christian König
2016-09-22  6:36                                                 ` Daniel Vetter
     [not found]                                                   ` <20160922063625.GD22164-XQyZGdhdUcTMwUGJfOwWj/ooFf0ArEBIu+b9c/7xato@public.gmane.org>
2016-09-22 10:55                                                     ` Christian König
2016-09-22 12:26                                                       ` Daniel Vetter
2016-09-22 12:44                                                         ` Christian König
2016-09-22 13:05                                                           ` Daniel Vetter
2016-09-22 13:22                                                             ` Christian König
     [not found]                                                               ` <d2430ff8-43bd-bff2-9b02-847cabfd56c0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-09-23 10:00                                                                 ` Michel Dänzer
2016-09-23 12:09                                                                   ` Daniel Vetter
2016-09-26  0:48                                                                     ` Michel Dänzer
2016-09-26  8:04                                                                       ` Daniel Vetter
     [not found]                                                                         ` <20160926080419.GV20761-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2016-10-07 12:34                                                                           ` Mike Lothian
     [not found]                                                                             ` <CAHbf0-HZ6EotqwgvkxRTdRF97xB3qBA=DRKAzaAXguV_PR_P8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-11  3:58                                                                               ` Michel Dänzer
     [not found]                                                                                 ` <d74d34a7-5221-d282-d9d1-b0e1007fc0c7-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-10-11 12:04                                                                                   ` Christian König
     [not found]                                                                                     ` <c77a2cb9-1f0e-f1a3-aedd-a111cd6ba8e8-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-10-12  0:40                                                                                       ` Michel Dänzer
2016-10-27 13:33                                                                                         ` Mike Lothian
     [not found]                                                                                           ` <CAHbf0-GGMWZrhB+PKpc-QbD__6fqB4pQVFfN+gzLWNhi+DuG3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-28  1:34                                                                                             ` Michel Dänzer
2016-10-28 17:37                                                                                               ` Mario Kleiner
     [not found]                                                                                                 ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-10-28 17:48                                                                                                   ` Christian König
2016-11-05  1:17                                                                                                     ` Mario Kleiner
2016-10-31  6:41                                                                                                   ` Michel Dänzer
2016-10-28 18:37                                                                                                 ` Mike Lothian
2016-10-29 13:58                                                                                                   ` Mike Lothian
     [not found]                                                                                                     ` <CAHbf0-EY2OM_HgxTjmMi4-f5TQ8fkqf5XYBxHZtJVsnSpxPyyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-31  6:44                                                                                                       ` Michel Dänzer
     [not found]                                                                                                         ` <c45e2f4c-c075-47b6-7e02-3bd98748c83a-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-10-31  8:00                                                                                                           ` Christian König
2016-10-31  8:06                                                                                                             ` Michel Dänzer
2016-09-22  6:33                                         ` Daniel Vetter
2016-09-21 15:13                           ` Michel Dänzer
     [not found]                             ` <f0e034f9-db22-6577-97c7-dd8d3e851226-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-21 15:21                               ` Christian König
2016-09-21 15:28                                 ` Michel Dänzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.