* [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Mario Kleiner @ 2016-09-08  0:14 UTC
To: dri-devel
Cc: Daniel Vetter, Michel Dänzer
(46+ messages in thread)

amdgpu-kms uses shared fences for its prime exported dmabufs,
instead of an exclusive fence. Therefore we need to wait for
all fences of the dmabuf reservation object to prevent
unsynchronized rendering and flipping.

This patch was tested to behave properly with intel-kms +
radeon/amdgpu/nouveau-kms for correct prime sync during
pageflipping under DRI3/Present.

Should fix https://bugs.freedesktop.org/show_bug.cgi?id=95472
at least for page-flipped presentation.

Suggested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
---
 drivers/gpu/drm/i915/intel_display.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 922709b..4b74b96 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12043,7 +12043,7 @@ static void intel_mmio_flip_work_func(struct work_struct *w)
 	/* For framebuffer backed by dmabuf, wait for fence */
 	resv = i915_gem_object_get_dmabuf_resv(obj);
 	if (resv)
-		WARN_ON(reservation_object_wait_timeout_rcu(resv, false, false,
+		WARN_ON(reservation_object_wait_timeout_rcu(resv, true, false,
							    MAX_SCHEDULE_TIMEOUT) < 0);
 
 	intel_pipe_update_start(crtc);
@@ -14700,7 +14700,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	if (resv) {
 		long lret;
 
-		lret = reservation_object_wait_timeout_rcu(resv, false, true,
+		lret = reservation_object_wait_timeout_rcu(resv, true, true,
							   MAX_SCHEDULE_TIMEOUT);
 		if (lret == -ERESTARTSYS)
 			return lret;
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
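[Editor's note: the two booleans passed to reservation_object_wait_timeout_rcu() select "wait for all fences" and "interruptible wait"; the patch flips the first one from false to true so shared fences are waited on as well. The following is a toy userspace model of why that matters for amdgpu-exported dmabufs, which carry only shared fences. All names (struct resv, resv_is_idle, ...) are invented for illustration; this is not the kernel API.]

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARED 8

struct fence { bool signaled; };

/* Toy reservation object: one optional exclusive fence slot plus a
 * list of shared fences, mirroring the structure discussed here. */
struct resv {
	struct fence *excl;
	struct fence *shared[MAX_SHARED];
	int shared_count;
};

/* Model of the wait: with wait_all == false only the exclusive fence
 * is considered; with wait_all == true shared fences are checked too. */
static bool resv_is_idle(const struct resv *r, bool wait_all)
{
	if (r->excl && !r->excl->signaled)
		return false;
	if (wait_all)
		for (int i = 0; i < r->shared_count; i++)
			if (!r->shared[i]->signaled)
				return false;
	return true;
}

[With an amdgpu-style buffer whose rendering is tracked only as a shared fence, a wait_all == false check (the old flip path) reports the buffer idle while the GPU is still writing; wait_all == true (this patch) sees the pending fence.]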
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Chris Wilson @ 2016-09-08  6:30 UTC
To: Mario Kleiner
Cc: Daniel Vetter, Michel Dänzer, dri-devel

On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
> amdgpu-kms uses shared fences for its prime exported dmabufs,
> instead of an exclusive fence. Therefore we need to wait for
> all fences of the dmabuf reservation object to prevent
> unsynchronized rendering and flipping.

No. Fix the root cause, as this affects not just flips but copies -
this implies that everybody using the resv object must wait for all
fences. The resv object is not just used for prime, but for all
fencing, so this breaks the ability to schedule parallel operations
across engines.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Mario Kleiner @ 2016-09-08 15:21 UTC
To: Chris Wilson, dri-devel, Michel Dänzer, Daniel Vetter, David Airlie

On 09/08/2016 08:30 AM, Chris Wilson wrote:
> No. Fix the root cause, as this affects not just flips but copies -
> this implies that everybody using the resv object must wait for all
> fences. The resv object is not just used for prime, but for all
> fencing, so this breaks the ability to schedule parallel operations
> across engines.
> -Chris

Ok. I think I now understand the difference, but let's check: The
exclusive fence is essentially acting a bit like a write-lock, and the
shared fences like readers-locks? So you can have multiple readers but
only one writer at a time?

Ie.:

Writer must wait for all fences before starting write access to a
buffer, then attach the exclusive fence and signal it on end of write
access. E.g., write to renderbuffer, write to texture etc.

Readers must wait for the exclusive fence, then attach a shared fence
per reader and signal it on end of read access? E.g., read from
texture, fb, scanout?

Is that correct? In that case we'd have a missing exclusive fence in
amdgpu for the linear target dmabuf? Probably beyond my level of
knowledge to fix this?

thanks,
-mario
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Chris Wilson @ 2016-09-08 16:23 UTC
To: Mario Kleiner
Cc: Daniel Vetter, Michel Dänzer, dri-devel

On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
> Ok. I think I now understand the difference, but let's check: The
> exclusive fence is essentially acting a bit like a write-lock, and
> the shared fences like readers-locks? So you can have multiple
> readers but only one writer at a time?

That's how we (i915.ko and, I hope, the rest of the world) are using
them. In the model where there is just one reservation object on the
GEM object, that reservation object is then shared between internal
driver scheduling and external users. We are reliant on being able to
use buffers on multiple engines through the virtue of the shared
fences, and to serialise after writes by waiting on the exclusive
fence. (So we can have concurrent reads on the display engine, render
engines and on the CPU - or alternatively an exclusive writer.)

In the near future, i915 flips will wait on the common reservation
object not only for dma-bufs, but also for its own GEM objects.

> Ie.:
>
> Writer must wait for all fences before starting write access to a
> buffer, then attach the exclusive fence and signal it on end of
> write access. E.g., write to renderbuffer, write to texture etc.

Yes.

> Readers must wait for the exclusive fence, then attach a shared
> fence per reader and signal it on end of read access? E.g., read
> from texture, fb, scanout?

Yes.

> Is that correct? In that case we'd have a missing exclusive fence in
> amdgpu for the linear target dmabuf? Probably beyond my level of
> knowledge to fix this?

i915.ko requires the client to mark which buffers are written to.

In ttm, there are ttm_validate_buffer objects which mark whether they
should be using shared or exclusive fences. Afaict, in amdgpu they are
all set to shared; the relevant user interface seems to be
amdgpu_bo_list_set().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
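[Editor's note: the writer/reader protocol Mario describes and Chris confirms can be sketched as a toy userspace model. The names (struct resv, begin_write, begin_read) are invented for illustration; a real reservation object also deals with locking, RCU and fence lifetimes, none of which is modeled here.]

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FENCES 8

struct fence { bool signaled; };

/* Toy reservation object: at most one exclusive (writer) fence and
 * any number of shared (reader) fences. */
struct resv {
	struct fence *excl;
	struct fence *shared[MAX_FENCES];
	int shared_count;
};

static bool all_signaled(const struct resv *r)
{
	if (r->excl && !r->excl->signaled)
		return false;
	for (int i = 0; i < r->shared_count; i++)
		if (!r->shared[i]->signaled)
			return false;
	return true;
}

/* Writer rule: wait for *all* fences, then install the exclusive
 * fence. Returns false if the buffer is still busy. */
static bool begin_write(struct resv *r, struct fence *f)
{
	if (!all_signaled(r))
		return false;
	r->excl = f;
	r->shared_count = 0;	/* all prior readers have completed */
	return true;
}

/* Reader rule: wait only for the exclusive fence, then add a shared
 * fence. Multiple readers may proceed concurrently. */
static bool begin_read(struct resv *r, struct fence *f)
{
	if (r->excl && !r->excl->signaled)
		return false;
	r->shared[r->shared_count++] = f;
	return true;
}

[This reproduces the rwlock-like behaviour: readers are blocked while a writer's exclusive fence is pending, concurrent readers coexist, and a new writer must wait for every outstanding reader fence.]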
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Michel Dänzer @ 2016-09-09  1:15 UTC
To: Chris Wilson, Mario Kleiner, Christian König
Cc: amd-gfx, dri-devel

On 09/09/16 01:23 AM, Chris Wilson wrote:
> On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
>> Ok. I think I now understand the difference, but let's check: The
>> exclusive fence is essentially acting a bit like a write-lock, and
>> the shared fences like readers-locks? So you can have multiple
>> readers but only one writer at a time?
>
> That's how we (i915.ko and, I hope, the rest of the world) are using
> them.
> [...]
> i915.ko requires the client to mark which buffers are written to.
>
> In ttm, there are ttm_validate_buffer objects which mark whether
> they should be using shared or exclusive fences. Afaict, in amdgpu
> they are all set to shared; the relevant user interface seems to be
> amdgpu_bo_list_set().

This all makes sense to me.

Christian, why is amdgpu setting only shared fences? Can we fix that?

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Christian König @ 2016-09-13  8:44 UTC
To: Michel Dänzer, Chris Wilson, Mario Kleiner
Cc: amd-gfx, dri-devel

Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
> [...]
> This all makes sense to me.
>
> Christian, why is amdgpu setting only shared fences? Can we fix that?

No, amdgpu relies on the fact that we even allow concurrent write
accesses by userspace.

E.g. one part of the buffer can be rendered by one engine while
another part could be rendered by another engine.

Just imagine X which is composing a buffer with both the 3D engine as
well as the DMA engine. All engines need to run in parallel and you
need to wait for all of them to finish before scanout.

Everybody who needs exclusive access to the reservation object (like
scanouts do) needs to wait for all fences, not just the exclusive one.

The Intel driver clearly needs to be fixed here.

Regards,
Christian.
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Chris Wilson @ 2016-09-13  9:39 UTC
To: Christian König
Cc: Michel Dänzer, amd-gfx, dri-devel

On Tue, Sep 13, 2016 at 10:44:11AM +0200, Christian König wrote:
> [...]
> No, amdgpu relies on the fact that we even allow concurrent write
> accesses by userspace.
> [...]
> Everybody who needs exclusive access to the reservation object (like
> scanouts do) needs to wait for all fences, not just the exclusive
> one.
>
> The Intel driver clearly needs to be fixed here.

If you are not using implicit fencing, you have to pass explicit
fences instead.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Christian König @ 2016-09-13 12:52 UTC
To: Chris Wilson, Michel Dänzer, Mario Kleiner, dri-devel, amd-gfx

Am 13.09.2016 um 11:39 schrieb Chris Wilson:
> [...]
> If you are not using implicit fencing, you have to pass explicit
> fences instead.

Which is exactly what we do, but only for the driver's internal
command submissions.

All command submissions from the same process can run concurrently
with amdgpu; only when we see a fence from another driver or process
do we wait for it to complete before starting to run a command
submission.

Other drivers can't make any assumption about what a shared access is
actually doing (e.g. writing or reading) with a buffer.

So the model i915.ko uses for the reservation object and its shared
fences is certainly not correct and needs to be fixed.

Regards,
Christian.
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Michel Dänzer @ 2016-09-21  9:56 UTC
To: Christian König, Chris Wilson, Mario Kleiner
Cc: amd-gfx, dri-devel

On 13/09/16 09:52 PM, Christian König wrote:
> [...]
> All command submissions from the same process can run concurrently
> with amdgpu; only when we see a fence from another driver or process
> do we wait for it to complete before starting to run a command
> submission.
>
> Other drivers can't make any assumption about what a shared access
> is actually doing (e.g. writing or reading) with a buffer.
>
> So the model i915.ko uses for the reservation object and its shared
> fences is certainly not correct and needs to be fixed.

Looks like there are different interpretations of the semantics of
exclusive vs. shared fences. Where are these semantics documented?

FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon
only seems to wait for exclusive fences, so e.g. running Xorg on
amdgpu and using PRIME slave scanout on radeon leaves artifacts.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

From: Christian König @ 2016-09-21 10:30 UTC
To: Michel Dänzer, Chris Wilson, Mario Kleiner
Cc: amd-gfx, dri-devel

Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
> Looks like there are different interpretations of the semantics of
> exclusive vs. shared fences. Where are these semantics documented?

Yeah, I think as well that this is the primary question here.

IIRC the fences were explicitly called exclusive/shared instead of
writing/reading on purpose.

I absolutely don't mind switching them to writing/reading semantics,
but amdgpu really needs multiple writers at the same time. So in this
case the writing side of a reservation object needs to be a collection
of fences as well.

> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon
> only seems to wait for exclusive fences, so e.g. running Xorg on
> amdgpu and using PRIME slave scanout on radeon leaves artifacts.

Yeah, I know. See radeon_flip_work_func() in radeon_display.c. We
pretty much need the same patch there that I've done for amdgpu.

Regards,
Christian.
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-21 10:30 ` Christian König @ 2016-09-21 11:04 ` Daniel Vetter [not found] ` <CAKMK7uG3j54NzwjxmWuSmP787r+QN-Cu5T8R-naX6S9RvvKemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2016-09-21 15:13 ` Michel Dänzer 1 sibling, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-21 11:04 UTC (permalink / raw) To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list On Wed, Sep 21, 2016 at 12:30 PM, Christian König <deathsimple@vodafone.de> wrote: > Am 21.09.2016 um 11:56 schrieb Michel Dänzer: >> >> >> Looks like there are different interpretations of the semantics of >> exclusive vs. shared fences. Where are these semantics documented? > > > Yeah, I think as well that this is the primary question here. > > IIRC the fences were explicitly called exclusive/shared instead of > writing/reading on purpose. > > I absolutely don't mind switching to them to writing/reading semantics, but > amdgpu really needs multiple writers at the same time. > > So in this case the writing side of a reservation object needs to be a > collection of fences as well. You can't have multiple writers with implicit syncing. That confusion is exactly why we called them shared/exclusive. Multiple writers generally means that you do some form of fencing in userspace (unsync'ed gl buffer access is the common one). What you do for private buffers doesn't matter, but when you render into a shared/winsys buffer you really need to set the exclusive fence (and there can only ever be one). So probably needs some userspace adjustments to make sure you don't accidentally set an exclusive write hazard when you don't really want that implicit sync. 
-Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <CAKMK7uG3j54NzwjxmWuSmP787r+QN-Cu5T8R-naX6S9RvvKemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-09-21 11:19 ` Christian König 2016-09-21 12:56 ` Daniel Vetter 0 siblings, 1 reply; 46+ messages in thread From: Christian König @ 2016-09-21 11:19 UTC (permalink / raw) To: Daniel Vetter Cc: dri-devel, Michel Dänzer, Mario Kleiner, amd-gfx list, Chris Wilson Am 21.09.2016 um 13:04 schrieb Daniel Vetter: > On Wed, Sep 21, 2016 at 12:30 PM, Christian König > <deathsimple@vodafone.de> wrote: >> Am 21.09.2016 um 11:56 schrieb Michel Dänzer: >>> >>> Looks like there are different interpretations of the semantics of >>> exclusive vs. shared fences. Where are these semantics documented? >> >> Yeah, I think as well that this is the primary question here. >> >> IIRC the fences were explicitly called exclusive/shared instead of >> writing/reading on purpose. >> >> I absolutely don't mind switching to them to writing/reading semantics, but >> amdgpu really needs multiple writers at the same time. >> >> So in this case the writing side of a reservation object needs to be a >> collection of fences as well. > You can't have multiple writers with implicit syncing. That confusion > is exactly why we called them shared/exclusive. Multiple writers > generally means that you do some form of fencing in userspace > (unsync'ed gl buffer access is the common one). What you do for > private buffers doesn't matter, but when you render into a > shared/winsys buffer you really need to set the exclusive fence (and > there can only ever be one). So probably needs some userspace > adjustments to make sure you don't accidentally set an exclusive write > hazard when you don't really want that implicit sync. Nope, that isn't true. We use multiple writers without implicit syncing between processes in the amdgpu stack perfectly fine. See amdgpu_sync.c for the implementation. 
What we do there is taking a look at all the fences associated with a reservation object and only sync to those who are from another process. Then we use implicit syncing for command submissions in the form of "dependencies". E.g. for each CS we report back an identifier of that submission to user space and on the next submission you can give this identifier as dependency which needs to be satisfied before the command submission can start running. This was done to allow multiple engines (3D, DMA, Compute) to compose a buffer while still allow compatibility with protocols like DRI2/DRI3. Regards, Christian. > -Daniel _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-21 11:19 ` Christian König @ 2016-09-21 12:56 ` Daniel Vetter [not found] ` <CAKMK7uH6N2Kgwkf-11iwdqDAUrFmreYKLLeTGXmEh+N0DQ4tJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-21 12:56 UTC (permalink / raw) To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list On Wed, Sep 21, 2016 at 1:19 PM, Christian König <deathsimple@vodafone.de> wrote: > Am 21.09.2016 um 13:04 schrieb Daniel Vetter: >> >> On Wed, Sep 21, 2016 at 12:30 PM, Christian König >> <deathsimple@vodafone.de> wrote: >>> >>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer: >>>> >>>> >>>> Looks like there are different interpretations of the semantics of >>>> exclusive vs. shared fences. Where are these semantics documented? >>> >>> >>> Yeah, I think as well that this is the primary question here. >>> >>> IIRC the fences were explicitly called exclusive/shared instead of >>> writing/reading on purpose. >>> >>> I absolutely don't mind switching to them to writing/reading semantics, >>> but >>> amdgpu really needs multiple writers at the same time. >>> >>> So in this case the writing side of a reservation object needs to be a >>> collection of fences as well. >> >> You can't have multiple writers with implicit syncing. That confusion >> is exactly why we called them shared/exclusive. Multiple writers >> generally means that you do some form of fencing in userspace >> (unsync'ed gl buffer access is the common one). What you do for >> private buffers doesn't matter, but when you render into a >> shared/winsys buffer you really need to set the exclusive fence (and >> there can only ever be one). So probably needs some userspace >> adjustments to make sure you don't accidentally set an exclusive write >> hazard when you don't really want that implicit sync. > > > Nope, that isn't true. 
> > We use multiple writers without implicit syncing between processes in the > amdgpu stack perfectly fine. > > See amdgpu_sync.c for the implementation. What we do there is taking a look > at all the fences associated with a reservation object and only sync to > those who are from another process. > > Then we use implicit syncing for command submissions in the form of > "dependencies". E.g. for each CS we report back an identifier of that > submission to user space and on the next submission you can give this > identifier as dependency which needs to be satisfied before the command > submission can start running. This is called explicit fencing. Implemented with a driver-private primitive (and not sync_file fds like on android), but still conceptually explicit fencing. Implicit fencing really only can handle one writer, at least as currently implemented by struct reservation_object. > This was done to allow multiple engines (3D, DMA, Compute) to compose a > buffer while still allow compatibility with protocols like DRI2/DRI3. Instead of the current solution you need to stop attaching exclusive fences to non-shared buffers (which are coordinated using the driver-private explicit fencing you're describing), and only attach exclusive fences to shared buffers (DRI2/3, PRIME, whatever). Since you're doing explicit syncing for internal stuff anyway you can still opt to ignore the exclusive fences if you want to (driven by a flag or something similar). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <CAKMK7uH6N2Kgwkf-11iwdqDAUrFmreYKLLeTGXmEh+N0DQ4tJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-09-21 15:07 ` Michel Dänzer [not found] ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-09-21 15:07 UTC (permalink / raw) To: Daniel Vetter, Christian König Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson On 21/09/16 09:56 PM, Daniel Vetter wrote: > On Wed, Sep 21, 2016 at 1:19 PM, Christian König > <deathsimple@vodafone.de> wrote: >> Am 21.09.2016 um 13:04 schrieb Daniel Vetter: >>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König >>> <deathsimple@vodafone.de> wrote: >>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer: >>>>> >>>>> >>>>> Looks like there are different interpretations of the semantics of >>>>> exclusive vs. shared fences. Where are these semantics documented? >>>> >>>> >>>> Yeah, I think as well that this is the primary question here. >>>> >>>> IIRC the fences were explicitly called exclusive/shared instead of >>>> writing/reading on purpose. >>>> >>>> I absolutely don't mind switching to them to writing/reading semantics, >>>> but >>>> amdgpu really needs multiple writers at the same time. >>>> >>>> So in this case the writing side of a reservation object needs to be a >>>> collection of fences as well. >>> >>> You can't have multiple writers with implicit syncing. That confusion >>> is exactly why we called them shared/exclusive. Multiple writers >>> generally means that you do some form of fencing in userspace >>> (unsync'ed gl buffer access is the common one). What you do for >>> private buffers doesn't matter, but when you render into a >>> shared/winsys buffer you really need to set the exclusive fence (and >>> there can only ever be one). 
So probably needs some userspace >>> adjustments to make sure you don't accidentally set an exclusive write >>> hazard when you don't really want that implicit sync. >> >> >> Nope, that isn't true. >> >> We use multiple writers without implicit syncing between processes in the >> amdgpu stack perfectly fine. >> >> See amdgpu_sync.c for the implementation. What we do there is taking a look >> at all the fences associated with a reservation object and only sync to >> those who are from another process. >> >> Then we use implicit syncing for command submissions in the form of >> "dependencies". E.g. for each CS we report back an identifier of that >> submission to user space and on the next submission you can give this >> identifier as dependency which needs to be satisfied before the command >> submission can start running. > > This is called explicit fencing. Implemented with a driver-private > primitive (and not sync_file fds like on android), but still > conceptually explicit fencing. Implicit fencing really only can handle > one writer, at least as currently implemented by struct > reservation_object. > >> This was done to allow multiple engines (3D, DMA, Compute) to compose a >> buffer while still allow compatibility with protocols like DRI2/DRI3. > > Instead of the current solution you need to stop attaching exclusive > fences to non-shared buffers (which are coordinated using the > driver-private explicit fencing you're describing), Err, the current issue is actually that amdgpu never sets an exclusive fence, only ever shared ones. :) > and only attach exclusive fences to shared buffers (DRI2/3, PRIME, > whatever). Still, it occurred to me in the meantime that amdgpu setting the exclusive fence for buffers shared via PRIME (no matter if it's a write or read operation) might be a solution. Christian, what do you think? 
-- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org> @ 2016-09-21 15:15 ` Christian König [not found] ` <5c2048ff-0e20-ddf3-2d73-9a3acb38e7ff-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 2016-09-22 6:33 ` Daniel Vetter 1 sibling, 1 reply; 46+ messages in thread From: Christian König @ 2016-09-21 15:15 UTC (permalink / raw) To: Michel Dänzer, Daniel Vetter Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson Am 21.09.2016 um 17:07 schrieb Michel Dänzer: > On 21/09/16 09:56 PM, Daniel Vetter wrote: >> On Wed, Sep 21, 2016 at 1:19 PM, Christian König >> <deathsimple@vodafone.de> wrote: >>> Am 21.09.2016 um 13:04 schrieb Daniel Vetter: >>>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König >>>> <deathsimple@vodafone.de> wrote: >>>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer: >>>>>> >>>>>> Looks like there are different interpretations of the semantics of >>>>>> exclusive vs. shared fences. Where are these semantics documented? >>>>> >>>>> Yeah, I think as well that this is the primary question here. >>>>> >>>>> IIRC the fences were explicitly called exclusive/shared instead of >>>>> writing/reading on purpose. >>>>> >>>>> I absolutely don't mind switching to them to writing/reading semantics, >>>>> but >>>>> amdgpu really needs multiple writers at the same time. >>>>> >>>>> So in this case the writing side of a reservation object needs to be a >>>>> collection of fences as well. >>>> You can't have multiple writers with implicit syncing. That confusion >>>> is exactly why we called them shared/exclusive. Multiple writers >>>> generally means that you do some form of fencing in userspace >>>> (unsync'ed gl buffer access is the common one). What you do for >>>> private buffers doesn't matter, but when you render into a >>>> shared/winsys buffer you really need to set the exclusive fence (and >>>> there can only ever be one). 
So probably needs some userspace >>>> adjustments to make sure you don't accidentally set an exclusive write >>>> hazard when you don't really want that implicit sync. >>> >>> Nope, that isn't true. >>> >>> We use multiple writers without implicit syncing between processes in the >>> amdgpu stack perfectly fine. >>> >>> See amdgpu_sync.c for the implementation. What we do there is taking a look >>> at all the fences associated with a reservation object and only sync to >>> those who are from another process. >>> >>> Then we use implicit syncing for command submissions in the form of >>> "dependencies". E.g. for each CS we report back an identifier of that >>> submission to user space and on the next submission you can give this >>> identifier as dependency which needs to be satisfied before the command >>> submission can start running. >> This is called explicit fencing. Implemented with a driver-private >> primitive (and not sync_file fds like on android), but still >> conceptually explicit fencing. Implicit fencing really only can handle >> one writer, at least as currently implemented by struct >> reservation_object. >> >>> This was done to allow multiple engines (3D, DMA, Compute) to compose a >>> buffer while still allow compatibility with protocols like DRI2/DRI3. >> Instead of the current solution you need to stop attaching exclusive >> fences to non-shared buffers (which are coordinated using the >> driver-private explicit fencing you're describing), > Err, the current issue is actually that amdgpu never sets an exclusive > fence, only ever shared ones. :) Actually amdgpu does set the exclusive fence for buffer migrations, cause that is an operation user space doesn't know about and so it needs to be "exclusive" access to the buffer. >> and only attach exclusive fences to shared buffers (DRI2/3, PRIME, >> whatever). 
> Still, it occurred to me in the meantime that amdgpu setting the > exclusive fence for buffers shared via PRIME (no matter if it's a write > or read operation) might be a solution. Christian, what do you think? The problem is that we don't have one fence, but many. E.g. there can be many operation on a buffer at the same time and we need to wait for all of them to complete before it can be displayed. Regards, Christian. _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <5c2048ff-0e20-ddf3-2d73-9a3acb38e7ff-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2016-09-21 15:29 ` Michel Dänzer 2016-09-21 16:23 ` Christian König 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-09-21 15:29 UTC (permalink / raw) To: Christian König, Daniel Vetter Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson On 22/09/16 12:15 AM, Christian König wrote: > Am 21.09.2016 um 17:07 schrieb Michel Dänzer: >> On 21/09/16 09:56 PM, Daniel Vetter wrote: >>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König >>> <deathsimple@vodafone.de> wrote: >>>> >>>> We use multiple writers without implicit syncing between processes >>>> in the >>>> amdgpu stack perfectly fine. >>>> >>>> See amdgpu_sync.c for the implementation. What we do there is taking >>>> a look >>>> at all the fences associated with a reservation object and only sync to >>>> those who are from another process. >>>> >>>> Then we use implicit syncing for command submissions in the form of >>>> "dependencies". E.g. for each CS we report back an identifier of that >>>> submission to user space and on the next submission you can give this >>>> identifier as dependency which needs to be satisfied before the command >>>> submission can start running. >>> This is called explicit fencing. Implemented with a driver-private >>> primitive (and not sync_file fds like on android), but still >>> conceptually explicit fencing. Implicit fencing really only can handle >>> one writer, at least as currently implemented by struct >>> reservation_object. >>> >>>> This was done to allow multiple engines (3D, DMA, Compute) to compose a >>>> buffer while still allow compatibility with protocols like DRI2/DRI3. 
>>> Instead of the current solution you need to stop attaching exclusive >>> fences to non-shared buffers (which are coordinated using the >>> driver-private explicit fencing you're describing), >> Err, the current issue is actually that amdgpu never sets an exclusive >> fence, only ever shared ones. :) > > Actually amdgpu does set the exclusive fence for buffer migrations, > cause that is an operation user space doesn't know about and so it needs > to be "exclusive" access to the buffer. > > >>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME, >>> whatever). >> Still, it occurred to me in the meantime that amdgpu setting the >> exclusive fence for buffers shared via PRIME (no matter if it's a write >> or read operation) might be a solution. Christian, what do you think? > > The problem is that we don't have one fence, but many. > > E.g. there can be many operation on a buffer at the same time and we > need to wait for all of them to complete before it can be displayed. Maybe in theory, but with the problem we have in practice right now, the amdgpu GPU should only ever access the shared BO with the same engine. Anyway, this should be solvable by setting some kind of meta-fence as the exclusive fence, which can internally be mapped to multiple fences, maybe up to one for each ring which can access the BO? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-21 15:29 ` Michel Dänzer @ 2016-09-21 16:23 ` Christian König 2016-09-22 6:36 ` Daniel Vetter 0 siblings, 1 reply; 46+ messages in thread From: Christian König @ 2016-09-21 16:23 UTC (permalink / raw) To: Michel Dänzer, Daniel Vetter; +Cc: dri-devel, amd-gfx list Am 21.09.2016 um 17:29 schrieb Michel Dänzer: > On 22/09/16 12:15 AM, Christian König wrote: >> Am 21.09.2016 um 17:07 schrieb Michel Dänzer: >>> On 21/09/16 09:56 PM, Daniel Vetter wrote: >>>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König >>>> <deathsimple@vodafone.de> wrote: >>>>> We use multiple writers without implicit syncing between processes >>>>> in the >>>>> amdgpu stack perfectly fine. >>>>> >>>>> See amdgpu_sync.c for the implementation. What we do there is taking >>>>> a look >>>>> at all the fences associated with a reservation object and only sync to >>>>> those who are from another process. >>>>> >>>>> Then we use implicit syncing for command submissions in the form of >>>>> "dependencies". E.g. for each CS we report back an identifier of that >>>>> submission to user space and on the next submission you can give this >>>>> identifier as dependency which needs to be satisfied before the command >>>>> submission can start running. >>>> This is called explicit fencing. Implemented with a driver-private >>>> primitive (and not sync_file fds like on android), but still >>>> conceptually explicit fencing. Implicit fencing really only can handle >>>> one writer, at least as currently implemented by struct >>>> reservation_object. >>>> >>>>> This was done to allow multiple engines (3D, DMA, Compute) to compose a >>>>> buffer while still allow compatibility with protocols like DRI2/DRI3. 
>>>> Instead of the current solution you need to stop attaching exclusive >>>> fences to non-shared buffers (which are coordinated using the >>>> driver-private explicit fencing you're describing), >>> Err, the current issue is actually that amdgpu never sets an exclusive >>> fence, only ever shared ones. :) >> Actually amdgpu does set the exclusive fence for buffer migrations, >> cause that is an operation user space doesn't know about and so it needs >> to be "exclusive" access to the buffer. >> >> >>>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME, >>>> whatever). >>> Still, it occurred to me in the meantime that amdgpu setting the >>> exclusive fence for buffers shared via PRIME (no matter if it's a write >>> or read operation) might be a solution. Christian, what do you think? >> The problem is that we don't have one fence, but many. >> >> E.g. there can be many operation on a buffer at the same time and we >> need to wait for all of them to complete before it can be displayed. > Maybe in theory, but with the problem we have in practice right now, the > amdgpu GPU should only ever access the shared BO with the same engine. That clearly won't work. Take a look at what both Mesa and the pro stack do with the BO before it is displayed makes it mandatory to execute things in parallel (at least for the not shared case). > Anyway, this should be solvable by setting some kind of meta-fence as > the exclusive fence, which can internally be mapped to multiple fences, > maybe up to one for each ring which can access the BO? I've thought about that as well, but this approach would also only work when we keep a collection of fences and not just an array because of the scheduler. For a quick workaround I suggest to just serialize all accesses to BO shared with different drivers, but essentially I think it is a perfectly valid requirement to have multiple writers to one BO. Christian. 
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-21 16:23 ` Christian König @ 2016-09-22 6:36 ` Daniel Vetter [not found] ` <20160922063625.GD22164-XQyZGdhdUcTMwUGJfOwWj/ooFf0ArEBIu+b9c/7xato@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-22 6:36 UTC (permalink / raw) To: Christian König; +Cc: Michel Dänzer, dri-devel, amd-gfx list On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote: > For a quick workaround I suggest to just serialize all accesses to BO shared > with different drivers, but essentially I think it is a perfectly valid > requirement to have multiple writers to one BO. It is, but it's not possible with implicit sync. If you want parallel write access to the same shared buffer, you _must_ carry around some explicit fences. Within amdgpu you can use driver-specific cookies, for shared buffers we now have sync_file. But multiple writers with implicit sync simply fundamentally doesn't work, because you have no idea which writer, touching the same subrange you want to touch, you need to wait for. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <20160922063625.GD22164-XQyZGdhdUcTMwUGJfOwWj/ooFf0ArEBIu+b9c/7xato@public.gmane.org> @ 2016-09-22 10:55 ` Christian König 2016-09-22 12:26 ` Daniel Vetter 0 siblings, 1 reply; 46+ messages in thread From: Christian König @ 2016-09-22 10:55 UTC (permalink / raw) To: Daniel Vetter Cc: dri-devel, Michel Dänzer, Mario Kleiner, amd-gfx list, Chris Wilson Am 22.09.2016 um 08:36 schrieb Daniel Vetter: > On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote: >> For a quick workaround I suggest to just serialize all accesses to BO shared >> with different drivers, but essentially I think it is a perfectly valid >> requirement to have multiple writers to one BO. > It is, but it's not possible with implicit sync. If you want parallel > write access to the same shared buffer, you _must_ carry around some > explicit fences. Within amdgpu you can use driver-specific cookies, for > shared buffer we now have sync_file. But multiple writers with implicit > sync simply fundamentally doesn't work. Because you have no idea with which > writer, touching the same subrange you want to touch. You don't need to separate the BO into subranges which are touched by different engines for allowing multiple writers. AMD hardware and I'm pretty sure others as well are perfectly capable of writing to the same memory from multiple engines and even multiple GPUs at the same time. For a good hint of what is possible see the public AMD ISA documentation about atomic operations, but that is only the start of it. The crux here is that we need to assume that we will have implicit and explicit sync mixed for backward compatibility. This implies that we need some mechanism like amdgpu uses in it's sync implementation where every fence is associated with an owner which denotes the domain in which implicit sync happens. If you leave this domain you will automatically run into explicit sync. 
Currently we define the borders of this domain in amdgpu at the process boundary to keep things like DRI2/DRI3 working as expected. I really don't see how you want to solve this with a single explicit fence for each reservation object. As long as you have multiple concurrently running operations accessing the same buffer you need to keep one fence for each operation no matter what. Regards, Christian. > -Daniel
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-22 10:55 ` Christian König @ 2016-09-22 12:26 ` Daniel Vetter 2016-09-22 12:44 ` Christian König 0 siblings, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-22 12:26 UTC (permalink / raw) To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list On Thu, Sep 22, 2016 at 12:55 PM, Christian König <deathsimple@vodafone.de> wrote: > Am 22.09.2016 um 08:36 schrieb Daniel Vetter: >> >> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote: >>> >>> For a quick workaround I suggest to just serialize all accesses to BO >>> shared >>> with different drivers, but essentially I think it is a perfectly valid >>> requirement to have multiple writers to one BO. >> >> It is, but it's not possible with implicit sync. If you want parallel >> write access to the same shared buffer, you _must_ carry around some >> explicit fences. Within amdgpu you can use driver-specific cookies, for >> shared buffer we now have sync_file. But multiple writers with implicit >> sync simply fundamentally doesn't work. Because you have no idea with >> which >> writer, touching the same subrange you want to touch. > > > You don't need to separate the BO into subranges which are touched by > different engines for allowing multiple writers. > > AMD hardware and I'm pretty sure others as well are perfectly capable of > writing to the same memory from multiple engines and even multiple GPUs at > the same time. > > For a good hint of what is possible see the public AMD ISA documentation > about atomic operations, but that is only the start of it. > > > The crux here is that we need to assume that we will have implicit and > explicit sync mixed for backward compatibility. > > This implies that we need some mechanism like amdgpu uses in it's sync > implementation where every fence is associated with an owner which denotes > the domain in which implicit sync happens. 
> If you leave this domain you will > automatically run into explicit sync. > > Currently we define the borders of this domain in amdgpu at the process boundary > to keep things like DRI2/DRI3 working as expected. > > I really don't see how you want to solve this with a single explicit fence > for each reservation object. As long as you have multiple concurrently > running operations accessing the same buffer you need to keep one fence for > each operation no matter what. I can't make sense of what you're saying, and I suspect we attach different meanings to the same words. So let me define here: - implicit fencing: Userspace does not track reads/writes to buffers; only the kernel does that. This is the assumption DRI2/3 has. Since synchronization is by necessity on a per-buffer level you can only have 1 writer. In the kernel the cross-driver interface for this is struct reservation_object attached to dma-bufs. If you don't fill out/wait for the exclusive fence in there, your driver is _not_ doing (cross-device) implicit fencing. - explicit fencing: Userspace passes around distinct fence objects for any work going on on the GPU. The kernel doesn't insert any stall of its own (except for moving buffer objects around, of course). This is what Android does. This also seems to be what amdgpu is doing within one process/owner. Given that, I'm not sure what you mean by "one explicit fence per reservation_object", since explicit fencing should not attach anything (at least not any exclusive fences) to a reservation_object. It does sound a bit like you have the meaning the other way round (as in explicit fencing = the kernel explicitly takes care of fencing, but it's explicit as in explicit fences visible to userspace).
-Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-22 12:26 ` Daniel Vetter @ 2016-09-22 12:44 ` Christian König 2016-09-22 13:05 ` Daniel Vetter 0 siblings, 1 reply; 46+ messages in thread From: Christian König @ 2016-09-22 12:44 UTC (permalink / raw) To: Daniel Vetter; +Cc: dri-devel, Michel Dänzer, amd-gfx list Am 22.09.2016 um 14:26 schrieb Daniel Vetter: > On Thu, Sep 22, 2016 at 12:55 PM, Christian König > <deathsimple@vodafone.de> wrote: >> Am 22.09.2016 um 08:36 schrieb Daniel Vetter: >>> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote: >>>> For a quick workaround I suggest to just serialize all accesses to BO >>>> shared >>>> with different drivers, but essentially I think it is a perfectly valid >>>> requirement to have multiple writers to one BO. >>> It is, but it's not possible with implicit sync. If you want parallel >>> write access to the same shared buffer, you _must_ carry around some >>> explicit fences. Within amdgpu you can use driver-specific cookies, for >>> shared buffer we now have sync_file. But multiple writers with implicit >>> sync simply fundamentally doesn't work. Because you have no idea with >>> which >>> writer, touching the same subrange you want to touch. >> >> You don't need to separate the BO into subranges which are touched by >> different engines for allowing multiple writers. >> >> AMD hardware and I'm pretty sure others as well are perfectly capable of >> writing to the same memory from multiple engines and even multiple GPUs at >> the same time. >> >> For a good hint of what is possible see the public AMD ISA documentation >> about atomic operations, but that is only the start of it. >> >> >> The crux here is that we need to assume that we will have implicit and >> explicit sync mixed for backward compatibility. 
>> >> This implies that we need some mechanism like amdgpu uses in it's sync >> implementation where every fence is associated with an owner which denotes >> the domain in which implicit sync happens. If you leave this domain you will >> automatically run into explicit sync. >> >> Currently we define the borders of this domain in amdgpu on process boundary >> to keep things like DRI2/DRI3 working as expected. >> >> I really don't see how you want to solve this with a single explicit fence >> for each reservation object. As long as you have multiple concurrently >> running operations accessing the same buffer you need to keep one fence for >> each operation no matter what. > I can't make sense of what you're saying, and I suspect we put > different meaning to different words. So let me define here: > > - implicit fencing: Userspace does not track read/writes to buffers, > but only the kernel does that. This is the assumption DRI2/3 has. > Since synchronization is by necessity on a per-buffer level you can > only have 1 writer. In the kernel the cross-driver interface for this > is struct reservation_object attached to dma-bufs. If you don't fill > out/wait for the exclusive fence in there, you're driver is _not_ > doing (cross-device) implicit fencing. I can confirm that my understanding of implicit fencing is exactly the same as yours. > - explicit fencing: Userspace passes around distinct fence objects for > any work going on on the gpu. The kernel doesn't insert any stall of > it's own (except for moving buffer objects around ofc). This is what > Android. This also seems to be what amdgpu is doing within one > process/owner. No, that is clearly not my understanding of explicit fencing. Userspace doesn't necessarily need to pass around distinct fence objects in all of its protocols, and the kernel is still responsible for inserting stalls whenever a userspace protocol or application requires these semantics.
Otherwise you will never be able to use explicit fencing on the Linux desktop with protocols like DRI2/DRI3. I would expect that every driver in the system waits for all fences of a reservation object as long as it isn't told otherwise by providing a distinct fence object with the IOCTL in question. Regards, Christian.
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-22 12:44 ` Christian König @ 2016-09-22 13:05 ` Daniel Vetter 2016-09-22 13:22 ` Christian König 0 siblings, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-22 13:05 UTC (permalink / raw) To: Christian König; +Cc: dri-devel, Michel Dänzer, amd-gfx list On Thu, Sep 22, 2016 at 2:44 PM, Christian König <deathsimple@vodafone.de> wrote: >> - explicit fencing: Userspace passes around distinct fence objects for >> any work going on on the gpu. The kernel doesn't insert any stall of >> it's own (except for moving buffer objects around ofc). This is what >> Android. This also seems to be what amdgpu is doing within one >> process/owner. > > > No, that is clearly not my understanding of explicit fencing. > > Userspace doesn't necessarily need to pass around distinct fence objects > with all of it's protocols and the kernel is still responsible for inserting > stalls whenever an userspace protocol or application requires this > semantics. > > Otherwise you will never be able to use explicit fencing on the Linux > desktop with protocols like DRI2/DRI3. This is about mixing them. Explicit fencing still means userspace has an explicit piece, separate from buffers, (either sync_file fd, or a driver-specific cookie, or similar). > I would expect that every driver in the system waits for all fences of a > reservation object as long as it isn't told otherwise by providing a > distinct fence object with the IOCTL in question. Yup agreed. This way if your explicitly-fencing driver reads a shared buffer passed over a protocol that does implicit fencing (like DRI2/3), then it will work. The other interop direction is explicitly-fencing driver passes a buffer to a consumer which expects implicit fencing. In that case you must attach the right fence to the exclusive slot, but _only_ in that case. 
Otherwise you end up stalling your explicitly-fencing userspace, since implicit fencing doesn't allow more than 1 writer. For amdgpu one possible way to implement this might be to count how many users a dma-buf has, and if it's more than just the current context, set the exclusive fence. Or do a uabi revision and let userspace decide (or at least override it). But the current approach in amdgpu_sync.c of declaring a fence as exclusive after the fact (if owners don't match) just isn't how reservation_object works. You can of course change that, but that means you must change all drivers implementing support for implicit fencing of dma-buf. Fixing amdgpu will be easier ;-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-22 13:05 ` Daniel Vetter @ 2016-09-22 13:22 ` Christian König [not found] ` <d2430ff8-43bd-bff2-9b02-847cabfd56c0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Christian König @ 2016-09-22 13:22 UTC (permalink / raw) To: Daniel Vetter; +Cc: dri-devel, Michel Dänzer, amd-gfx list Am 22.09.2016 um 15:05 schrieb Daniel Vetter: > On Thu, Sep 22, 2016 at 2:44 PM, Christian König > <deathsimple@vodafone.de> wrote: >>> - explicit fencing: Userspace passes around distinct fence objects for >>> any work going on on the gpu. The kernel doesn't insert any stall of >>> it's own (except for moving buffer objects around ofc). This is what >>> Android. This also seems to be what amdgpu is doing within one >>> process/owner. >> >> No, that is clearly not my understanding of explicit fencing. >> >> Userspace doesn't necessarily need to pass around distinct fence objects >> with all of it's protocols and the kernel is still responsible for inserting >> stalls whenever an userspace protocol or application requires this >> semantics. >> >> Otherwise you will never be able to use explicit fencing on the Linux >> desktop with protocols like DRI2/DRI3. > This is about mixing them. Explicit fencing still means userspace has > an explicit piece, separate from buffers, (either sync_file fd, or a > driver-specific cookie, or similar). > >> I would expect that every driver in the system waits for all fences of a >> reservation object as long as it isn't told otherwise by providing a >> distinct fence object with the IOCTL in question. > Yup agreed. This way if your explicitly-fencing driver reads a shared > buffer passed over a protocol that does implicit fencing (like > DRI2/3), then it will work. > > The other interop direction is explicitly-fencing driver passes a > buffer to a consumer which expects implicit fencing. 
In that case you > must attach the right fence to the exclusive slot, but _only_ in that > case. Ok, well, it sounds like you are close to understanding why I can't do exactly this: There simply is no right fence I could attach. When amdgpu makes the command submissions it doesn't necessarily know that the buffer will be exported and shared with another device later on. So when the buffer is exported and given to the other device you might have a whole bunch of fences which run concurrently and not in any serial order. > Otherwise you end up stalling your explicitly-fencing userspace, > since implicit fencing doesn't allow more than 1 writer. For amdgpu > one possible way to implement this might be to count how many users a > dma-buf has, and if it's more than just the current context set the > exclusive fence. Or do an uabi revision and let userspace decide (or > at least overwrite it). I mean I can pick one fence and wait for the rest to finish manually, but that would certainly defeat the whole effort, wouldn't it? I completely agree that you have only 1 writer with implicit fencing, but when you switch from explicit fencing back to implicit fencing you can have multiple ones. > But the current approach in amdgpu_sync.c of declaring a fence as > exclusive after the fact (if owners don't match) just isn't how > reservation_object works. You can of course change that, but that > means you must change all drivers implementing support for implicit > fencing of dma-buf. Fixing amdgpu will be easier ;-) Well, as far as I can see there is no way I can fix amdgpu in this case. The handling clearly needs to be changed on the receiving side of the reservation objects if I don't want to completely disable concurrent access to BOs in amdgpu. Regards, Christian.
> -Daniel
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <d2430ff8-43bd-bff2-9b02-847cabfd56c0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2016-09-23 10:00 ` Michel Dänzer 2016-09-23 12:09 ` Daniel Vetter 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-09-23 10:00 UTC (permalink / raw) To: Christian König, Daniel Vetter Cc: Mario Kleiner, dri-devel, amd-gfx list, Chris Wilson On 22/09/16 10:22 PM, Christian König wrote: > Am 22.09.2016 um 15:05 schrieb Daniel Vetter: >> On Thu, Sep 22, 2016 at 2:44 PM, Christian König >> <deathsimple@vodafone.de> wrote: >>>> - explicit fencing: Userspace passes around distinct fence objects for >>>> any work going on on the gpu. The kernel doesn't insert any stall of >>>> it's own (except for moving buffer objects around ofc). This is what >>>> Android. This also seems to be what amdgpu is doing within one >>>> process/owner. >>> >>> No, that is clearly not my understanding of explicit fencing. >>> >>> Userspace doesn't necessarily need to pass around distinct fence objects >>> with all of it's protocols and the kernel is still responsible for >>> inserting >>> stalls whenever an userspace protocol or application requires this >>> semantics. >>> >>> Otherwise you will never be able to use explicit fencing on the Linux >>> desktop with protocols like DRI2/DRI3. >> This is about mixing them. Explicit fencing still means userspace has >> an explicit piece, separate from buffers, (either sync_file fd, or a >> driver-specific cookie, or similar). >> >>> I would expect that every driver in the system waits for all fences of a >>> reservation object as long as it isn't told otherwise by providing a >>> distinct fence object with the IOCTL in question. >> Yup agreed. This way if your explicitly-fencing driver reads a shared >> buffer passed over a protocol that does implicit fencing (like >> DRI2/3), then it will work. 
>> >> The other interop direction is explicitly-fencing driver passes a >> buffer to a consumer which expects implicit fencing. In that case you >> must attach the right fence to the exclusive slot, but _only_ in that >> case. > > Ok well sounds like you are close to understand why I can't do exactly > this: There simply is no right fence I could attach. > > When amdgpu makes the command submissions it doesn't necessarily know > that the buffer will be exported and shared with another device later on. > > So when the buffer is exported and given to the other device you might > have a whole bunch of fences which run concurrently and not in any > serial order. I feel like you're thinking too much of buffers shared between GPUs as being short-lived and only shared late. In the use-cases I know about, shared buffers are created separately and shared ahead of time, the actual rendering work is done to non-shared buffers and then just copied to the shared buffers for transfer between GPUs. These copies are always performed by the same context in such a way that they should always be performed by the same HW engine and thus implicitly serialized. Do you have any specific use-cases in mind where buffers are only shared between GPUs after the rendering operations creating the buffer contents to be shared have already been submitted? >> Otherwise you end up stalling your explicitly-fencing userspace, >> since implicit fencing doesn't allow more than 1 writer. For amdgpu >> one possible way to implement this might be to count how many users a >> dma-buf has, and if it's more than just the current context set the >> exclusive fence. Or do an uabi revision and let userspace decide (or >> at least overwrite it). > > I mean I can pick one fence and wait for the rest to finish manually, > but that would certainly defeat the whole effort, doesn't it? I'm afraid it's not clear to me why it would. Can you elaborate? 
>> But the current approach in amdgpu_sync.c of declaring a fence as >> exclusive after the fact (if owners don't match) just isn't how >> reservation_object works. You can of course change that, but that >> means you must change all drivers implementing support for implicit >> fencing of dma-buf. Fixing amdgpu will be easier ;-) > > Well as far as I can see there is no way I can fix amdgpu in this case. > > The handling clearly needs to be changed on the receiving side of the > reservation objects if I don't completely want to disable concurrent > access to BOs in amdgpu. Anyway, we need a solution for this between radeon and amdgpu, and I don't think a solution which involves those drivers using reservation object semantics between them which are different from all other drivers is a good idea. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-23 10:00 ` Michel Dänzer @ 2016-09-23 12:09 ` Daniel Vetter 2016-09-26 0:48 ` Michel Dänzer 0 siblings, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-23 12:09 UTC (permalink / raw) To: Michel Dänzer; +Cc: dri-devel, amd-gfx list On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote: > On 22/09/16 10:22 PM, Christian König wrote: > > Am 22.09.2016 um 15:05 schrieb Daniel Vetter: > >> On Thu, Sep 22, 2016 at 2:44 PM, Christian König > >> <deathsimple@vodafone.de> wrote: > >>>> - explicit fencing: Userspace passes around distinct fence objects for > >>>> any work going on on the gpu. The kernel doesn't insert any stall of > >>>> it's own (except for moving buffer objects around ofc). This is what > >>>> Android. This also seems to be what amdgpu is doing within one > >>>> process/owner. > >>> > >>> No, that is clearly not my understanding of explicit fencing. > >>> > >>> Userspace doesn't necessarily need to pass around distinct fence objects > >>> with all of it's protocols and the kernel is still responsible for > >>> inserting > >>> stalls whenever an userspace protocol or application requires this > >>> semantics. > >>> > >>> Otherwise you will never be able to use explicit fencing on the Linux > >>> desktop with protocols like DRI2/DRI3. > >> This is about mixing them. Explicit fencing still means userspace has > >> an explicit piece, separate from buffers, (either sync_file fd, or a > >> driver-specific cookie, or similar). > >> > >>> I would expect that every driver in the system waits for all fences of a > >>> reservation object as long as it isn't told otherwise by providing a > >>> distinct fence object with the IOCTL in question. > >> Yup agreed. This way if your explicitly-fencing driver reads a shared > >> buffer passed over a protocol that does implicit fencing (like > >> DRI2/3), then it will work. 
> >> > >> The other interop direction is explicitly-fencing driver passes a > >> buffer to a consumer which expects implicit fencing. In that case you > >> must attach the right fence to the exclusive slot, but _only_ in that > >> case. > > > > Ok well sounds like you are close to understand why I can't do exactly > > this: There simply is no right fence I could attach. > > > > When amdgpu makes the command submissions it doesn't necessarily know > > that the buffer will be exported and shared with another device later on. > > > > So when the buffer is exported and given to the other device you might > > have a whole bunch of fences which run concurrently and not in any > > serial order. > > I feel like you're thinking too much of buffers shared between GPUs as > being short-lived and only shared late. In the use-cases I know about, > shared buffers are created separately and shared ahead of time, the > actual rendering work is done to non-shared buffers and then just copied > to the shared buffers for transfer between GPUs. These copies are always > performed by the same context in such a way that they should always be > performed by the same HW engine and thus implicitly serialized. > > Do you have any specific use-cases in mind where buffers are only shared > between GPUs after the rendering operations creating the buffer contents > to be shared have already been submitted? Yeah, it should be known which buffer you use (at least in userspace, maybe not in the kernel) for which you need implicit fencing. At least with DRI2/3 it's really obvious which buffers are shared. Same holds for external images and other imported buffers. Yes that means you need to keep track of a few things in userspace, and you need to add a special flag to CS to make sure the kernel does set the exclusive fence. > >> Otherwise you end up stalling your explicitly-fencing userspace, > >> since implicit fencing doesn't allow more than 1 writer. 
For amdgpu > >> one possible way to implement this might be to count how many users a > >> dma-buf has, and if it's more than just the current context set the > >> exclusive fence. Or do an uabi revision and let userspace decide (or > >> at least overwrite it). > > > > I mean I can pick one fence and wait for the rest to finish manually, > > but that would certainly defeat the whole effort, doesn't it? > > I'm afraid it's not clear to me why it would. Can you elaborate? > > > >> But the current approach in amdgpu_sync.c of declaring a fence as > >> exclusive after the fact (if owners don't match) just isn't how > >> reservation_object works. You can of course change that, but that > >> means you must change all drivers implementing support for implicit > >> fencing of dma-buf. Fixing amdgpu will be easier ;-) > > > > Well as far as I can see there is no way I can fix amdgpu in this case. > > > > The handling clearly needs to be changed on the receiving side of the > > reservation objects if I don't completely want to disable concurrent > > access to BOs in amdgpu. > > Anyway, we need a solution for this between radeon and amdgpu, and I > don't think a solution which involves those drivers using reservation > object semantics between them which are different from all other drivers > is a good idea. Afaik there are also amd+intel machines out there, so really the only option is either to fix amdgpu to correctly set exclusive fences on shared buffers (with the help of userspace hints), or to change all the existing drivers. No idea what's simpler to do, since I don't know about amdgpu userspace. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-23 12:09 ` Daniel Vetter @ 2016-09-26 0:48 ` Michel Dänzer 2016-09-26 8:04 ` Daniel Vetter 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-09-26 0:48 UTC (permalink / raw) To: Daniel Vetter; +Cc: dri-devel, amd-gfx list On 23/09/16 09:09 PM, Daniel Vetter wrote: > On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote: >> On 22/09/16 10:22 PM, Christian König wrote: >>> Am 22.09.2016 um 15:05 schrieb Daniel Vetter: >>>> >>>> But the current approach in amdgpu_sync.c of declaring a fence as >>>> exclusive after the fact (if owners don't match) just isn't how >>>> reservation_object works. You can of course change that, but that >>>> means you must change all drivers implementing support for implicit >>>> fencing of dma-buf. Fixing amdgpu will be easier ;-) >>> >>> Well as far as I can see there is no way I can fix amdgpu in this case. >>> >>> The handling clearly needs to be changed on the receiving side of the >>> reservation objects if I don't completely want to disable concurrent >>> access to BOs in amdgpu. >> >> Anyway, we need a solution for this between radeon and amdgpu, and I >> don't think a solution which involves those drivers using reservation >> object semantics between them which are different from all other drivers >> is a good idea. > > Afaik there's also amd+intel machines out there, Sure, what I meant was that even if we didn't care about those (which we do), we'd still need a solution between our own drivers. > so really the only option is to either fix amdgpu to correctly set > exclusive fences on shared buffers (with the help of userspace hints). > Or change all the existing drivers. 
I got some fresh perspective on the problem while taking a walk, and I'm now fairly convinced that neither amdgpu userspace nor other drivers need to be modified: It occurred to me that all the information we need is already there in the exclusive and shared fences set by amdgpu. We just need to use it differently to match the expectations of other drivers. We should be able to set some sort of "pseudo" fence as the exclusive fence of the reservation object. When we are asked by another driver to wait for this fence to signal, we take the current "real" exclusive fence (which we can keep track of e.g. in our BO struct) and shared fences, and wait for all of those to signal; then we mark the "pseudo" exclusive fence as signalled. Am I missing anything which would prevent this from working? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-09-26 0:48 ` Michel Dänzer @ 2016-09-26 8:04 ` Daniel Vetter [not found] ` <20160926080419.GV20761-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Daniel Vetter @ 2016-09-26 8:04 UTC (permalink / raw) To: Michel Dänzer; +Cc: dri-devel, amd-gfx list On Mon, Sep 26, 2016 at 09:48:37AM +0900, Michel Dänzer wrote: > On 23/09/16 09:09 PM, Daniel Vetter wrote: > > On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote: > >> On 22/09/16 10:22 PM, Christian König wrote: > >>> Am 22.09.2016 um 15:05 schrieb Daniel Vetter: > >>>> > >>>> But the current approach in amdgpu_sync.c of declaring a fence as > >>>> exclusive after the fact (if owners don't match) just isn't how > >>>> reservation_object works. You can of course change that, but that > >>>> means you must change all drivers implementing support for implicit > >>>> fencing of dma-buf. Fixing amdgpu will be easier ;-) > >>> > >>> Well as far as I can see there is no way I can fix amdgpu in this case. > >>> > >>> The handling clearly needs to be changed on the receiving side of the > >>> reservation objects if I don't completely want to disable concurrent > >>> access to BOs in amdgpu. > >> > >> Anyway, we need a solution for this between radeon and amdgpu, and I > >> don't think a solution which involves those drivers using reservation > >> object semantics between them which are different from all other drivers > >> is a good idea. > > > > Afaik there's also amd+intel machines out there, > > Sure, what I meant was that even if we didn't care about those (which we > do), we'd still need a solution between our own drivers. > > > > so really the only option is to either fix amdgpu to correctly set > > exclusive fences on shared buffers (with the help of userspace hints). > > Or change all the existing drivers. 
> > I got some fresh perspective on the problem while taking a walk, and I'm > now fairly convinced that neither amdgpu userspace nor other drivers > need to be modified: > > It occurred to me that all the information we need is already there in > the exclusive and shared fences set by amdgpu. We just need to use it > differently to match the expectations of other drivers. > > We should be able to set some sort of "pseudo" fence as the exclusive > fence of the reservation object. When we are asked by another driver to > wait for this fence to signal, we take the current "real" exclusive > fence (which we can keep track of e.g. in our BO struct) and shared > fences, and wait for all of those to signal; then we mark the "pseudo" > exclusive fence as signalled. > > Am I missing anything which would prevent this from working? One thing to make sure is that you don't change the (real, private stored) fences you're waiting on over the lifetime of the public exclusive fence. That might lead to some hilarity wrt potentially creating fence dependency loops. But I think as long as you guarantee that the private internal fences are always amdgpu ones, and never anything imported from a different driver, even that should be fine. Not because this would break the loops, but since amdgpu has a hangcheck it can still guarantee that the fence eventually fires even if there is a loop. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <20160926080419.GV20761-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org> @ 2016-10-07 12:34 ` Mike Lothian [not found] ` <CAHbf0-HZ6EotqwgvkxRTdRF97xB3qBA=DRKAzaAXguV_PR_P8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Mike Lothian @ 2016-10-07 12:34 UTC (permalink / raw) To: Daniel Vetter, Michel Dänzer; +Cc: amd-gfx list, dri-devel Hi This discussion has gone a little quiet Was there any agreement about what needed doing to get this working for i965/amdgpu? Regards Mike On Mon, 26 Sep 2016 at 09:04 Daniel Vetter <daniel@ffwll.ch> wrote: > > On Mon, Sep 26, 2016 at 09:48:37AM +0900, Michel Dänzer wrote: > > On 23/09/16 09:09 PM, Daniel Vetter wrote: > > > On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote: > > >> On 22/09/16 10:22 PM, Christian König wrote: > > >>> Am 22.09.2016 um 15:05 schrieb Daniel Vetter: > > >>>> > > >>>> But the current approach in amdgpu_sync.c of declaring a fence as > > >>>> exclusive after the fact (if owners don't match) just isn't how > > >>>> reservation_object works. You can of course change that, but that > > >>>> means you must change all drivers implementing support for implicit > > >>>> fencing of dma-buf. Fixing amdgpu will be easier ;-) > > >>> > > >>> Well as far as I can see there is no way I can fix amdgpu in this case. > > >>> > > >>> The handling clearly needs to be changed on the receiving side of the > > >>> reservation objects if I don't completely want to disable concurrent > > >>> access to BOs in amdgpu. > > >> > > >> Anyway, we need a solution for this between radeon and amdgpu, and I > > >> don't think a solution which involves those drivers using reservation > > >> object semantics between them which are different from all other drivers > > >> is a good idea.
> > > > > > Afaik there's also amd+intel machines out there, > > > > Sure, what I meant was that even if we didn't care about those (which we > > do), we'd still need a solution between our own drivers. > > > > > > > so really the only option is to either fix amdgpu to correctly set > > > exclusive fences on shared buffers (with the help of userspace hints). > > > Or change all the existing drivers. > > > > I got some fresh perspective on the problem while taking a walk, and I'm > > now fairly convinced that neither amdgpu userspace nor other drivers > > need to be modified: > > > > It occurred to me that all the information we need is already there in > > the exclusive and shared fences set by amdgpu. We just need to use it > > differently to match the expectations of other drivers. > > > > We should be able to set some sort of "pseudo" fence as the exclusive > > fence of the reservation object. When we are asked by another driver to > > wait for this fence to signal, we take the current "real" exclusive > > fence (which we can keep track of e.g. in our BO struct) and shared > > fences, and wait for all of those to signal; then we mark the "pseudo" > > exclusive fence as signalled. > > > > Am I missing anything which would prevent this from working? > > One thing to make sure is that you don't change the (real, private stored) > fences you're waiting on over the lifetime of the public exclusive fence. > That might lead to some hilarity wrt potentially creating fence depency > loops. But I think as long as you guarantee that the private internal > fences are always amdgpu ones, and never anything imported from a > different driver even that should be fine. Not because this would break > the loops, but since amgpud has a hangcheck it can still gurantee that the > fence eventually fires even if there is a loop. 
> -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
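Michel's "pseudo" exclusive fence proposal above can be sketched in plain C. Everything in this block (`model_fence`, `pseudo_fence`, the wait helpers) is an illustrative userspace stand-in, not the kernel's `struct fence`/`reservation_object` API; a real implementation would chain fence callbacks rather than poll:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in, not the kernel's fence type. */
struct model_fence { bool signaled; };

/*
 * The driver publishes only "base" as the reservation object's exclusive
 * fence, while privately tracking the real exclusive and shared fences.
 */
struct pseudo_fence {
	struct model_fence base;
	struct model_fence *real_excl;
	struct model_fence **real_shared;
	size_t n_shared;
};

/* An importer waiting on the pseudo fence implicitly waits on them all. */
static void pseudo_fence_wait(struct pseudo_fence *pf)
{
	size_t i;

	if (pf->real_excl)
		while (!pf->real_excl->signaled)
			; /* a real fence would sleep on a callback here */
	for (i = 0; i < pf->n_shared; i++)
		while (!pf->real_shared[i]->signaled)
			;
	pf->base.signaled = true; /* only now does the public fence fire */
}

/* Tiny demo: once all real fences signal, the pseudo fence signals too. */
static bool pseudo_fence_demo(void)
{
	struct model_fence excl = { true }, s0 = { true }, s1 = { true };
	struct model_fence *shared[] = { &s0, &s1 };
	struct pseudo_fence pf = { { false }, &excl, shared, 2 };

	pseudo_fence_wait(&pf);
	return pf.base.signaled;
}
```

Note Daniel's caveat above: the privately tracked fences must not change over the pseudo fence's lifetime and should all be amdgpu-internal, so the driver's hangcheck bounds any dependency loop.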
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <CAHbf0-HZ6EotqwgvkxRTdRF97xB3qBA=DRKAzaAXguV_PR_P8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-10-11 3:58 ` Michel Dänzer [not found] ` <d74d34a7-5221-d282-d9d1-b0e1007fc0c7-otUistvHUpPR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-10-11 3:58 UTC (permalink / raw) To: Mike Lothian, Daniel Vetter, Christian König; +Cc: dri-devel, amd-gfx list On 07/10/16 09:34 PM, Mike Lothian wrote: > > This has discussion has gone a little quiet > > Was there any agreement about what needed doing to get this working > for i965/amdgpu? Christian, do you see anything which could prevent the solution I outlined from working? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <d74d34a7-5221-d282-d9d1-b0e1007fc0c7-otUistvHUpPR7s880joybQ@public.gmane.org> @ 2016-10-11 12:04 ` Christian König [not found] ` <c77a2cb9-1f0e-f1a3-aedd-a111cd6ba8e8-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Christian König @ 2016-10-11 12:04 UTC (permalink / raw) To: Michel Dänzer, Mike Lothian, Daniel Vetter Cc: Nayan Deshmukh, dri-devel, amd-gfx list Am 11.10.2016 um 05:58 schrieb Michel Dänzer: > On 07/10/16 09:34 PM, Mike Lothian wrote: >> This has discussion has gone a little quiet >> >> Was there any agreement about what needed doing to get this working >> for i965/amdgpu? > Christian, do you see anything which could prevent the solution I > outlined from working? I thought about that approach as well, but unfortunately it also has a couple of downsides. Especially keeping the exclusive fence set while we actually don't need it isn't really clean either. I'm currently a bit busy with other tasks and so put Nayan on a road to get a bit into the kernel driver (he asked for that anyway). Implementing the simple workaround to sync when we export the DMA-buf should be something like 20 lines of code and fortunately Nayan has an I+A system and so can actually test it. If it turns out to be more problematic or somebody really starts to need it I'm going to hack on that myself a bit more. Regards, Christian. _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <c77a2cb9-1f0e-f1a3-aedd-a111cd6ba8e8-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2016-10-12 0:40 ` Michel Dänzer 2016-10-27 13:33 ` Mike Lothian 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-10-12 0:40 UTC (permalink / raw) To: Christian König, Mike Lothian, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel On 11/10/16 09:04 PM, Christian König wrote: > Am 11.10.2016 um 05:58 schrieb Michel Dänzer: >> On 07/10/16 09:34 PM, Mike Lothian wrote: >>> This has discussion has gone a little quiet >>> >>> Was there any agreement about what needed doing to get this working >>> for i965/amdgpu? >> Christian, do you see anything which could prevent the solution I >> outlined from working? > > I thought about that approach as well, but unfortunately it also has a > couple of downsides. Especially keeping the exclusive fence set while we > actually don't need it isn't really clean either. I was wondering if it's possible to have a singleton pseudo exclusive fence which is used for all BOs. That might keep the overhead acceptably low. > I'm currently a bit busy with other tasks and so put Nayan on a road to > get a bit into the kernel driver (he asked for that anyway). > > Implementing the simple workaround to sync when we export the DMA-buf > should be something like 20 lines of code and fortunately Nayan has an > I+A system and so can actually test it. > > If it turns out to be more problematic or somebody really starts to need > it I'm going to hack on that myself a bit more. If you mean only syncing when a DMA-buf is exported, that's not enough, as I explained before. The BOs being shared are long-lived, and synchronization between GPUs is required for every command stream submission. 
-- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
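Michel's insistence that synchronization is needed at every command stream submission, not just at export, follows from how the reservation-object wait works. A simplified model of the flag the i915 patch at the top of this thread flips (`model_resv`/`resv_wait` are hypothetical stand-ins for the kernel's `reservation_object` and `reservation_object_wait_timeout_rcu()`, which additionally handle RCU and timeouts):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct resv_fence { bool signaled; };

struct model_resv {
	struct resv_fence *excl;       /* at most one exclusive fence */
	struct resv_fence *shared[8];  /* any number of shared fences */
	size_t n_shared;
};

/*
 * Returns how many fences a waiter blocks on.  With wait_all == false an
 * importer only waits on the exclusive slot -- which amdgpu leaves empty
 * for its CS work, hence the unsynchronized flips.  With wait_all == true
 * the shared fences are waited on as well.
 */
static size_t resv_wait(const struct model_resv *resv, bool wait_all)
{
	size_t waited = 0;

	if (resv->excl)
		waited++;
	if (wait_all)
		waited += resv->n_shared;
	return waited;
}
```

Because the shared slots are refilled on every submission, an importer that only samples them once at export time misses all later rendering, which is Michel's point above.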
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-10-12 0:40 ` Michel Dänzer @ 2016-10-27 13:33 ` Mike Lothian [not found] ` <CAHbf0-GGMWZrhB+PKpc-QbD__6fqB4pQVFfN+gzLWNhi+DuG3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Mike Lothian @ 2016-10-27 13:33 UTC (permalink / raw) To: Michel Dänzer, Christian König, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel [-- Attachment #1.1: Type: text/plain, Size: 1823 bytes --] Hi Just another gentle ping to see where you are with this? Cheers Mike On Wed, 12 Oct 2016 at 01:40 Michel Dänzer <michel@daenzer.net> wrote: > On 11/10/16 09:04 PM, Christian König wrote: > > Am 11.10.2016 um 05:58 schrieb Michel Dänzer: > >> On 07/10/16 09:34 PM, Mike Lothian wrote: > >>> This has discussion has gone a little quiet > >>> > >>> Was there any agreement about what needed doing to get this working > >>> for i965/amdgpu? > >> Christian, do you see anything which could prevent the solution I > >> outlined from working? > > > > I thought about that approach as well, but unfortunately it also has a > > couple of downsides. Especially keeping the exclusive fence set while we > > actually don't need it isn't really clean either. > > I was wondering if it's possible to have a singleton pseudo exclusive > fence which is used for all BOs. That might keep the overhead acceptably > low. > > > > I'm currently a bit busy with other tasks and so put Nayan on a road to > > get a bit into the kernel driver (he asked for that anyway). > > > > Implementing the simple workaround to sync when we export the DMA-buf > > should be something like 20 lines of code and fortunately Nayan has an > > I+A system and so can actually test it. > > > > If it turns out to be more problematic or somebody really starts to need > > it I'm going to hack on that myself a bit more. > > If you mean only syncing when a DMA-buf is exported, that's not enough, > as I explained before. 
The BOs being shared are long-lived, and > synchronization between GPUs is required for every command stream > submission. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer > [-- Attachment #1.2: Type: text/html, Size: 3122 bytes --] [-- Attachment #2: Type: text/plain, Size: 160 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <CAHbf0-GGMWZrhB+PKpc-QbD__6fqB4pQVFfN+gzLWNhi+DuG3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-10-28 1:34 ` Michel Dänzer 2016-10-28 17:37 ` Mario Kleiner 0 siblings, 1 reply; 46+ messages in thread From: Michel Dänzer @ 2016-10-28 1:34 UTC (permalink / raw) To: Mike Lothian, Christian König, Daniel Vetter Cc: Nayan Deshmukh, dri-devel, amd-gfx list On 27/10/16 10:33 PM, Mike Lothian wrote: > > Just another gentle ping to see where you are with this? I haven't got a chance to look into this any further. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-10-28 1:34 ` Michel Dänzer @ 2016-10-28 17:37 ` Mario Kleiner [not found] ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2016-10-28 18:37 ` Mike Lothian 0 siblings, 2 replies; 46+ messages in thread From: Mario Kleiner @ 2016-10-28 17:37 UTC (permalink / raw) To: Michel Dänzer, Mike Lothian, Christian König, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel [-- Attachment #1: Type: text/plain, Size: 1555 bytes --] On 10/28/2016 03:34 AM, Michel Dänzer wrote: > On 27/10/16 10:33 PM, Mike Lothian wrote: >> >> Just another gentle ping to see where you are with this? > > I haven't got a chance to look into this any further. > > Fwiw., as a proof of concept, the attached experimental patch does work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and DRI3/Present when applied to drm-next (updated from a few days ago). With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone under all loads. The tearing with "windowed" windows now looks as expected for regular tearing not related to Prime. ftrace confirms the i915 driver's pageflip function is waiting on the fence in reservation_object_wait_timeout_rcu() as it should. That entry->tv.shared needs to be set false for such buffers in amdgpu_bo_list_set() makes sense to me, as that is part of the buffer validation for command stream submission. There are other places in the driver where tv.shared is set, which i didn't check so far. I don't know which of these would need to be updated with a "exported bo" check as well, e.g., for video decoding or maybe gpu compute? Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made no difference. I assume that makes sense because that functions seems to deal with amdgpu internal vm page tables or page table entries for such a bo, not with something visible to external clients? 
All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping without causing any obvious new problems. -mario [-- Attachment #2: 0001-drm-amdgpu-Attach-exclusive-fence-to-prime-exported-.patch --] [-- Type: text/x-patch, Size: 3024 bytes --] >From 2a8d7fcd36da30305fa675df311c697162792597 Mon Sep 17 00:00:00 2001 From: Mario Kleiner <mario.kleiner.de@gmail.com> Date: Wed, 26 Oct 2016 10:58:00 +0200 Subject: [PATCH] drm/amdgpu: Attach exclusive fence to prime exported bo's. External clients which import our bo's wait only for exclusive dmabuf-fences, not on shared ones, so attach fences on such exported buffers as exclusive ones, not shared ones. -> Backup commit. Work in progress. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 3 +++ 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 039b57e..a337d56 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -459,6 +459,7 @@ struct amdgpu_bo { u64 metadata_flags; void *metadata; u32 metadata_size; + bool prime_exported; /* list of all virtual address to which this bo * is associated to */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c index 651115d..6e1d7b3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c @@ -132,7 +132,10 @@ static int amdgpu_bo_list_set(struct amdgpu_device *adev, entry->priority = min(info[i].bo_priority, AMDGPU_BO_LIST_MAX_PRIORITY); entry->tv.bo = &entry->robj->tbo; - entry->tv.shared = true; + entry->tv.shared = !entry->robj->prime_exported; + + if (entry->robj->prime_exported) + DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", 
entry->robj); if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS) gds_obj = entry->robj; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index a7ea9a3..730a68e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -494,6 +494,12 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev, tv.bo = &bo_va->bo->tbo; tv.shared = true; + + if (bo_va->bo->prime_exported) { + DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo); + /* tv.shared = false; */ + } + list_add(&tv.head, &list); amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c index 7700dc2..bfbfeb9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev, if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) return ERR_PTR(-EPERM); + bo->prime_exported = true; + DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo); + return drm_gem_prime_export(dev, gobj, flags); } -- 2.7.4 [-- Attachment #3: Type: text/plain, Size: 160 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2016-10-28 17:48 ` Christian König 2016-11-05 1:17 ` Mario Kleiner 2016-10-31 6:41 ` Michel Dänzer 1 sibling, 1 reply; 46+ messages in thread From: Christian König @ 2016-10-28 17:48 UTC (permalink / raw) To: Mario Kleiner, Michel Dänzer, Mike Lothian, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel Am 28.10.2016 um 19:37 schrieb Mario Kleiner: > > > On 10/28/2016 03:34 AM, Michel Dänzer wrote: >> On 27/10/16 10:33 PM, Mike Lothian wrote: >>> >>> Just another gentle ping to see where you are with this? >> >> I haven't got a chance to look into this any further. >> >> > > Fwiw., as a proof of concept, the attached experimental patch does > work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and > DRI3/Present when applied to drm-next (updated from a few days ago). > With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone > under all loads. The tearing with "windowed" windows now looks as > expected for regular tearing not related to Prime. Yeah, that's pretty much what I had in mind as well. You additionally need to wait for the shared fences when you export the BO for the first time, but that's only a nitpick. > > ftrace confirms the i915 driver's pageflip function is waiting on the > fence in reservation_object_wait_timeout_rcu() as it should. > > That entry->tv.shared needs to be set false for such buffers in > amdgpu_bo_list_set() makes sense to me, as that is part of the buffer > validation for command stream submission. There are other places in > the driver where tv.shared is set, which i didn't check so far. > > I don't know which of these would need to be updated with a "exported > bo" check as well, e.g., for video decoding or maybe gpu compute? > Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made > no difference. 
I assume that makes sense because that functions seems > to deal with amdgpu internal vm page tables or page table entries for > such a bo, not with something visible to external clients? Yes, exactly. VM updates doesn't matter for anybody else than amdgpu. Additional to that we don't even add a fence to the shared slot we reserve (could probably drop that for optimization). Please remove the debugging stuff and the extra code on the VM updates and add a reservation_object_wait_timeout_rcu() to the export path and we should be good to go. Regards, Christian. > > All i can say is it fixes 3D rendering under DRI3 + Prime + > pageflipping without causing any obvious new problems. > > -mario _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
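The shape Christian asks for -- one catch-up wait when the BO is exported, then exclusive fences from that point on -- can be modeled like this. `model_bo` and the `bo_*` helpers are hypothetical stand-ins; in the kernel the one-time wait would be a `reservation_object_wait_timeout_rcu(resv, true, false, MAX_SCHEDULE_TIMEOUT)` call in `amdgpu_gem_prime_export()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct exp_fence { bool signaled; };

struct model_bo {
	bool prime_exported;
	struct exp_fence *pending[8]; /* fences attached before export */
	size_t n_pending;
};

/* One-time catch-up wait at export, then remember the BO is shared. */
static bool bo_prime_export(struct model_bo *bo)
{
	size_t i;

	for (i = 0; i < bo->n_pending; i++)
		if (!bo->pending[i]->signaled)
			return false; /* real code would block, not fail */
	bo->prime_exported = true;
	return true;
}

/* Later CS submissions consult the flag, as in Mario's bo_list change. */
static bool bo_tv_shared(const struct model_bo *bo)
{
	return !bo->prime_exported;
}
```

The export-time wait covers work already in flight; the flipped `tv.shared` covers everything submitted afterwards.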
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-10-28 17:48 ` Christian König @ 2016-11-05 1:17 ` Mario Kleiner 0 siblings, 0 replies; 46+ messages in thread From: Mario Kleiner @ 2016-11-05 1:17 UTC (permalink / raw) To: Christian König, Michel Dänzer, Mike Lothian, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel On 10/28/2016 07:48 PM, Christian König wrote: > Am 28.10.2016 um 19:37 schrieb Mario Kleiner: >> >> >> On 10/28/2016 03:34 AM, Michel Dänzer wrote: >>> On 27/10/16 10:33 PM, Mike Lothian wrote: >>>> >>>> Just another gentle ping to see where you are with this? >>> >>> I haven't got a chance to look into this any further. >>> >>> >> >> Fwiw., as a proof of concept, the attached experimental patch does >> work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and >> DRI3/Present when applied to drm-next (updated from a few days ago). >> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone >> under all loads. The tearing with "windowed" windows now looks as >> expected for regular tearing not related to Prime. > > Yeah, that's pretty much what I had in mind as well. You additionally > need to wait for the shared fences when you export the BO for the first > time, but that's only a nitpick. > >> >> ftrace confirms the i915 driver's pageflip function is waiting on the >> fence in reservation_object_wait_timeout_rcu() as it should. >> >> That entry->tv.shared needs to be set false for such buffers in >> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer >> validation for command stream submission. There are other places in >> the driver where tv.shared is set, which i didn't check so far. >> >> I don't know which of these would need to be updated with a "exported >> bo" check as well, e.g., for video decoding or maybe gpu compute? >> Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made >> no difference. 
I assume that makes sense because that functions seems >> to deal with amdgpu internal vm page tables or page table entries for >> such a bo, not with something visible to external clients? > > Yes, exactly. VM updates doesn't matter for anybody else than amdgpu. > Additional to that we don't even add a fence to the shared slot we > reserve (could probably drop that for optimization). > > Please remove the debugging stuff and the extra code on the VM updates > and add a reservation_object_wait_timeout_rcu() to the export path and > we should be good to go. > Done. Patch v2 just sent out after retesting with Tonga only and Intel + Tonga renderoffload. Would be cool if we could get this into 4.9-rc. Ideally also backported to 4.8, given it is a simple change, as that would be the next official kernel for *buntu 16.04-LTS and derivatives, but that's probably breaking the rules as it doesn't fix a regression? thanks, -mario > Regards, > Christian. > >> >> All i can say is it fixes 3D rendering under DRI3 + Prime + >> pageflipping without causing any obvious new problems. >> >> -mario > > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. [not found] ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2016-10-28 17:48 ` Christian König @ 2016-10-31 6:41 ` Michel Dänzer 1 sibling, 0 replies; 46+ messages in thread From: Michel Dänzer @ 2016-10-31 6:41 UTC (permalink / raw) To: Mario Kleiner, Mike Lothian, Christian König, Daniel Vetter Cc: Nayan Deshmukh, dri-devel, amd-gfx list On 29/10/16 02:37 AM, Mario Kleiner wrote: > On 10/28/2016 03:34 AM, Michel Dänzer wrote: >> On 27/10/16 10:33 PM, Mike Lothian wrote: >>> >>> Just another gentle ping to see where you are with this? >> >> I haven't got a chance to look into this any further. > > Fwiw., as a proof of concept, the attached experimental patch does work > as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and > DRI3/Present when applied to drm-next (updated from a few days ago). > With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone > under all loads. The tearing with "windowed" windows now looks as > expected for regular tearing not related to Prime. > > ftrace confirms the i915 driver's pageflip function is waiting on the > fence in reservation_object_wait_timeout_rcu() as it should. > > That entry->tv.shared needs to be set false for such buffers in > amdgpu_bo_list_set() makes sense to me, as that is part of the buffer > validation for command stream submission. There are other places in the > driver where tv.shared is set, which i didn't check so far. > > I don't know which of these would need to be updated with a "exported > bo" check as well, e.g., for video decoding or maybe gpu compute? Adding > or removing the check to amdgpu_gem_va_update_vm(), e.g., made no > difference. I assume that makes sense because that functions seems to > deal with amdgpu internal vm page tables or page table entries for such > a bo, not with something visible to external clients? 
> > All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping > without causing any obvious new problems. [...] > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c > index 7700dc2..bfbfeb9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c > @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev, > if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) > return ERR_PTR(-EPERM); > > + bo->prime_exported = true; > + DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo); > + > return drm_gem_prime_export(dev, gobj, flags); > } > This will take effect in non-PRIME cases as well, at least DRI3 and GL<->[other API] interop off the top of my head. Is that okay? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-10-28 17:37 ` Mario Kleiner [not found] ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2016-10-28 18:37 ` Mike Lothian 2016-10-29 13:58 ` Mike Lothian 1 sibling, 1 reply; 46+ messages in thread From: Mike Lothian @ 2016-10-28 18:37 UTC (permalink / raw) To: Mario Kleiner, Michel Dänzer, Christian König, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel [-- Attachment #1.1: Type: text/plain, Size: 2072 bytes --] Hi Mario That fixes the tearing, it's been replaced with a strange stutter, like it's only showing half the number of frames being reported - it's really noticeable in tomb raider Thanks for your work on this, the stutter is much more manageable than the tearing was I've attached the patch that applies cleanly to 4.10-wip On Fri, 28 Oct 2016 at 18:37 Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > > > On 10/28/2016 03:34 AM, Michel Dänzer wrote: > > On 27/10/16 10:33 PM, Mike Lothian wrote: > >> > >> Just another gentle ping to see where you are with this? > > > > I haven't got a chance to look into this any further. > > > > > > Fwiw., as a proof of concept, the attached experimental patch does work > as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and > DRI3/Present when applied to drm-next (updated from a few days ago). > With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone > under all loads. The tearing with "windowed" windows now looks as > expected for regular tearing not related to Prime. > > ftrace confirms the i915 driver's pageflip function is waiting on the > fence in reservation_object_wait_timeout_rcu() as it should. > > That entry->tv.shared needs to be set false for such buffers in > amdgpu_bo_list_set() makes sense to me, as that is part of the buffer > validation for command stream submission. There are other places in the > driver where tv.shared is set, which i didn't check so far. 
> > I don't know which of these would need to be updated with a "exported > bo" check as well, e.g., for video decoding or maybe gpu compute? Adding > or removing the check to amdgpu_gem_va_update_vm(), e.g., made no > difference. I assume that makes sense because that functions seems to > deal with amdgpu internal vm page tables or page table entries for such > a bo, not with something visible to external clients? > > All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping > without causing any obvious new problems. > > -mario > [-- Attachment #1.2: Type: text/html, Size: 3233 bytes --] [-- Attachment #2: 0001-drm-amdgpu-Attach-exclusive-fence-to-prime-exported-.patch --] [-- Type: text/x-patch, Size: 2235 bytes --] diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 217df24..6757b99 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -391,6 +391,7 @@ struct amdgpu_bo { u64 metadata_flags; void *metadata; u32 metadata_size; + bool prime_exported; /* list of all virtual address to which this bo * is associated to */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c index 651115d..6e1d7b3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c @@ -132,7 +132,10 @@ static int amdgpu_bo_list_set(struct amdgpu_device *adev, entry->priority = min(info[i].bo_priority, AMDGPU_BO_LIST_MAX_PRIORITY); entry->tv.bo = &entry->robj->tbo; - entry->tv.shared = true; + entry->tv.shared = !entry->robj->prime_exported; + + if (entry->robj->prime_exported) + DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", entry->robj); if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS) gds_obj = entry->robj; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index cd62f6f..54099a5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -504,6 +504,12 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev, tv.bo = &bo_va->bo->tbo; tv.shared = true; + + if (bo_va->bo->prime_exported) { + DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo); + /* tv.shared = false; */ + } + list_add(&tv.head, &list); amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c index 7700dc2..bfbfeb9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev, if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) return ERR_PTR(-EPERM); + bo->prime_exported = true; + DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo); + return drm_gem_prime_export(dev, gobj, flags); } [-- Attachment #3: Type: text/plain, Size: 160 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences. 2016-10-28 18:37 ` Mike Lothian @ 2016-10-29 13:58 ` Mike Lothian [not found] ` <CAHbf0-EY2OM_HgxTjmMi4-f5TQ8fkqf5XYBxHZtJVsnSpxPyyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 46+ messages in thread From: Mike Lothian @ 2016-10-29 13:58 UTC (permalink / raw) To: Mario Kleiner, Michel Dänzer, Christian König, Daniel Vetter Cc: Nayan Deshmukh, amd-gfx list, dri-devel [-- Attachment #1.1: Type: text/plain, Size: 2349 bytes --] I turned on vsync and everything works great in tomb raider :D Thanks again to everyone who made this possible On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike-4+n8WJKc9ve9FHfhHBbuYA@public.gmane.org> wrote: > Hi Mario > > That fixes the tearing, it's been replaced with a strange stutter, like > it's only showing half the number of frames being reported - it's really > noticeable in tomb raider > > Thanks for your work on this, the stutter is much more manageable than the > tearing was > > I've attached the patch that applies cleanly to 4.10-wip > > > > On Fri, 28 Oct 2016 at 18:37 Mario Kleiner <mario.kleiner.de-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > wrote: > > > > On 10/28/2016 03:34 AM, Michel Dänzer wrote: > > On 27/10/16 10:33 PM, Mike Lothian wrote: > >> > >> Just another gentle ping to see where you are with this? > > > > I haven't got a chance to look into this any further. > > > > > > Fwiw., as a proof of concept, the attached experimental patch does work > as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and > DRI3/Present when applied to drm-next (updated from a few days ago). > With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone > under all loads. The tearing with "windowed" windows now looks as > expected for regular tearing not related to Prime. > > ftrace confirms the i915 driver's pageflip function is waiting on the > fence in reservation_object_wait_timeout_rcu() as it should. 
> > That entry->tv.shared needs to be set false for such buffers in > amdgpu_bo_list_set() makes sense to me, as that is part of the buffer > validation for command stream submission. There are other places in the > driver where tv.shared is set, which i didn't check so far. > > I don't know which of these would need to be updated with a "exported > bo" check as well, e.g., for video decoding or maybe gpu compute? Adding > or removing the check to amdgpu_gem_va_update_vm(), e.g., made no > difference. I assume that makes sense because that functions seems to > deal with amdgpu internal vm page tables or page table entries for such > a bo, not with something visible to external clients? > > All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping > without causing any obvious new problems. > > -mario > > [-- Attachment #1.2: Type: text/html, Size: 4160 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  [not found] ` <CAHbf0-EY2OM_HgxTjmMi4-f5TQ8fkqf5XYBxHZtJVsnSpxPyyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-10-31 6:44 ` Michel Dänzer
  [not found] ` <c45e2f4c-c075-47b6-7e02-3bd98748c83a-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-10-31 6:44 UTC (permalink / raw)
To: Mike Lothian, Mario Kleiner, Christian König, Daniel Vetter
Cc: Nayan Deshmukh, dri-devel, amd-gfx list

On 29/10/16 10:58 PM, Mike Lothian wrote:
> I turned on vsync and everything works great in Tomb Raider :D
>
> Thanks again to everyone who made this possible
>
> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk> wrote:
>
>> Hi Mario
>>
>> That fixes the tearing; it's been replaced with a strange stutter,
>> like it's only showing half the number of frames being reported -
>> it's really noticeable in Tomb Raider.

I wonder if the stutter might be due to the dGPU writing another frame
before the iGPU is done processing the previous one. Christian, does the
amdgpu scheduler wait for shared fences of shared BOs to signal before
submitting jobs using them to the GPU?

--
Earthling Michel Dänzer          | http://www.amd.com
Libre software enthusiast        | Mesa and X developer
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  [not found] ` <c45e2f4c-c075-47b6-7e02-3bd98748c83a-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-10-31 8:00 ` Christian König
  2016-10-31 8:06 ` Michel Dänzer
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-10-31 8:00 UTC (permalink / raw)
To: Michel Dänzer, Mike Lothian, Mario Kleiner, Daniel Vetter
Cc: Nayan Deshmukh, dri-devel, amd-gfx list

Am 31.10.2016 um 07:44 schrieb Michel Dänzer:
> On 29/10/16 10:58 PM, Mike Lothian wrote:
>> I turned on vsync and everything works great in Tomb Raider :D
>>
>> Thanks again to everyone who made this possible
>>
>> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk> wrote:
>>
>>> Hi Mario
>>>
>>> That fixes the tearing; it's been replaced with a strange stutter,
>>> like it's only showing half the number of frames being reported -
>>> it's really noticeable in Tomb Raider.
>
> I wonder if the stutter might be due to the dGPU writing another frame
> before the iGPU is done processing the previous one. Christian, does the
> amdgpu scheduler wait for shared fences of shared BOs to signal before
> submitting jobs using them to the GPU?

Yeah, that should work. We wait for both the exclusive as well as all
shared fences before CS or pageflip.

Only on CS do we filter the shared fences, so that we don't necessarily
wait for submissions from the same process.

Christian.
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-10-31 8:00 ` Christian König
@ 2016-10-31 8:06 ` Michel Dänzer
  0 siblings, 0 replies; 46+ messages in thread
From: Michel Dänzer @ 2016-10-31 8:06 UTC (permalink / raw)
To: Christian König, Mike Lothian, Mario Kleiner, Daniel Vetter
Cc: Nayan Deshmukh, amd-gfx list, dri-devel

On 31/10/16 05:00 PM, Christian König wrote:
> Am 31.10.2016 um 07:44 schrieb Michel Dänzer:
>> On 29/10/16 10:58 PM, Mike Lothian wrote:
>>> I turned on vsync and everything works great in Tomb Raider :D
>>>
>>> Thanks again to everyone who made this possible
>>>
>>> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk> wrote:
>>>
>>>> Hi Mario
>>>>
>>>> That fixes the tearing; it's been replaced with a strange stutter,
>>>> like it's only showing half the number of frames being reported -
>>>> it's really noticeable in Tomb Raider.
>>
>> I wonder if the stutter might be due to the dGPU writing another frame
>> before the iGPU is done processing the previous one. Christian, does the
>> amdgpu scheduler wait for shared fences of shared BOs to signal before
>> submitting jobs using them to the GPU?
>
> Yeah, that should work. We wait for both the exclusive as well as all
> shared fences before CS or pageflip.
>
> Only on CS do we filter the shared fences, so that we don't necessarily
> wait for submissions from the same process.

Note that it can be the same process (Xorg) in the RandR 1.4 multihead
case. But it sounds like Mike's stutter can't be due to the issue I was
thinking of.

--
Earthling Michel Dänzer          | http://www.amd.com
Libre software enthusiast        | Mesa and X developer
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  [not found] ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org>
  2016-09-21 15:15 ` Christian König
@ 2016-09-22 6:33 ` Daniel Vetter
  1 sibling, 0 replies; 46+ messages in thread
From: Daniel Vetter @ 2016-09-22 6:33 UTC (permalink / raw)
To: Michel Dänzer
Cc: Mario Kleiner, dri-devel, Chris Wilson, Christian König, amd-gfx list, Daniel Vetter

On Thu, Sep 22, 2016 at 12:07:24AM +0900, Michel Dänzer wrote:
> On 21/09/16 09:56 PM, Daniel Vetter wrote:
>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>> Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
>>>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>>>> Looks like there are different interpretations of the semantics of
>>>>>> exclusive vs. shared fences. Where are these semantics documented?
>>>>>
>>>>> Yeah, I think as well that this is the primary question here.
>>>>>
>>>>> IIRC the fences were explicitly called exclusive/shared instead of
>>>>> writing/reading on purpose.
>>>>>
>>>>> I absolutely don't mind switching to writing/reading semantics, but
>>>>> amdgpu really needs multiple writers at the same time.
>>>>>
>>>>> So in this case the writing side of a reservation object needs to be
>>>>> a collection of fences as well.
>>>>
>>>> You can't have multiple writers with implicit syncing. That confusion
>>>> is exactly why we called them shared/exclusive. Multiple writers
>>>> generally means that you do some form of fencing in userspace
>>>> (unsync'ed gl buffer access is the common one). What you do for
>>>> private buffers doesn't matter, but when you render into a
>>>> shared/winsys buffer you really need to set the exclusive fence (and
>>>> there can only ever be one). So this probably needs some userspace
>>>> adjustments to make sure you don't accidentally set an exclusive write
>>>> hazard when you don't really want that implicit sync.
>>>
>>> Nope, that isn't true.
>>>
>>> We use multiple writers without implicit syncing between processes in
>>> the amdgpu stack perfectly fine.
>>>
>>> See amdgpu_sync.c for the implementation. What we do there is take a
>>> look at all the fences associated with a reservation object and only
>>> sync to those which are from another process.
>>>
>>> Then we use implicit syncing for command submissions in the form of
>>> "dependencies". E.g. for each CS we report back an identifier of that
>>> submission to user space, and on the next submission you can give this
>>> identifier as a dependency which needs to be satisfied before the
>>> command submission can start running.
>>
>> This is called explicit fencing. Implemented with a driver-private
>> primitive (and not sync_file fds like on Android), but still
>> conceptually explicit fencing. Implicit fencing really can only handle
>> one writer, at least as currently implemented by struct
>> reservation_object.
>>
>>> This was done to allow multiple engines (3D, DMA, Compute) to compose
>>> a buffer while still allowing compatibility with protocols like
>>> DRI2/DRI3.
>>
>> Instead of the current solution you need to stop attaching exclusive
>> fences to non-shared buffers (which are coordinated using the
>> driver-private explicit fencing you're describing),
>
> Err, the current issue is actually that amdgpu never sets an exclusive
> fence, only ever shared ones. :)

Well, since you sometimes sync and sometimes don't, it is kind of a
special case of a semi-exclusive fence (even if attached to the shared
slots).

>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
>> whatever).
>
> Still, it occurred to me in the meantime that amdgpu setting the
> exclusive fence for buffers shared via PRIME (no matter if it's a write
> or read operation) might be a solution. Christian, what do you think?

Yup, that's what I mean. And it shouldn't cause a problem for shared
buffers (at least for protocols where implicit fencing is required),
since for those you really can't have multiple concurrent writers. And
with the special checks in amdgpu_sync.c that's what's happening in
reality; the only difference is that the filtering/selection of what is
considered an exclusive fence happens when you sync, not when you attach
them. And that breaks reservation_object assumptions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 10:30 ` Christian König
  2016-09-21 11:04 ` Daniel Vetter
@ 2016-09-21 15:13 ` Michel Dänzer
  [not found] ` <f0e034f9-db22-6577-97c7-dd8d3e851226-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21 15:13 UTC (permalink / raw)
To: Christian König, Chris Wilson, Mario Kleiner; +Cc: amd-gfx, dri-devel

On 21/09/16 07:30 PM, Christian König wrote:
> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
>> seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
>> using PRIME slave scanout on radeon leaves artifacts.
>
> Yeah, I know. See radeon_display.c radeon_flip_work_func().
>
> We pretty much need the same patch here I've done for amdgpu as well.

Actually, the PRIME slave can't scan out from the shared BOs directly
(recall the recent discussion we had about that with Mario), but has to
copy from the shared BO to a dedicated scanout BO. These copies need to
be synchronized with the primary GPU's copies to the shared BO.

--
Earthling Michel Dänzer          | http://www.amd.com
Libre software enthusiast        | Mesa and X developer
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  [not found] ` <f0e034f9-db22-6577-97c7-dd8d3e851226-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2016-09-21 15:21 ` Christian König
  2016-09-21 15:28 ` Michel Dänzer
  0 siblings, 1 reply; 46+ messages in thread
From: Christian König @ 2016-09-21 15:21 UTC (permalink / raw)
To: Michel Dänzer, Chris Wilson, Mario Kleiner
Cc: amd-gfx, dri-devel

Am 21.09.2016 um 17:13 schrieb Michel Dänzer:
> On 21/09/16 07:30 PM, Christian König wrote:
>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
>>> seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
>>> using PRIME slave scanout on radeon leaves artifacts.
>> Yeah, I know. See radeon_display.c radeon_flip_work_func().
>>
>> We pretty much need the same patch here I've done for amdgpu as well.
> Actually, the PRIME slave can't scan out from the shared BOs directly
> (recall the recent discussion we had about that with Mario), but has to
> copy from the shared BO to a dedicated scanout BO. These copies need to
> be synchronized with the primary GPU's copies to the shared BO.

Yeah, that thought came to my mind before as well.

Buffer migrations by the kernel caused by a PRIME export actually set
the exclusive fence, so this shouldn't be an issue in practice when the
displaying GPU needs to copy from the BO again anyway.

The only case I can see where this can happen is when the BO is composed
directly in system memory by the engines and not migrated there.

Could be that we run into this issue more often in the future, because
that is pretty much what we want to have for 4K UVD decode.

Christian.
* Re: [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
  2016-09-21 15:21 ` Christian König
@ 2016-09-21 15:28 ` Michel Dänzer
  0 siblings, 0 replies; 46+ messages in thread
From: Michel Dänzer @ 2016-09-21 15:28 UTC (permalink / raw)
To: Christian König, Chris Wilson, Mario Kleiner; +Cc: amd-gfx, dri-devel

On 22/09/16 12:21 AM, Christian König wrote:
> Am 21.09.2016 um 17:13 schrieb Michel Dänzer:
>> On 21/09/16 07:30 PM, Christian König wrote:
>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>> FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon
>>>> only seems to wait for exclusive fences, so e.g. running Xorg on
>>>> amdgpu and using PRIME slave scanout on radeon leaves artifacts.
>>> Yeah, I know. See radeon_display.c radeon_flip_work_func().
>>>
>>> We pretty much need the same patch here I've done for amdgpu as well.
>> Actually, the PRIME slave can't scan out from the shared BOs directly
>> (recall the recent discussion we had about that with Mario), but has to
>> copy from the shared BO to a dedicated scanout BO. These copies need to
>> be synchronized with the primary GPU's copies to the shared BO.
>
> Yeah, that thought came to my mind before as well.
>
> Buffer migrations by the kernel caused by a PRIME export actually set
> the exclusive fence, so this shouldn't be an issue in practice when the
> displaying GPU needs to copy from the BO again anyway.
>
> The only case I can see where this can happen is when the BO is composed
> directly in system memory by the engines and not migrated there.

There is no migration going on in the steady state, just the primary GPU
writing to the shared BO and the slave GPU reading from it. If those
operations aren't properly synchronized, there is at least intermittent
tearing, but there can even be artifacts which stay until that area is
updated again.

--
Earthling Michel Dänzer          | http://www.amd.com
Libre software enthusiast        | Mesa and X developer
end of thread, other threads: [~2016-11-05 1:17 UTC | newest]

Thread overview: 46+ messages; links below jump to the message on this page:
2016-09-08  0:14 [PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences Mario Kleiner
2016-09-08  6:30 ` Chris Wilson
2016-09-08 15:21 ` Mario Kleiner
2016-09-08 16:23 ` Chris Wilson
     [not found] ` <20160908162346.GA5479-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
2016-09-09  1:15 ` Michel Dänzer
     [not found] ` <abccc8ac-10c6-ab22-c59d-f43ee48ba78d-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-13  8:44 ` Christian König
2016-09-13  9:39 ` Chris Wilson
     [not found] ` <20160913093945.GA25204-aII6DKEyn0pWYbfKqPwjAkR8Iwp7RQ6xAL8bYrjMMd8@public.gmane.org>
2016-09-13 12:52 ` Christian König
2016-09-21  9:56 ` Michel Dänzer
     [not found] ` <7aafce92-8bcf-1c5c-45de-9e8ecda85239-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-21 10:30 ` Christian König
2016-09-21 11:04 ` Daniel Vetter
     [not found] ` <CAKMK7uG3j54NzwjxmWuSmP787r+QN-Cu5T8R-naX6S9RvvKemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-09-21 11:19 ` Christian König
2016-09-21 12:56 ` Daniel Vetter
     [not found] ` <CAKMK7uH6N2Kgwkf-11iwdqDAUrFmreYKLLeTGXmEh+N0DQ4tJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-09-21 15:07 ` Michel Dänzer
     [not found] ` <9d1f4872-cabd-bd1b-7f10-6e4230a1f58c-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-21 15:15 ` Christian König
     [not found] ` <5c2048ff-0e20-ddf3-2d73-9a3acb38e7ff-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-09-21 15:29 ` Michel Dänzer
2016-09-21 16:23 ` Christian König
2016-09-22  6:36 ` Daniel Vetter
     [not found] ` <20160922063625.GD22164-XQyZGdhdUcTMwUGJfOwWj/ooFf0ArEBIu+b9c/7xato@public.gmane.org>
2016-09-22 10:55 ` Christian König
2016-09-22 12:26 ` Daniel Vetter
2016-09-22 12:44 ` Christian König
2016-09-22 13:05 ` Daniel Vetter
2016-09-22 13:22 ` Christian König
     [not found] ` <d2430ff8-43bd-bff2-9b02-847cabfd56c0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-09-23 10:00 ` Michel Dänzer
2016-09-23 12:09 ` Daniel Vetter
2016-09-26  0:48 ` Michel Dänzer
2016-09-26  8:04 ` Daniel Vetter
     [not found] ` <20160926080419.GV20761-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2016-10-07 12:34 ` Mike Lothian
     [not found] ` <CAHbf0-HZ6EotqwgvkxRTdRF97xB3qBA=DRKAzaAXguV_PR_P8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-11  3:58 ` Michel Dänzer
     [not found] ` <d74d34a7-5221-d282-d9d1-b0e1007fc0c7-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-10-11 12:04 ` Christian König
     [not found] ` <c77a2cb9-1f0e-f1a3-aedd-a111cd6ba8e8-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-10-12  0:40 ` Michel Dänzer
2016-10-27 13:33 ` Mike Lothian
     [not found] ` <CAHbf0-GGMWZrhB+PKpc-QbD__6fqB4pQVFfN+gzLWNhi+DuG3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-28  1:34 ` Michel Dänzer
2016-10-28 17:37 ` Mario Kleiner
     [not found] ` <7eb19a73-a558-d2e6-bd8d-34fe95045dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-10-28 17:48 ` Christian König
2016-11-05  1:17 ` Mario Kleiner
2016-10-31  6:41 ` Michel Dänzer
2016-10-28 18:37 ` Mike Lothian
2016-10-29 13:58 ` Mike Lothian
     [not found] ` <CAHbf0-EY2OM_HgxTjmMi4-f5TQ8fkqf5XYBxHZtJVsnSpxPyyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-31  6:44 ` Michel Dänzer
     [not found] ` <c45e2f4c-c075-47b6-7e02-3bd98748c83a-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-10-31  8:00 ` Christian König
2016-10-31  8:06 ` Michel Dänzer
2016-09-22  6:33 ` Daniel Vetter
2016-09-21 15:13 ` Michel Dänzer
     [not found] ` <f0e034f9-db22-6577-97c7-dd8d3e851226-otUistvHUpPR7s880joybQ@public.gmane.org>
2016-09-21 15:21 ` Christian König
2016-09-21 15:28 ` Michel Dänzer