From: "Christian König" <christian.koenig@amd.com>
To: "Daniel Vetter" <daniel.vetter@ffwll.ch>,
"Christian König" <deathsimple@vodafone.de>
Cc: Thomas Hellstrom <thellstrom@vmware.com>,
nouveau <nouveau@lists.freedesktop.org>,
LKML <linux-kernel@vger.kernel.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
"Deucher, Alexander" <alexander.deucher@amd.com>,
Ben Skeggs <bskeggs@redhat.com>
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences
Date: Tue, 22 Jul 2014 18:39:48 +0200
Message-ID: <53CE93D4.3010204@amd.com>
In-Reply-To: <CAKMK7uFugmw_pP1nyA1ws5tMB6iCe1CqBHeWHoYzQ0wXE301EA@mail.gmail.com>
> Maybe I've mixed things up a bit in my description. There is
> fence_signal which the implementor/exporter of a fence must call when
> the fence is completed. If the exporter has an ->enable_signaling
> callback it can delay that call to fence_signal for as long as it
> wishes as long as enable_signaling isn't called yet. But that's just
> the optimization to not require irqs to be turned on all the time.
>
> The other function is fence_is_signaled, which is used by code that is
> interested in the fence state, together with fence_wait if it wants to
> block and not just wants to know the momentary fence state. All the
> other functions (the stuff that adds callbacks and the various _locked
> and other versions) are just for fancy special cases.
Well, that's rather bad, because IRQs aren't reliable enough on Radeon HW
for such a thing, especially on Prime systems and Macs.
That's why we have this fancy HZ/2 timeout on all fence wait operations
to manually check if the fence is signaled or not.
To guarantee that a fence is signaled after enable_signaling is called
we would need to fire up a kernel thread which periodically calls
fence->signaled.
Christian.
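The HZ/2 fallback described above can be sketched in userspace C: rather than trusting the interrupt to arrive, the waiter sleeps for one poll interval and then checks the fence state by hand. This is only a sketch under stated assumptions. The names `toy_fence`, `toy_fence_ops`, `toy_countdown_signaled` and `toy_fence_wait_polled` are hypothetical and are not kernel or radeon APIs, and `nanosleep` merely stands in for sleeping on a wait queue with a timeout.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct toy_fence;

/* Hypothetical ops table, loosely modelled on the fence->signaled callback
 * discussed in this thread; it is not the kernel's actual structure. */
struct toy_fence_ops {
    bool (*signaled)(struct toy_fence *f);
};

struct toy_fence {
    const struct toy_fence_ops *ops;
    int checks_left; /* toy backend state: polls remaining until "done" */
};

/* Toy backend: reports completion once it has been polled often enough. */
static bool toy_countdown_signaled(struct toy_fence *f)
{
    if (f->checks_left > 0)
        f->checks_left--;
    return f->checks_left == 0;
}

static const struct toy_fence_ops toy_countdown_ops = {
    .signaled = toy_countdown_signaled,
};

/*
 * Wait for a fence without relying on the interrupt: check the state by
 * hand, sleep for one poll interval (standing in for "sleep until IRQ or
 * timeout"), and repeat. Returns true if the fence was seen signaled
 * within max_polls intervals, false otherwise.
 */
static bool toy_fence_wait_polled(struct toy_fence *f, long poll_ns, int max_polls)
{
    struct timespec interval = { .tv_sec = 0, .tv_nsec = poll_ns };

    assert(f != NULL && f->ops != NULL && f->ops->signaled != NULL);

    for (int i = 0; i < max_polls; i++) {
        if (f->ops->signaled(f))
            return true;
        nanosleep(&interval, NULL);
    }
    /* One last manual check after the final interval has elapsed. */
    return f->ops->signaled(f);
}
```

With `poll_ns` set to 500000000 (half a second) the loop mirrors the HZ/2 interval. Running the same loop from a dedicated thread would correspond to the periodic checker mentioned for the enable_signaling case.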
On 22.07.2014 18:21, Daniel Vetter wrote:
> On Tue, Jul 22, 2014 at 5:59 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> On 22.07.2014 17:42, Daniel Vetter wrote:
>>
>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> Drivers exporting fences need to provide a fence->signaled and a
>>>> fence->wait function; everything else, like fence->enable_signaling or
>>>> calling fence_signal() from the driver, is optional.
>>>>
>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>> fence->wait in atomic or interrupt context, nor while holding any global
>>>> locking primitives (like mmap_sem etc.). Holding locking primitives local
>>>> to the driver is OK, as long as they don't conflict with anything possibly
>>>> used by their own fence implementation.
>>> Well, that's almost what we have right now, with the exception that
>>> drivers are allowed (actually must, for correctness when updating
>>> fences) to hold the ww_mutexes for dma-bufs (or other buffer objects).
>>
>> In this case sorry for so much noise. I really haven't looked in so much
>> detail into anything but Maarten's Radeon patches.
>>
>> But how does that work right now, then? My impression was that it's mandatory
>> for drivers to call fence_signal()?
> Maybe I've mixed things up a bit in my description. There is
> fence_signal which the implementor/exporter of a fence must call when
> the fence is completed. If the exporter has an ->enable_signaling
> callback it can delay that call to fence_signal for as long as it
> wishes as long as enable_signaling isn't called yet. But that's just
> the optimization to not require irqs to be turned on all the time.
>
> The other function is fence_is_signaled, which is used by code that is
> interested in the fence state, together with fence_wait if it wants to
> block and not just wants to know the momentary fence state. All the
> other functions (the stuff that adds callbacks and the various _locked
> and other versions) are just for fancy special cases.
>
>>> Locking
>>> correctness is enforced with some extremely nasty lockdep annotations
>>> + additional debugging infrastructure enabled with
>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>>> dma-buf ww_mutexes while updating fences or waiting for them. And
>>> obviously for ->wait we need non-atomic context, not just
>>> non-interrupt.
>>
>> Sounds mostly reasonable, but instead of holding the dma-buf ww_mutex,
>> wouldn't RCU be more appropriate here? E.g. aren't we just interested in
>> the currently assigned fence being signaled at some point?
> Yeah, as an optimization you can get the set of currently attached
> fences to a dma-buf with just rcu. But if you update the set of fences
> attached to a dma-buf (e.g. radeon blits the newly rendered frame to a
> dma-buf exported by i915 for scanout on i915) then you need a write
> lock on that buffer. Which is what the ww_mutex is for, to make sure
> that you don't deadlock with i915 doing concurrent ops on the same
> underlying buffer.
>
>> Something like grab ww_mutexes, grab a reference to the current fence
>> object, release ww_mutex, wait for fence, release reference to the fence
>> object.
> Yeah, if the only thing you want to do is wait for fences, then the
> rcu-protected fence ref grabbing + lockless waiting is all you need.
> But e.g. in an execbuf you also need to update fences and maybe deep
> down in the reservation code you notice that you need to evict some
> stuff and so need to wait on some other guy to finish, and it's too
> complicated to drop and reacquire all the locks. Or you simply need to
> do a blocking wait on other gpus (because there's no direct hw sync
> mechanism) and again dropping locks would needlessly complicate the
> code. So I think we should allow this just to avoid too hairy/brittle
> (and almost definitely little-tested) code in drivers.
>
> Afaik this is also the same way ttm currently handles things wrt
> buffer reservation and eviction.
>
>>> Agreed that any shared locks are out of the way (especially stuff like
>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>> really bad here still).
>>
>> Yeah, that's also a point I wanted to note on Maarten's patch. Radeon
>> grabs the read side of its exclusive semaphore while waiting for fences
>> (because it assumes that the fence it waits for is a Radeon fence).
>>
>> Assuming that we need to wait in both directions with Prime (e.g. Intel
>> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
>> for Intel to finish displaying), this might become a perfect example of
>> locking inversion.
> fence updates are atomic on a dma-buf, protected by ww_mutex. The neat
> trick of ww_mutex is that they enforce a global ordering, so in your
> scenario either i915 or radeon would be first and you can't deadlock.
> There is no way to interleave anything even if you have lots of
> buffers shared between i915/radeon. Wrt deadlocking it's exactly the
> same guarantees as the magic ttm provides for just one driver with
> concurrent command submission since it's the same idea.
>
>>> So from the core fence framework I think we already have exactly this,
>>> and we only need to adjust the radeon implementation a bit to make it
>>> less risky and invasive to the radeon driver logic.
>>
>> Agree. Well the biggest problem I see is that exclusive semaphore I need to
>> take when anything calls into the driver. For the fence code I need to move
>> that down into the fence->signaled handler, cause that now can be called
>> from outside the driver.
>>
>> Maarten solved this by telling the driver in the lockup handler (where we
>> grab the write side of the exclusive lock) that all interrupts are already
>> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
>> at all. While this probably works, it just leaves me with a feeling that we
>> are doing something wrong here.
> I'm not versed in the details of radeon, but on i915 we can attach a
> memory location and cookie value to each fence and just do a memory
> fetch to figure out whether the fence has passed or not. So no locking
> needed at all. Of course the fence itself needs to lock a reference
> onto that memory location, which is a neat piece of integration work
> that we still need to tackle in some cases - there's conflicting patch
> series all over this ;-)
>
> But like I've said, fence->signaled is optional, so you don't
> necessarily need this, as long as radeon eventually calls fence_signal
> once the fence has completed.
> -Daniel
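The memory-fetch check that Daniel describes for i915 (a memory location plus a cookie value attached to each fence) could look roughly like the userspace sketch below. `seqno_fence`, `seqno_passed` and `seqno_fence_signaled` are hypothetical names, not i915 or core fence APIs. The wraparound-safe comparison is an assumption added for illustration; the thread only says that a single memory fetch is enough to tell whether the fence has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: a location the hardware writes its last completed
 * sequence number to, plus the cookie value this fence is waiting for. */
struct seqno_fence {
    const volatile uint32_t *status; /* written by the "hardware" */
    uint32_t cookie;                 /* value at which this fence is done */
};

/*
 * True if 'current' has reached or passed 'cookie'. The unsigned
 * subtraction wraps modulo 2^32, so the test keeps working after the
 * counter overflows, as long as the two values are less than 2^31 apart.
 */
static bool seqno_passed(uint32_t current, uint32_t cookie)
{
    return (uint32_t)(current - cookie) < UINT32_C(0x80000000);
}

/* A single memory fetch, no lock taken: read the status word and compare. */
static bool seqno_fence_signaled(const struct seqno_fence *f)
{
    assert(f != NULL && f->status != NULL);
    return seqno_passed(*f->status, f->cookie);
}
```

The sketch leaves out the part the quoted message calls the remaining integration work: the fence has to hold a reference on the memory behind `status` for as long as anyone may still query it.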