All of lore.kernel.org
 help / color / mirror / Atom feed
* Tackling the indefinite/user DMA fence problem
@ 2022-05-02 16:37 Christian König
  2022-05-02 16:37 ` [PATCH 01/15] dma-buf: rename DMA_FENCE_FLAG_USER_BITS to _DEVICE Christian König
                   ` (15 more replies)
  0 siblings, 16 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig

Hello everyone,

it's a well known problem that the DMA-buf subsystem mixed synchronization and memory management requirements into the same dma_fence and dma_resv objects. Because of this dma_fence objects need to guarantee that they complete within a finite amount of time or otherwise the system can easily deadlock.

One of the few good things about this problem is that it is really good understood by now.

Daniel and others came up with some documentation: https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences

And Jason did an excellent presentation about that problem on last years LPC: https://lpc.events/event/11/contributions/1115/

Based on that we had been able to reject new implementations of infinite/user DMA fences and mitigate the effect of the few existing ones.

The still remaining down side is that we don't have a way of using user fences as dependency in both the explicit (sync_file, drm_syncobj) as well as the implicit (dma_resv) synchronization objects, resulting in numerous problems and limitations for things like HMM, user queues etc....

This patch set here now tries to tackle this problem by untangling the synchronization from the memory management. What it does *not* try to do is to fix the existing kernel fences, because I think we now can all agree on that this isn't really possible.

To archive this goal what I do in this patch set is to add some parallel infrastructure to cleanly separate normal kernel dma_fence objects from indefinite/user fences:

1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some existing driver defines). To note that a certain dma_fence is an user fence and *must* be ignore by memory management and never used as dependency for normal none user dma_fence objects.

2. The dma_fence_array and dma_fence_chain containers are modified so that they are marked as user fences whenever any of their contained fences are an user fence.

3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be used with indefinite/user fences and separates those into it's own synchronization domain.

4. The existing dma_buf_poll_add_cb() function is modified so that indefinite/user fences are included in the polling.

5. The sync_file synchronization object is modified so that we essentially have two fence streams instead of just one.

6. The drm_syncobj is modified in a similar way. User fences are just ignored unless the driver explicitly states support to wait for them.

7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers can use to indicate the need for user fences. If user fences are used the atomic mode setting starts to support user fences as IN/OUT fences.

8. Lockdep is used at various critical locations to ensure that nobody ever tries to mix user fences with non user fences.

The general approach is to just ignore user fences unless a driver stated explicitely support for them.

On top of all of this I've hacked amdgpu so that we add the resulting CS fence only as kernel dependency to the dma_resv object and an additional wrapped up with a dma_fence_array and a stub user fence.

The result is that the newly added atomic modeset functions now correctly wait for the user fence to complete before doing the flip. And dependent CS don't pipeline any more, but rather block on the CPU before submitting work.

After tons of debugging and testing everything now seems to not go up in flames immediately and even lockdep is happy with the annotations.

I'm perfectly aware that this is probably by far the most controversial patch set I've ever created and I really wish we wouldn't need it. But we certainly have the requirement for this and I don't see much other chance to get that working in an UAPI compatible way.

Thoughts/comments?

Regards,
Christian.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/15] dma-buf: rename DMA_FENCE_FLAG_USER_BITS to _DEVICE
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 02/15] dma-buf: introduce user fence support Christian König
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

This is supposed to be used by device drivers.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/i915/i915_request.h       | 2 +-
 drivers/gpu/drm/i915/i915_sw_fence_work.h | 2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c   | 4 ++--
 include/linux/dma-fence.h                 | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 28b1f9db5487..716c8c413cc4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -80,7 +80,7 @@ enum {
 	 *
 	 * See i915_request_is_active()
 	 */
-	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
+	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_DRIVER,
 
 	/*
 	 * I915_FENCE_FLAG_PQUEUE - this request is ready for execution
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index d56806918d13..ece0a06e598c 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -33,7 +33,7 @@ struct dma_fence_work {
 };
 
 enum {
-	DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
+	DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_DRIVER,
 };
 
 void dma_fence_work_init(struct dma_fence_work *f,
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 7f01dcf81fab..e2f61b34cc1e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -61,7 +61,7 @@ nouveau_fence_signal(struct nouveau_fence *fence)
 	list_del(&fence->head);
 	rcu_assign_pointer(fence->channel, NULL);
 
-	if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
+	if (test_bit(DMA_FENCE_FLAG_DRIVER, &fence->base.flags)) {
 		struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
 
 		if (!--fctx->notify_ref)
@@ -510,7 +510,7 @@ static bool nouveau_fence_enable_signaling(struct dma_fence *f)
 
 	ret = nouveau_fence_no_signaling(f);
 	if (ret)
-		set_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags);
+		set_bit(DMA_FENCE_FLAG_DRIVER, &fence->base.flags);
 	else if (!--fctx->notify_ref)
 		nvif_notify_put(&fctx->notify);
 
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..afea82ec5946 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -49,7 +49,7 @@ struct dma_fence_cb;
  * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
  * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
  * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
- * DMA_FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the
+ * DMA_FENCE_FLAG_DRIVER - start of the unused bits, can be used by the
  * implementer of the fence for its own purposes. Can be used in different
  * ways by different fence implementers, so do not rely on this.
  *
@@ -99,7 +99,7 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
+	DMA_FENCE_FLAG_DRIVER, /* must always be last member */
 };
 
 typedef void (*dma_fence_func_t)(struct dma_fence *fence,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 02/15] dma-buf: introduce user fence support
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
  2022-05-02 16:37 ` [PATCH 01/15] dma-buf: rename DMA_FENCE_FLAG_USER_BITS to _DEVICE Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-04  7:53   ` Tvrtko Ursulin
  2022-05-02 16:37 ` [PATCH 03/15] dma-buf: add user fence support to dma_fence_array Christian König
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Start introducing a new part of the framework for handling user fences.

In strict opposition to normal fences user fences don't reliable finish in
a fixed amount of time and therefore can't be used as dependency in memory
management.

Because of this user fences are marked with DMA_FENCE_FLAG_USER. Lockdep
is checked that we can at least fault user pages when we check them for
signaling.

This patch also adds a flag to dma_fence_get_stub() so that we can
retrieve a signaled user fence. This can be used together with lockdep to
test the handling in code path supporting user fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-unwrap.c            |  4 +--
 drivers/dma-buf/dma-fence.c                   | 31 ++++++++++++-------
 drivers/dma-buf/st-dma-fence.c                |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  4 +--
 drivers/gpu/drm/drm_syncobj.c                 | 10 +++---
 include/linux/dma-fence.h                     | 17 +++++++++-
 9 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index c9becc74896d..87ee2efced10 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -76,7 +76,7 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 	}
 
 	if (count == 0)
-		return dma_fence_get_stub();
+		return dma_fence_get_stub(false);
 
 	if (count > INT_MAX)
 		return NULL;
@@ -129,7 +129,7 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 	} while (tmp);
 
 	if (count == 0) {
-		tmp = dma_fence_get_stub();
+		tmp = dma_fence_get_stub(false);
 		goto return_tmp;
 	}
 
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..52873f5eaeba 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -26,6 +26,7 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
 
 static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
+static struct dma_fence dma_fence_user_stub;
 
 /*
  * fence context counter: each execution context should have its own
@@ -123,24 +124,28 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
 
 /**
  * dma_fence_get_stub - return a signaled fence
+ * @user: if true the returned fence is an user fence
  *
- * Return a stub fence which is already signaled. The fence's
- * timestamp corresponds to the first time after boot this
- * function is called.
+ * Return a stub fence which is already signaled. The fence's timestamp
+ * corresponds to the first time after boot this function is called. If @user is
+ * true an user fence is returned which can be used with lockdep to test user
+ * fence saveness in a code path.
  */
-struct dma_fence *dma_fence_get_stub(void)
+struct dma_fence *dma_fence_get_stub(bool user)
 {
+	struct dma_fence *fence = user ? &dma_fence_stub : &dma_fence_user_stub;
+
 	spin_lock(&dma_fence_stub_lock);
-	if (!dma_fence_stub.ops) {
-		dma_fence_init(&dma_fence_stub,
-			       &dma_fence_stub_ops,
-			       &dma_fence_stub_lock,
+	if (!fence->ops) {
+		dma_fence_init(fence, &dma_fence_stub_ops, &dma_fence_stub_lock,
 			       0, 0);
-		dma_fence_signal_locked(&dma_fence_stub);
+		if (user)
+			set_bit(DMA_FENCE_FLAG_USER, &fence->flags);
+		dma_fence_signal_locked(fence);
 	}
 	spin_unlock(&dma_fence_stub_lock);
 
-	return dma_fence_get(&dma_fence_stub);
+	return dma_fence_get(fence);
 }
 EXPORT_SYMBOL(dma_fence_get_stub);
 
@@ -497,8 +502,9 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 		return -EINVAL;
 
 	might_sleep();
-
 	__dma_fence_might_wait();
+	if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
+		might_fault();
 
 	trace_dma_fence_wait_start(fence);
 	if (fence->ops->wait)
@@ -870,6 +876,9 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 	for (i = 0; i < count; ++i) {
 		struct dma_fence *fence = fences[i];
 
+		if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
+			might_fault();
+
 		cb[i].task = current;
 		if (dma_fence_add_callback(fence, &cb[i].base,
 					   dma_fence_default_wait_cb)) {
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index c8a12d7ad71a..50f757f75645 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -412,7 +412,7 @@ static int test_stub(void *arg)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(f); i++) {
-		f[i] = dma_fence_get_stub();
+		f[i] = dma_fence_get_stub((i % 2) == 1);
 		if (!dma_fence_is_signaled(f[i])) {
 			pr_err("Obtained unsignaled stub fence!\n");
 			goto err;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 64ac4f8f49be..541c59635c34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -263,7 +263,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	/* TODO: Instead of block before we should use the fence of the page
 	 * table update and TLB flush here directly.
 	 */
-	replacement = dma_fence_get_stub();
+	replacement = dma_fence_get_stub(false);
 	dma_resv_replace_fences(bo->tbo.base.resv, ef->base.context,
 				replacement, DMA_RESV_USAGE_READ);
 	dma_fence_put(replacement);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index a28b7947a034..95eeab527ca9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1399,7 +1399,7 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
 		return PTR_ERR(fence);
 
 	if (!fence)
-		fence = dma_fence_get_stub();
+		fence = dma_fence_get_stub(false);
 
 	switch (info->in.what) {
 	case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 7f33ae87cb41..73165f387f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -193,7 +193,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 		adev->rings[ring->idx] = ring;
 		ring->num_hw_submission = sched_hw_submission;
 		ring->sched_score = sched_score;
-		ring->vmid_wait = dma_fence_get_stub();
+		ring->vmid_wait = dma_fence_get_stub(false);
 		r = amdgpu_fence_driver_init_ring(ring);
 		if (r)
 			return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 7e5cc8323329..e5c8e72a9485 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1689,7 +1689,7 @@ int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	}
 
 	if (!unlocked && !dma_fence_is_signaled(vm->last_unlocked)) {
-		struct dma_fence *tmp = dma_fence_get_stub();
+		struct dma_fence *tmp = dma_fence_get_stub(false);
 
 		amdgpu_bo_fence(vm->root.bo, vm->last_unlocked, true);
 		swap(vm->last_unlocked, tmp);
@@ -2905,7 +2905,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	else
 		vm->update_funcs = &amdgpu_vm_sdma_funcs;
 	vm->last_update = NULL;
-	vm->last_unlocked = dma_fence_get_stub();
+	vm->last_unlocked = dma_fence_get_stub(false);
 
 	mutex_init(&vm->eviction_lock);
 	vm->evicting = false;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 7e48dcd1bee4..5a961ea90a35 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -255,7 +255,7 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
 		dma_fence_put(fence);
 		list_add_tail(&wait->node, &syncobj->cb_list);
 	} else if (!fence) {
-		wait->fence = dma_fence_get_stub();
+		wait->fence = dma_fence_get_stub(false);
 	} else {
 		wait->fence = fence;
 	}
@@ -411,7 +411,7 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
 			 * signalled, use a new fence instead.
 			 */
 			if (!*fence)
-				*fence = dma_fence_get_stub();
+				*fence = dma_fence_get_stub(false);
 
 			goto out;
 		}
@@ -1000,7 +1000,7 @@ static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
 		dma_fence_put(fence);
 		return;
 	} else if (!fence) {
-		wait->fence = dma_fence_get_stub();
+		wait->fence = dma_fence_get_stub(false);
 	} else {
 		wait->fence = fence;
 	}
@@ -1067,7 +1067,7 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 		if (fence)
 			entries[i].fence = fence;
 		else
-			entries[i].fence = dma_fence_get_stub();
+			entries[i].fence = dma_fence_get_stub(false);
 
 		if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE) ||
 		    dma_fence_is_signaled(entries[i].fence)) {
@@ -1472,7 +1472,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
 	}
 
 	for (i = 0; i < args->count_handles; i++) {
-		struct dma_fence *fence = dma_fence_get_stub();
+		struct dma_fence *fence = dma_fence_get_stub(false);
 
 		drm_syncobj_add_point(syncobjs[i], chains[i],
 				      fence, points[i]);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index afea82ec5946..be96687d31d8 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -99,6 +99,16 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+
+	/**
+	 * @DMA_FENCE_FLAG_USER:
+	 *
+	 * Indicates an user fence. User fences are not guaranteed to signal in
+	 * a finite amount of time. Because of this it is not allowed to wait for user
+	 * fences with any lock held nor depend the signaling of none user
+	 * fences on them.
+	 */
+	DMA_FENCE_FLAG_USER,
 	DMA_FENCE_FLAG_DRIVER, /* must always be last member */
 };
 
@@ -398,6 +408,8 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence);
 static inline bool
 dma_fence_is_signaled_locked(struct dma_fence *fence)
 {
+	WARN_ON(test_bit(DMA_FENCE_FLAG_USER, &fence->flags));
+
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return true;
 
@@ -428,6 +440,9 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
 static inline bool
 dma_fence_is_signaled(struct dma_fence *fence)
 {
+	if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
+		might_fault();
+
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return true;
 
@@ -583,7 +598,7 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
 	return ret < 0 ? ret : 0;
 }
 
-struct dma_fence *dma_fence_get_stub(void);
+struct dma_fence *dma_fence_get_stub(bool user);
 struct dma_fence *dma_fence_allocate_private_stub(void);
 u64 dma_fence_context_alloc(unsigned num);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 03/15] dma-buf: add user fence support to dma_fence_array
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
  2022-05-02 16:37 ` [PATCH 01/15] dma-buf: rename DMA_FENCE_FLAG_USER_BITS to _DEVICE Christian König
  2022-05-02 16:37 ` [PATCH 02/15] dma-buf: introduce user fence support Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 04/15] dma-buf: add user fence support to dma_fence_chain Christian König
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

When any of the fences inside the array is an user fence the whole array
is considered to be an user fence as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-array.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 5c8a7084577b..de196b07d36d 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -189,8 +189,13 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
 	 * Enforce this here by checking that we don't create a dma_fence_array
 	 * with any container inside.
 	 */
-	while (num_fences--)
-		WARN_ON(dma_fence_is_container(fences[num_fences]));
+	while (num_fences--) {
+		struct dma_fence *f = fences[num_fences];
+
+		WARN_ON(dma_fence_is_container(f));
+		if (test_bit(DMA_FENCE_FLAG_USER, &f->flags))
+			set_bit(DMA_FENCE_FLAG_USER, &array->base.flags);
+	}
 
 	return array;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 04/15] dma-buf: add user fence support to dma_fence_chain
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (2 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 03/15] dma-buf: add user fence support to dma_fence_array Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 05/15] dma-buf: add user fence support to dma_resv Christian König
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

If either the previous fence or the newly chained on is an user fence the
dma_fence chain node is considered an user fence as well.

This way the user fence status propagates through the chain.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index 06f8ef97c6e8..421241607ac2 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -253,6 +253,10 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 	dma_fence_init(&chain->base, &dma_fence_chain_ops,
 		       &chain->lock, context, seqno);
 
+	if ((prev && test_bit(DMA_FENCE_FLAG_USER, &prev->flags)) ||
+	    test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
+		set_bit(DMA_FENCE_FLAG_USER, &chain->base.flags);
+
 	/*
 	 * Chaining dma_fence_chain container together is only allowed through
 	 * the prev fence and not through the contained fence.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 05/15] dma-buf: add user fence support to dma_resv
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (3 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 04/15] dma-buf: add user fence support to dma_fence_chain Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 06/15] dma-buf: add user fence support to dma_fence_merge() Christian König
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

This patch adds the new DMA_RESV_USAGE_USER flag to the dma_resv object
which must be used with user fence objects.

In opposite to the other usage flags this one doesn't automatically return
other lower classes. So when user fences are requested from the dma_resv
object only user fences are returned.

Lockdep is used to enforce that user fences can only be queried while the
dma_resv object is not locked. Additional to that waiting for the user
fences inside a dma_resv object requires not other lock to be held.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c | 62 +++++++++++++++++++++++---------------
 include/linux/dma-resv.h   | 23 ++++++++++++--
 2 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 0cce6e4ec946..da667c21ad55 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -58,7 +58,7 @@ DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
 /* Mask for the lower fence pointer bits */
-#define DMA_RESV_LIST_MASK	0x3
+#define DMA_RESV_LIST_MASK	0x7
 
 struct dma_resv_list {
 	struct rcu_head rcu;
@@ -288,6 +288,10 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence,
 	 */
 	WARN_ON(dma_fence_is_container(fence));
 
+	/* User fences must be added using DMA_RESV_USAGE_USER */
+	WARN_ON(test_bit(DMA_FENCE_FLAG_USER, &fence->flags) !=
+		(usage == DMA_RESV_USAGE_USER));
+
 	fobj = dma_resv_fences_list(obj);
 	count = fobj->num_fences;
 
@@ -349,6 +353,15 @@ void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
 }
 EXPORT_SYMBOL(dma_resv_replace_fences);
 
+/* Matches requested usage with the fence usage for iterators */
+static bool dma_resv_iter_match_usage(struct dma_resv_iter *cursor)
+{
+	if (cursor->usage == DMA_RESV_USAGE_USER)
+		return cursor->fence_usage == DMA_RESV_USAGE_USER;
+
+	return cursor->usage >= cursor->fence_usage;
+}
+
 /* Restart the unlocked iteration by initializing the cursor object. */
 static void dma_resv_iter_restart_unlocked(struct dma_resv_iter *cursor)
 {
@@ -385,8 +398,7 @@ static void dma_resv_iter_walk_unlocked(struct dma_resv_iter *cursor)
 			continue;
 		}
 
-		if (!dma_fence_is_signaled(cursor->fence) &&
-		    cursor->usage >= cursor->fence_usage)
+		if (dma_resv_iter_match_usage(cursor))
 			break;
 	} while (true);
 }
@@ -405,14 +417,9 @@ static void dma_resv_iter_walk_unlocked(struct dma_resv_iter *cursor)
  */
 struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor)
 {
-	rcu_read_lock();
-	do {
-		dma_resv_iter_restart_unlocked(cursor);
-		dma_resv_iter_walk_unlocked(cursor);
-	} while (dma_resv_fences_list(cursor->obj) != cursor->fences);
-	rcu_read_unlock();
-
-	return cursor->fence;
+	/* Force a restart */
+	cursor->fences = NULL;
+	return dma_resv_iter_next_unlocked(cursor);
 }
 EXPORT_SYMBOL(dma_resv_iter_first_unlocked);
 
@@ -428,18 +435,21 @@ EXPORT_SYMBOL(dma_resv_iter_first_unlocked);
  */
 struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor)
 {
-	bool restart;
-
-	rcu_read_lock();
 	cursor->is_restarted = false;
-	restart = dma_resv_fences_list(cursor->obj) != cursor->fences;
 	do {
-		if (restart)
-			dma_resv_iter_restart_unlocked(cursor);
-		dma_resv_iter_walk_unlocked(cursor);
-		restart = true;
-	} while (dma_resv_fences_list(cursor->obj) != cursor->fences);
-	rcu_read_unlock();
+		bool restart;
+
+		rcu_read_lock();
+		restart = dma_resv_fences_list(cursor->obj) != cursor->fences;
+		do {
+			if (restart)
+				dma_resv_iter_restart_unlocked(cursor);
+			dma_resv_iter_walk_unlocked(cursor);
+			restart = true;
+		} while (dma_resv_fences_list(cursor->obj) != cursor->fences);
+		rcu_read_unlock();
+
+	} while (cursor->fence && dma_fence_is_signaled(cursor->fence));
 
 	return cursor->fence;
 }
@@ -491,7 +501,7 @@ struct dma_fence *dma_resv_iter_next(struct dma_resv_iter *cursor)
 
 		dma_resv_list_entry(cursor->fences, cursor->index++,
 				    cursor->obj, &fence, &cursor->fence_usage);
-	} while (cursor->fence_usage > cursor->usage);
+	} while (!dma_resv_iter_match_usage(cursor));
 
 	return fence;
 }
@@ -663,6 +673,9 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
 	struct dma_resv_iter cursor;
 	struct dma_fence *fence;
 
+	if (usage == DMA_RESV_USAGE_USER)
+		lockdep_assert_none_held_once();
+
 	dma_resv_iter_begin(&cursor, obj, usage);
 	dma_resv_for_each_fence_unlocked(&cursor, fence) {
 
@@ -678,7 +691,6 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
 }
 EXPORT_SYMBOL_GPL(dma_resv_wait_timeout);
 
-
 /**
  * dma_resv_test_signaled - Test if a reservation object's fences have been
  * signaled.
@@ -717,7 +729,9 @@ EXPORT_SYMBOL_GPL(dma_resv_test_signaled);
  */
 void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq)
 {
-	static const char *usage[] = { "kernel", "write", "read", "bookkeep" };
+	static const char *usage[] = {
+		"kernel", "write", "read", "bookkeep", "user"
+	};
 	struct dma_resv_iter cursor;
 	struct dma_fence *fence;
 
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index c8ccbc94d5d2..81a9ca32cc69 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -42,7 +42,6 @@
 #include <linux/ww_mutex.h>
 #include <linux/dma-fence.h>
 #include <linux/slab.h>
-#include <linux/seqlock.h>
 #include <linux/rcupdate.h>
 
 extern struct ww_class reservation_ww_class;
@@ -57,11 +56,15 @@ struct dma_resv_list;
  *
  * An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and
  * when the dma_resv object is asked for fences for one use case the fences
- * for the lower use case are returned as well.
+ * for the lower use case are returned as well. The exception are USER fences
+ * which only return USER fences and nothing else.
  *
  * For example when asking for WRITE fences then the KERNEL fences are returned
  * as well. Similar when asked for READ fences then both WRITE and KERNEL
  * fences are returned as well.
+ *
+ * But when asked for USER fences only USER fences are returned and not WRITE
+ * nor any other fences.
  */
 enum dma_resv_usage {
 	/**
@@ -103,7 +106,18 @@ enum dma_resv_usage {
 	 * The most common case are preemption fences as well as page table
 	 * updates and their TLB flushes.
 	 */
-	DMA_RESV_USAGE_BOOKKEEP
+	DMA_RESV_USAGE_BOOKKEEP,
+
+	/**
+	 * @DMA_RESV_USAGE_USER: Special usage for user fences.
+	 *
+	 * This must only be used with fences which have DMA_FENCE_FLAG_USER
+	 * set so that memory mangement completely ignores those fences.
+	 *
+	 * A warning is raised if a fence with DMA_FENCE_FLAG USER is added with
+	 * any other usage than DMA_RESV_USAGE_USER.
+	 */
+	DMA_RESV_USAGE_USER
 };
 
 /**
@@ -221,6 +235,9 @@ static inline void dma_resv_iter_begin(struct dma_resv_iter *cursor,
 				       struct dma_resv *obj,
 				       enum dma_resv_usage usage)
 {
+	if (usage == DMA_RESV_USAGE_USER)
+		lockdep_assert_not_held(&(obj)->lock.base);
+
 	cursor->obj = obj;
 	cursor->usage = usage;
 	cursor->fence = NULL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 06/15] dma-buf: add user fence support to dma_fence_merge()
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (4 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 05/15] dma-buf: add user fence support to dma_resv Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 07/15] dma-buf: add user fence utility functions Christian König
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Just make sure that we forward the user fence status to the output
whenever any of the inputs are an user fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-unwrap.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index 87ee2efced10..dd3c43aba8f1 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -66,17 +66,19 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 {
 	struct dma_fence_array *result;
 	struct dma_fence *tmp, **array;
+	bool user_fence = false;
 	unsigned int i, count;
 
 	count = 0;
 	for (i = 0; i < num_fences; ++i) {
+		user_fence |= test_bit(DMA_FENCE_FLAG_USER, &fences[i]->flags);
 		dma_fence_unwrap_for_each(tmp, &iter[i], fences[i])
 			if (!dma_fence_is_signaled(tmp))
 				++count;
 	}
 
 	if (count == 0)
-		return dma_fence_get_stub(false);
+		return dma_fence_get_stub(user_fence);
 
 	if (count > INT_MAX)
 		return NULL;
@@ -129,11 +131,12 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 	} while (tmp);
 
 	if (count == 0) {
-		tmp = dma_fence_get_stub(false);
+		tmp = dma_fence_get_stub(user_fence);
 		goto return_tmp;
 	}
 
-	if (count == 1) {
+	if (count == 1 &&
+	    test_bit(DMA_FENCE_FLAG_USER, &array[0]->flags) == user_fence) {
 		tmp = array[0];
 		goto return_tmp;
 	}
@@ -145,6 +148,8 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 		tmp = NULL;
 		goto return_tmp;
 	}
+	if (user_fence)
+		set_bit(DMA_FENCE_FLAG_USER, &result->base.flags);
 	return &result->base;
 
 return_tmp:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 07/15] dma-buf: add user fence utility functions
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (5 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 06/15] dma-buf: add user fence support to dma_fence_merge() Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 08/15] dma-buf: add support for polling on user fences Christian König
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Allows to filter user fences from a dma_fence_chain and wait for user
fences in containers.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-unwrap.c | 47 ++++++++++++++++++++++++++++++
 include/linux/dma-fence-unwrap.h   |  3 ++
 2 files changed, 50 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index dd3c43aba8f1..4e9eda0b65c5 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -157,3 +157,50 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 	return tmp;
 }
 EXPORT_SYMBOL_GPL(__dma_fence_merge);
+
+/**
+ * dma_fence_filter_user - filter user fences
+ * @fence: entry point into the chain
+ *
+ * Check @fence and if it's a dma_fence_chain advance it until it doesn't depend
+ * on any user fence any more.
+ *
+ * Returns the advanced fence or NULL if no none user fence could be found.
+ */
+struct dma_fence *dma_fence_filter_user(struct dma_fence *fence)
+{
+
+	while (fence && test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
+		fence = dma_fence_chain_walk(fence);
+
+	return fence;
+}
+EXPORT_SYMBOL(dma_fence_filter_user);
+
+/**
+ * dma_fence_wait_user - wait for all user fences to signal
+ * @fence: entry point
+ * @timeout: timeout for the wait
+ *
+ * Unwrap the potential container in @fence and wait for all the user fences to
+ * signal.
+ * Returns: Either negative error code or the remaining timeout.
+ */
+long dma_fence_wait_user(struct dma_fence *fence, unsigned long timeout)
+{
+	struct dma_fence_unwrap iter;
+	long res;
+
+	dma_fence_unwrap_for_each(fence, &iter, fence) {
+		if (!test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
+			continue;
+
+		res = dma_fence_wait(fence, timeout);
+		if (res <= 0)
+			return res;
+		if (timeout)
+			timeout = res;
+	}
+	return timeout;
+}
+EXPORT_SYMBOL(dma_fence_wait_user);
diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h
index 7c0fab318301..2a786f5a7819 100644
--- a/include/linux/dma-fence-unwrap.h
+++ b/include/linux/dma-fence-unwrap.h
@@ -72,4 +72,7 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
 		__dma_fence_merge(ARRAY_SIZE(__f), __f, __c);	\
 	})
 
+struct dma_fence *dma_fence_filter_user(struct dma_fence *fence);
+long dma_fence_wait_user(struct dma_fence *fence, unsigned long timeout);
+
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 08/15] dma-buf: add support for polling on user fences
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (6 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 07/15] dma-buf: add user fence utility functions Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 09/15] dma-buf/sync_file: add user fence support Christian König
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Just also wait for user fences to signal when we wait for write fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-buf.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 79795857be3e..5558f4e555f8 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -225,6 +225,16 @@ static bool dma_buf_poll_add_cb(struct dma_resv *resv, bool write,
 		dma_fence_put(fence);
 	}
 
+	if (!write)
+		return false;
+
+	dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_USER, fence) {
+		dma_fence_get(fence);
+		r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
+		if (!r)
+			return true;
+		dma_fence_put(fence);
+	}
 	return false;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 09/15] dma-buf/sync_file: add user fence support
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (7 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 08/15] dma-buf: add support for polling on user fences Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 10/15] drm: add user fence support for atomic out fences Christian König
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Keep user fences separate from normal fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/sync_file.c | 82 +++++++++++++++++++++++++++++++++----
 include/linux/sync_file.h   |  4 +-
 2 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index fe149d7e3ce2..630472d79dc1 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -70,7 +70,13 @@ struct sync_file *sync_file_create(struct dma_fence *fence)
 	if (!sync_file)
 		return NULL;
 
-	sync_file->fence = dma_fence_get(fence);
+	if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags)) {
+		sync_file->fence = dma_fence_get_stub(false);
+		sync_file->user_fence = dma_fence_get(fence);
+	} else {
+		sync_file->fence = dma_fence_get(fence);
+		sync_file->user_fence = dma_fence_get_stub(true);
+	}
 
 	return sync_file;
 }
@@ -116,6 +122,28 @@ struct dma_fence *sync_file_get_fence(int fd)
 }
 EXPORT_SYMBOL(sync_file_get_fence);
 
+/**
+ * sync_file_get_user_fence - get user fence related to the sync_file fd
+ * @fd:		sync_file fd to get the fence from
+ *
+ * Ensures @fd references a valid sync_file and returns an user fence reference
+ * which represents all fence in the sync_file. On error NULL is returned.
+ */
+struct dma_fence *sync_file_get_user_fence(int fd)
+{
+	struct sync_file *sync_file;
+	struct dma_fence *fence;
+
+	sync_file = sync_file_fdget(fd);
+	if (!sync_file)
+		return NULL;
+
+	fence = dma_fence_merge(sync_file->fence, sync_file->user_fence);
+	fput(sync_file->file);
+	return fence;
+}
+EXPORT_SYMBOL(sync_file_get_user_fence);
+
 /**
  * sync_file_get_name - get the name of the sync_file
  * @sync_file:		sync_file to get the fence from
@@ -136,6 +164,9 @@ char *sync_file_get_name(struct sync_file *sync_file, char *buf, int len)
 	} else {
 		struct dma_fence *fence = sync_file->fence;
 
+		if (dma_fence_is_signaled(fence))
+			fence = sync_file->user_fence;
+
 		snprintf(buf, len, "%s-%s%llu-%lld",
 			 fence->ops->get_driver_name(fence),
 			 fence->ops->get_timeline_name(fence),
@@ -159,21 +190,32 @@ char *sync_file_get_name(struct sync_file *sync_file, char *buf, int len)
 static struct sync_file *sync_file_merge(const char *name, struct sync_file *a,
 					 struct sync_file *b)
 {
+	struct dma_fence *fence, *user_fence;
 	struct sync_file *sync_file;
-	struct dma_fence *fence;
 
 	sync_file = sync_file_alloc();
 	if (!sync_file)
 		return NULL;
 
 	fence = dma_fence_merge(a->fence, b->fence);
-	if (!fence) {
-		fput(sync_file->file);
-		return NULL;
-	}
+	if (!fence)
+		goto error_fput;
+
+	user_fence = dma_fence_merge(a->user_fence, b->user_fence);
+	if (!user_fence)
+		goto error_put_fence;
+
 	sync_file->fence = fence;
+	sync_file->user_fence = user_fence;
 	strlcpy(sync_file->user_name, name, sizeof(sync_file->user_name));
 	return sync_file;
+
+error_put_fence:
+	dma_fence_put(fence);
+
+error_fput:
+	fput(sync_file->file);
+	return NULL;
 }
 
 static int sync_file_release(struct inode *inode, struct file *file)
@@ -183,6 +225,7 @@ static int sync_file_release(struct inode *inode, struct file *file)
 	if (test_bit(POLL_ENABLED, &sync_file->flags))
 		dma_fence_remove_callback(sync_file->fence, &sync_file->cb);
 	dma_fence_put(sync_file->fence);
+	dma_fence_put(sync_file->user_fence);
 	kfree(sync_file);
 
 	return 0;
@@ -191,17 +234,25 @@ static int sync_file_release(struct inode *inode, struct file *file)
 static __poll_t sync_file_poll(struct file *file, poll_table *wait)
 {
 	struct sync_file *sync_file = file->private_data;
+	int ret;
 
 	poll_wait(file, &sync_file->wq, wait);
 
 	if (list_empty(&sync_file->cb.node) &&
 	    !test_and_set_bit(POLL_ENABLED, &sync_file->flags)) {
-		if (dma_fence_add_callback(sync_file->fence, &sync_file->cb,
-					   fence_check_cb_func) < 0)
+		ret = dma_fence_add_callback(sync_file->fence, &sync_file->cb,
+					     fence_check_cb_func);
+		if (ret) {
+			ret = dma_fence_add_callback(sync_file->user_fence,
+						     &sync_file->cb,
+						     fence_check_cb_func);
+		}
+		if (ret)
 			wake_up_all(&sync_file->wq);
 	}
 
-	return dma_fence_is_signaled(sync_file->fence) ? EPOLLIN : 0;
+	return (dma_fence_is_signaled(sync_file->fence) &&
+		dma_fence_is_signaled(sync_file->user_fence)) ? EPOLLIN : 0;
 }
 
 static long sync_file_ioctl_merge(struct sync_file *sync_file,
@@ -299,6 +350,8 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
 	num_fences = 0;
 	dma_fence_unwrap_for_each(fence, &iter, sync_file->fence)
 		++num_fences;
+	dma_fence_unwrap_for_each(fence, &iter, sync_file->user_fence)
+		++num_fences;
 
 	/*
 	 * Passing num_fences = 0 means that userspace doesn't want to
@@ -307,7 +360,12 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
 	 * info->num_fences.
 	 */
 	if (!info.num_fences) {
+		int status;
+
 		info.status = dma_fence_get_status(sync_file->fence);
+		status = dma_fence_get_status(sync_file->user_fence);
+		if (!info.status)
+			info.status = status;
 		goto no_fences;
 	} else {
 		info.status = 1;
@@ -328,6 +386,12 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
 		status = sync_fill_fence_info(fence, &fence_info[num_fences++]);
 		info.status = info.status <= 0 ? info.status : status;
 	}
+	dma_fence_unwrap_for_each(fence, &iter, sync_file->user_fence) {
+		int status;
+
+		status = sync_fill_fence_info(fence, &fence_info[num_fences++]);
+		info.status = info.status <= 0 ? info.status : status;
+	}
 
 	if (copy_to_user(u64_to_user_ptr(info.sync_fence_info), fence_info,
 			 size)) {
diff --git a/include/linux/sync_file.h b/include/linux/sync_file.h
index 790ca021203a..14aff1a4ee75 100644
--- a/include/linux/sync_file.h
+++ b/include/linux/sync_file.h
@@ -50,13 +50,15 @@ struct sync_file {
 	unsigned long		flags;
 
 	struct dma_fence	*fence;
-	struct dma_fence_cb cb;
+	struct dma_fence	*user_fence;
+	struct dma_fence_cb	cb;
 };
 
 #define POLL_ENABLED 0
 
 struct sync_file *sync_file_create(struct dma_fence *fence);
 struct dma_fence *sync_file_get_fence(int fd);
+struct dma_fence *sync_file_get_user_fence(int fd);
 char *sync_file_get_name(struct sync_file *sync_file, char *buf, int len);
 
 #endif /* _LINUX_SYNC_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 10/15] drm: add user fence support for atomic out fences
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (8 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 09/15] dma-buf/sync_file: add user fence support Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 11/15] drm: add user fence support for atomic in fences Christian König
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Add a new driver flag indicating support for user fences.

This flag is then used when creating out fences for atomic mode setting,
indicating that the mode set might depend on an user fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_atomic_uapi.c | 8 ++++++++
 include/drm/drm_drv.h             | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index 434f3d4cb8a2..e2112c10569b 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1111,6 +1111,7 @@ static int prepare_signaling(struct drm_device *dev,
 				  struct drm_out_fence_state **fence_state,
 				  unsigned int *num_fences)
 {
+	bool use_user_fence = drm_core_check_feature(dev, DRIVER_USER_FENCE);
 	struct drm_crtc *crtc;
 	struct drm_crtc_state *crtc_state;
 	struct drm_connector *conn;
@@ -1120,6 +1121,7 @@ static int prepare_signaling(struct drm_device *dev,
 	if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY)
 		return 0;
 
+
 	for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
 		s32 __user *fence_ptr;
 
@@ -1168,6 +1170,9 @@ static int prepare_signaling(struct drm_device *dev,
 			if (!fence)
 				return -ENOMEM;
 
+			if (use_user_fence)
+				set_bit(DMA_FENCE_FLAG_USER, &fence->flags);
+
 			ret = setup_out_fence(&f[(*num_fences)++], fence);
 			if (ret) {
 				dma_fence_put(fence);
@@ -1208,6 +1213,9 @@ static int prepare_signaling(struct drm_device *dev,
 		if (!fence)
 			return -ENOMEM;
 
+		if (use_user_fence)
+			set_bit(DMA_FENCE_FLAG_USER, &fence->flags);
+
 		ret = setup_out_fence(&f[(*num_fences)++], fence);
 		if (ret) {
 			dma_fence_put(fence);
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index f6159acb8856..b2b8ea8d4a9e 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -94,6 +94,13 @@ enum drm_driver_feature {
 	 * synchronization of command submission.
 	 */
 	DRIVER_SYNCOBJ_TIMELINE         = BIT(6),
+	/**
+	 * @DRIVER_USER_FENCE:
+	 *
+	 * Drivers supports user fences and waiting for the before command
+	 * submission.
+	 */
+	DRIVER_USER_FENCE		= BIT(7),
 
 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 11/15] drm: add user fence support for atomic in fences
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (9 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 10/15] drm: add user fence support for atomic out fences Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 12/15] drm: add user fence support to drm_gem_plane_helper_prepare_fb Christian König
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

When the DRIVER_USER_FENCE flag is set we grab the user fence from the
sync file instead of the normal one.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_atomic_uapi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index e2112c10569b..d1b13657e2ae 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -517,7 +517,10 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane,
 		if (U642I64(val) == -1)
 			return 0;
 
-		state->fence = sync_file_get_fence(val);
+		if (drm_core_check_feature(dev, DRIVER_USER_FENCE))
+			state->fence = sync_file_get_user_fence(val);
+		else
+			state->fence = sync_file_get_fence(val);
 		if (!state->fence)
 			return -EINVAL;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 12/15] drm: add user fence support to drm_gem_plane_helper_prepare_fb
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (10 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 11/15] drm: add user fence support for atomic in fences Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37 ` [PATCH 13/15] drm: add user fence support to drm_syncobj Christian König
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Just handle them the same way as fences with the DMA_RESV_USAGE_WRITE
flag when the DRIVER_USER_FENCE flag is set.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_gem_atomic_helper.c | 68 ++++++++++++++++---------
 1 file changed, 45 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c
index a5026f617739..75d04333ff2e 100644
--- a/drivers/gpu/drm/drm_gem_atomic_helper.c
+++ b/drivers/gpu/drm/drm_gem_atomic_helper.c
@@ -5,6 +5,7 @@
 
 #include <drm/drm_atomic_state_helper.h>
 #include <drm/drm_atomic_uapi.h>
+#include <drm/drm_drv.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_gem_atomic_helper.h>
 #include <drm/drm_gem_framebuffer_helper.h>
@@ -121,6 +122,40 @@
  * Plane Helpers
  */
 
+static int chain_fb_fences(struct drm_framebuffer *fb,
+			   enum dma_resv_usage usage,
+			   struct dma_fence **fence)
+{
+	size_t i;
+	int ret;
+
+	for (i = 0; i < fb->format->num_planes; ++i) {
+		struct drm_gem_object *obj = drm_gem_fb_get_obj(fb, i);
+		struct dma_fence *new;
+
+		if (WARN_ON_ONCE(!obj))
+			continue;
+
+		ret = dma_resv_get_singleton(obj->resv, usage, &new);
+		if (ret)
+			return ret;
+
+		if (new && *fence) {
+			struct dma_fence_chain *chain = dma_fence_chain_alloc();
+
+			if (!chain)
+				return -ENOMEM;
+
+			dma_fence_chain_init(chain, *fence, new, 1);
+			*fence = &chain->base;
+
+		} else if (new) {
+			*fence = new;
+		}
+	}
+	return 0;
+}
+
 /**
  * drm_gem_plane_helper_prepare_fb() - Prepare a GEM backed framebuffer
  * @plane: Plane
@@ -143,8 +178,6 @@ int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane,
 				    struct drm_plane_state *state)
 {
 	struct dma_fence *fence = dma_fence_get(state->fence);
-	enum dma_resv_usage usage;
-	size_t i;
 	int ret;
 
 	if (!state->fb)
@@ -163,32 +196,21 @@ int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane,
 	 * obeys both implicit and explicit fences for plane updates, then it
 	 * will break all the benefits of explicit fencing.
 	 */
-	usage = fence ? DMA_RESV_USAGE_KERNEL : DMA_RESV_USAGE_WRITE;
-
-	for (i = 0; i < state->fb->format->num_planes; ++i) {
-		struct drm_gem_object *obj = drm_gem_fb_get_obj(state->fb, i);
-		struct dma_fence *new;
-
-		if (WARN_ON_ONCE(!obj))
-			continue;
-
-		ret = dma_resv_get_singleton(obj->resv, usage, &new);
+	if (fence) {
+		ret = chain_fb_fences(state->fb, DMA_RESV_USAGE_KERNEL, &fence);
 		if (ret)
 			goto error;
 
-		if (new && fence) {
-			struct dma_fence_chain *chain = dma_fence_chain_alloc();
+	} else {
+		ret = chain_fb_fences(state->fb, DMA_RESV_USAGE_WRITE, &fence);
+		if (ret)
+			goto error;
 
-			if (!chain) {
-				ret = -ENOMEM;
+		if (drm_core_check_feature(plane->dev, DRIVER_USER_FENCE)) {
+			ret = chain_fb_fences(state->fb, DMA_RESV_USAGE_USER,
+					      &fence);
+			if (ret)
 				goto error;
-			}
-
-			dma_fence_chain_init(chain, fence, new, 1);
-			fence = &chain->base;
-
-		} else if (new) {
-			fence = new;
 		}
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 13/15] drm: add user fence support to drm_syncobj
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (11 preceding siblings ...)
  2022-05-02 16:37 ` [PATCH 12/15] drm: add user fence support to drm_gem_plane_helper_prepare_fb Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-02 16:37   ` Christian König
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

For now just filter or wait for user fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_syncobj.c | 15 ++++++++++++---
 include/drm/drm_syncobj.h     | 25 +++++++++++++++++++++----
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5a961ea90a35..8d25a2dd107b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -386,7 +386,7 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
 	struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
 	struct syncobj_wait_entry wait;
 	u64 timeout = nsecs_to_jiffies64(DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT);
-	int ret;
+	long ret;
 
 	if (!syncobj)
 		return -ENOENT;
@@ -398,10 +398,19 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
 	if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
 		might_sleep();
 		lockdep_assert_none_held_once();
+		*fence = drm_syncobj_user_fence_get(syncobj);
+		if (*fence) {
+			ret = dma_fence_wait_user(*fence, timeout);
+			if (ret < 0)
+				return ret;
+			if (ret == 0)
+				return -ETIME;
+			timeout = ret;
+		}
+	} else {
+		*fence = drm_syncobj_fence_get(syncobj);
 	}
 
-	*fence = drm_syncobj_fence_get(syncobj);
-
 	if (*fence) {
 		ret = dma_fence_chain_find_seqno(fence, point);
 		if (!ret) {
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 6cf7243a1dc5..acc3979815eb 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -28,6 +28,7 @@
 
 #include <linux/dma-fence.h>
 #include <linux/dma-fence-chain.h>
+#include <linux/dma-fence-unwrap.h>
 
 struct drm_file;
 
@@ -89,18 +90,18 @@ drm_syncobj_put(struct drm_syncobj *obj)
 }
 
 /**
- * drm_syncobj_fence_get - get a reference to a fence in a sync object
+ * drm_syncobj_user_fence_get - get a user fence from a sync object
  * @syncobj: sync object.
  *
  * This acquires additional reference to &drm_syncobj.fence contained in @obj,
- * if not NULL. It is illegal to call this without already holding a reference.
- * No locks required.
+ * if not NULL. It is illegal to call this without holding a reference to the
+ * syncobj. No locks required.
  *
  * Returns:
  * Either the fence of @obj or NULL if there's none.
  */
 static inline struct dma_fence *
-drm_syncobj_fence_get(struct drm_syncobj *syncobj)
+drm_syncobj_user_fence_get(struct drm_syncobj *syncobj)
 {
 	struct dma_fence *fence;
 
@@ -111,6 +112,22 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 	return fence;
 }
 
+/**
+ * drm_syncobj_fence_get - get a reference to a fence in a sync object
+ * @syncobj: sync object.
+ *
+ * Same functionality as drm_syncobj_user_fence_get(), but user fences are
+ * filtered out.
+ *
+ * Returns:
+ * Either the fence of @obj or NULL if there's none.
+ */
+static inline struct dma_fence *
+drm_syncobj_fence_get(struct drm_syncobj *syncobj)
+{
+	return dma_fence_filter_user(drm_syncobj_user_fence_get(syncobj));
+}
+
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 				     u32 handle);
 void drm_syncobj_add_point(struct drm_syncobj *syncobj,
-- 
2.25.1
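
A minimal caller-side sketch of the split above (hypothetical driver
code, not part of this patch; dev and syncobj stand in for the driver's
device and sync object): a driver that declared user fence support
fetches the unfiltered fence and only waits on it where faulting is
allowed, while everyone else keeps getting the filtered view.

	struct dma_fence *fence;

	if (drm_core_check_feature(dev, DRIVER_USER_FENCE)) {
		/* May contain a user fence, wait only where faulting is allowed. */
		fence = drm_syncobj_user_fence_get(syncobj);
	} else {
		/* User fences filtered out, safe as a kernel dependency. */
		fence = drm_syncobj_fence_get(syncobj);
	}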


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 14/15] drm/amdgpu: switch DM to atomic fence helpers
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
@ 2022-05-02 16:37   ` Christian König
  2022-05-02 16:37 ` [PATCH 02/15] dma-buf: introduce user fence support Christian König
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König, Harry Wentland, Nicholas Kazlauskas,
	Roman Li, Qingqing Zhuo, Jude Shih, Wayne Lin, Rodrigo Siqueira

This gives us the standard atomic implicit and explicit fencing rules.

Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Roman Li <Roman.Li@amd.com>
Cc: Qingqing Zhuo <qingqing.zhuo@amd.com>
Cc: Jude Shih <shenshih@amd.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 23 ++++++++-----------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 2ade82cfb1ac..c5b2417adcc6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -83,6 +83,7 @@
 #include <drm/drm_edid.h>
 #include <drm/drm_vblank.h>
 #include <drm/drm_audio_component.h>
+#include <drm/drm_gem_atomic_helper.h>
 
 #if defined(CONFIG_DRM_AMD_DC_DCN)
 #include "ivsrcid/dcn/irqsrcs_dcn_1_0.h"
@@ -7627,6 +7628,10 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 		goto error_unpin;
 	}
 
+	r = drm_gem_plane_helper_prepare_fb(plane, new_state);
+	if (unlikely(r != 0))
+		goto error_unpin;
+
 	amdgpu_bo_unreserve(rbo);
 
 	afb->address = amdgpu_bo_gpu_offset(rbo);
@@ -9160,7 +9165,6 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 	struct dm_crtc_state *dm_old_crtc_state =
 			to_dm_crtc_state(drm_atomic_get_old_crtc_state(state, pcrtc));
 	int planes_count = 0, vpos, hpos;
-	long r;
 	unsigned long flags;
 	struct amdgpu_bo *abo;
 	uint32_t target_vblank, last_flip_vblank;
@@ -9173,6 +9177,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 		struct dc_flip_addrs flip_addrs[MAX_SURFACES];
 		struct dc_stream_update stream_update;
 	} *bundle;
+	int r;
 
 	bundle = kzalloc(sizeof(*bundle), GFP_KERNEL);
 
@@ -9181,6 +9186,10 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 		goto cleanup;
 	}
 
+	r = drm_atomic_helper_wait_for_fences(dev, state, false);
+	if (unlikely(r))
+		DRM_ERROR("Waiting for fences timed out!");
+
 	/*
 	 * Disable the cursor first if we're disabling all the planes.
 	 * It'll remain on the screen after the planes are re-enabled
@@ -9235,18 +9244,6 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 		}
 
 		abo = gem_to_amdgpu_bo(fb->obj[0]);
-
-		/*
-		 * Wait for all fences on this FB. Do limited wait to avoid
-		 * deadlock during GPU reset when this fence will not signal
-		 * but we hold reservation lock for the BO.
-		 */
-		r = dma_resv_wait_timeout(abo->tbo.base.resv,
-					  DMA_RESV_USAGE_WRITE, false,
-					  msecs_to_jiffies(5000));
-		if (unlikely(r <= 0))
-			DRM_ERROR("Waiting for fences timed out!");
-
 		fill_dc_plane_info_and_addr(
 			dm->adev, new_plane_state,
 			afb->tiling_flags,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 15/15] drm/amdgpu: user fence proof of concept
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
                   ` (13 preceding siblings ...)
  2022-05-02 16:37   ` Christian König
@ 2022-05-02 16:37 ` Christian König
  2022-05-04 10:08   ` Daniel Vetter
  15 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-02 16:37 UTC (permalink / raw)
  To: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig
  Cc: Christian König

Just some hack to test the functionality, not a real implementation of
the interface.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c                    |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        | 28 ++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  9 ++++--
 5 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index da667c21ad55..e18efb21c452 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -286,7 +286,8 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence,
 	/* Drivers should not add containers here, instead add each fence
 	 * individually.
 	 */
-	WARN_ON(dma_fence_is_container(fence));
+	//WARN_ON(dma_fence_is_container(fence));
+
 
 	/* User fences must be added using DMA_RESV_USAGE_USER */
 	WARN_ON(test_bit(DMA_FENCE_FLAG_USER, &fence->flags) !=
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 95eeab527ca9..299ab8e50c42 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -453,6 +453,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	struct amdgpu_vm *vm = &fpriv->vm;
 	struct amdgpu_bo_list_entry *e;
 	struct drm_gem_object *obj;
+	long timeout = HZ / 10;
 	struct amdgpu_bo *gds;
 	struct amdgpu_bo *gws;
 	struct amdgpu_bo *oa;
@@ -476,6 +477,17 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 			return r;
 	}
 
+	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+		struct dma_resv *resv = e->bo->tbo.base.resv;
+
+		timeout = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_USER,
+						true, timeout);
+		if (unlikely(timeout < 0))
+			return timeout;
+		if (unlikely(timeout == 0))
+			return -ETIME;
+	}
+
 	/* Get userptr backing pages. If pages are updated after registered
 	 * in amdgpu_gem_userptr_ioctl(), amdgpu_cs_list_validate() will do
 	 * amdgpu_ttm_backend_bind() to flush and invalidate new pages
@@ -516,7 +528,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 			return r;
 
 		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-			r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
+			r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 3);
 			drm_exec_break_on_contention(&p->exec);
 			if (unlikely(r))
 				return r;
@@ -527,7 +539,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 
 		if (p->uf_bo) {
 			r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
-						 2);
+						 3);
 			drm_exec_continue_on_contention(&p->exec);
 			if (unlikely(r))
 				return r;
@@ -1160,6 +1172,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	struct drm_sched_entity *entity = p->entity;
 	struct amdgpu_bo_list_entry *e;
 	struct drm_gem_object *gobj;
+	struct dma_fence *dummy;
 	struct amdgpu_job *job;
 	unsigned long index;
 	uint64_t seq;
@@ -1191,6 +1204,11 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	}
 
 	p->fence = dma_fence_get(&job->base.s_fence->finished);
+	dummy = dma_fence_merge(p->fence, dma_fence_get_stub(true));
+	if (!dummy) {
+		r = -ENOMEM;
+		goto error_abort;
+	}
 
 	amdgpu_ctx_add_fence(p->ctx, entity, p->fence, &seq);
 	amdgpu_cs_post_dependencies(p);
@@ -1214,11 +1232,13 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 
 	drm_exec_for_each_duplicate_object(&p->exec, index, gobj) {
 		ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
-		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
+		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_KERNEL);
+		dma_resv_add_fence(gobj->resv, dummy, DMA_RESV_USAGE_USER);
 	}
 	drm_exec_for_each_locked_object(&p->exec, index, gobj) {
 		ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
-		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
+		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_KERNEL);
+		dma_resv_add_fence(gobj->resv, dummy, DMA_RESV_USAGE_USER);
 	}
 
 	mutex_unlock(&p->adev->notifier_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b03663f42cc9..bd334f5fd64f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2655,7 +2655,7 @@ static const struct drm_driver amdgpu_kms_driver = {
 	    DRIVER_ATOMIC |
 	    DRIVER_GEM |
 	    DRIVER_RENDER | DRIVER_MODESET | DRIVER_SYNCOBJ |
-	    DRIVER_SYNCOBJ_TIMELINE,
+	    DRIVER_SYNCOBJ_TIMELINE | DRIVER_USER_FENCE,
 	.open = amdgpu_driver_open_kms,
 	.postclose = amdgpu_driver_postclose_kms,
 	.lastclose = amdgpu_driver_lastclose_kms,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e5c8e72a9485..6705287887e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -628,7 +628,7 @@ static void amdgpu_vm_pt_next_dfs(struct amdgpu_device *adev,
  */
 int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec)
 {
-	return drm_exec_prepare_obj(exec, &vm->root.bo->tbo.base, 4);
+	return drm_exec_prepare_obj(exec, &vm->root.bo->tbo.base, 5);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c5b2417adcc6..2e0f059b9d12 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7627,12 +7627,11 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 		DRM_ERROR("%p bind failed\n", rbo);
 		goto error_unpin;
 	}
+	amdgpu_bo_unreserve(rbo);
 
 	r = drm_gem_plane_helper_prepare_fb(plane, new_state);
 	if (unlikely(r != 0))
-		goto error_unpin;
-
-	amdgpu_bo_unreserve(rbo);
+		goto error_reserve;
 
 	afb->address = amdgpu_bo_gpu_offset(rbo);
 
@@ -7665,6 +7664,10 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 
 	return 0;
 
+error_reserve:
+	if (WARN_ON(amdgpu_bo_reserve(rbo, true)))
+		return r;
+
 error_unpin:
 	amdgpu_bo_unpin(rbo);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/15] dma-buf: introduce user fence support
  2022-05-02 16:37 ` [PATCH 02/15] dma-buf: introduce user fence support Christian König
@ 2022-05-04  7:53   ` Tvrtko Ursulin
  2022-05-04  9:15     ` Christian König
  0 siblings, 1 reply; 41+ messages in thread
From: Tvrtko Ursulin @ 2022-05-04  7:53 UTC (permalink / raw)
  To: Christian König, daniel, jason, daniels, skhawaja,
	maad.aldabagh, sergemetral, sumit.semwal, gustavo,
	Felix.Kuehling, alexander.deucher, tzimmermann, linux-media,
	dri-devel, linaro-mm-sig
  Cc: Christian König


On 02/05/2022 17:37, Christian König wrote:
> Start introducing a new part of the framework for handling user fences.
> 
> In strict opposition to normal fences user fences don't reliable finish in

reliably

> a fixed amount of time and therefore can't be used as dependency in memory
> management.
> 
> Because of this user fences are marked with DMA_FENCE_FLAG_USER. Lockdep
> is checked that we can at least fault user pages when we check them for
> signaling.
> 
> This patch also adds a flag to dma_fence_get_stub() so that we can
> retrieve a signaled user fence. This can be used together with lockdep to
> test the handling in code path supporting user fences.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence-unwrap.c            |  4 +--
>   drivers/dma-buf/dma-fence.c                   | 31 ++++++++++++-------
>   drivers/dma-buf/st-dma-fence.c                |  2 +-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c      |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  4 +--
>   drivers/gpu/drm/drm_syncobj.c                 | 10 +++---
>   include/linux/dma-fence.h                     | 17 +++++++++-
>   9 files changed, 49 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
> index c9becc74896d..87ee2efced10 100644
> --- a/drivers/dma-buf/dma-fence-unwrap.c
> +++ b/drivers/dma-buf/dma-fence-unwrap.c
> @@ -76,7 +76,7 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
>   	}
>   
>   	if (count == 0)
> -		return dma_fence_get_stub();
> +		return dma_fence_get_stub(false);
>   
>   	if (count > INT_MAX)
>   		return NULL;
> @@ -129,7 +129,7 @@ struct dma_fence *__dma_fence_merge(unsigned int num_fences,
>   	} while (tmp);
>   
>   	if (count == 0) {
> -		tmp = dma_fence_get_stub();
> +		tmp = dma_fence_get_stub(false);
>   		goto return_tmp;
>   	}
>   
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 066400ed8841..52873f5eaeba 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -26,6 +26,7 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>   
>   static DEFINE_SPINLOCK(dma_fence_stub_lock);
>   static struct dma_fence dma_fence_stub;
> +static struct dma_fence dma_fence_user_stub;
>   
>   /*
>    * fence context counter: each execution context should have its own
> @@ -123,24 +124,28 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
>   
>   /**
>    * dma_fence_get_stub - return a signaled fence
> + * @user: if true the returned fence is a user fence
>    *
> - * Return a stub fence which is already signaled. The fence's
> - * timestamp corresponds to the first time after boot this
> - * function is called.
> + * Return a stub fence which is already signaled. The fence's timestamp
> + * corresponds to the first time after boot this function is called. If @user is
> + * true a user fence is returned which can be used with lockdep to test user
> + * fence safety in a code path.
>    */
> -struct dma_fence *dma_fence_get_stub(void)
> +struct dma_fence *dma_fence_get_stub(bool user)
>   {
> +	struct dma_fence *fence = user ? &dma_fence_user_stub : &dma_fence_stub;
> +
>   	spin_lock(&dma_fence_stub_lock);
> -	if (!dma_fence_stub.ops) {
> -		dma_fence_init(&dma_fence_stub,
> -			       &dma_fence_stub_ops,
> -			       &dma_fence_stub_lock,
> +	if (!fence->ops) {
> +		dma_fence_init(fence, &dma_fence_stub_ops, &dma_fence_stub_lock,
>   			       0, 0);
> -		dma_fence_signal_locked(&dma_fence_stub);
> +		if (user)
> +			set_bit(DMA_FENCE_FLAG_USER, &fence->flags);
> +		dma_fence_signal_locked(fence);
>   	}
>   	spin_unlock(&dma_fence_stub_lock);
>   
> -	return dma_fence_get(&dma_fence_stub);
> +	return dma_fence_get(fence);
>   }
>   EXPORT_SYMBOL(dma_fence_get_stub);
>   
> @@ -497,8 +502,9 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   		return -EINVAL;
>   
>   	might_sleep();
> -
>   	__dma_fence_might_wait();
> +	if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
> +		might_fault();
>   
>   	trace_dma_fence_wait_start(fence);
>   	if (fence->ops->wait)
> @@ -870,6 +876,9 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
>   	for (i = 0; i < count; ++i) {
>   		struct dma_fence *fence = fences[i];
>   
> +		if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
> +			might_fault();
> +
>   		cb[i].task = current;
>   		if (dma_fence_add_callback(fence, &cb[i].base,
>   					   dma_fence_default_wait_cb)) {
> diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
> index c8a12d7ad71a..50f757f75645 100644
> --- a/drivers/dma-buf/st-dma-fence.c
> +++ b/drivers/dma-buf/st-dma-fence.c
> @@ -412,7 +412,7 @@ static int test_stub(void *arg)
>   	int i;
>   
>   	for (i = 0; i < ARRAY_SIZE(f); i++) {
> -		f[i] = dma_fence_get_stub();
> +		f[i] = dma_fence_get_stub((i % 2) == 1);
>   		if (!dma_fence_is_signaled(f[i])) {
>   			pr_err("Obtained unsignaled stub fence!\n");
>   			goto err;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 64ac4f8f49be..541c59635c34 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -263,7 +263,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>   	/* TODO: Instead of block before we should use the fence of the page
>   	 * table update and TLB flush here directly.
>   	 */
> -	replacement = dma_fence_get_stub();
> +	replacement = dma_fence_get_stub(false);
>   	dma_resv_replace_fences(bo->tbo.base.resv, ef->base.context,
>   				replacement, DMA_RESV_USAGE_READ);
>   	dma_fence_put(replacement);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index a28b7947a034..95eeab527ca9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1399,7 +1399,7 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
>   		return PTR_ERR(fence);
>   
>   	if (!fence)
> -		fence = dma_fence_get_stub();
> +		fence = dma_fence_get_stub(false);
>   
>   	switch (info->in.what) {
>   	case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 7f33ae87cb41..73165f387f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -193,7 +193,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>   		adev->rings[ring->idx] = ring;
>   		ring->num_hw_submission = sched_hw_submission;
>   		ring->sched_score = sched_score;
> -		ring->vmid_wait = dma_fence_get_stub();
> +		ring->vmid_wait = dma_fence_get_stub(false);
>   		r = amdgpu_fence_driver_init_ring(ring);
>   		if (r)
>   			return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 7e5cc8323329..e5c8e72a9485 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1689,7 +1689,7 @@ int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>   	}
>   
>   	if (!unlocked && !dma_fence_is_signaled(vm->last_unlocked)) {
> -		struct dma_fence *tmp = dma_fence_get_stub();
> +		struct dma_fence *tmp = dma_fence_get_stub(false);
>   
>   		amdgpu_bo_fence(vm->root.bo, vm->last_unlocked, true);
>   		swap(vm->last_unlocked, tmp);
> @@ -2905,7 +2905,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	else
>   		vm->update_funcs = &amdgpu_vm_sdma_funcs;
>   	vm->last_update = NULL;
> -	vm->last_unlocked = dma_fence_get_stub();
> +	vm->last_unlocked = dma_fence_get_stub(false);
>   
>   	mutex_init(&vm->eviction_lock);
>   	vm->evicting = false;
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 7e48dcd1bee4..5a961ea90a35 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -255,7 +255,7 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
>   		dma_fence_put(fence);
>   		list_add_tail(&wait->node, &syncobj->cb_list);
>   	} else if (!fence) {
> -		wait->fence = dma_fence_get_stub();
> +		wait->fence = dma_fence_get_stub(false);
>   	} else {
>   		wait->fence = fence;
>   	}
> @@ -411,7 +411,7 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
>   			 * signalled, use a new fence instead.
>   			 */
>   			if (!*fence)
> -				*fence = dma_fence_get_stub();
> +				*fence = dma_fence_get_stub(false);
>   
>   			goto out;
>   		}
> @@ -1000,7 +1000,7 @@ static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
>   		dma_fence_put(fence);
>   		return;
>   	} else if (!fence) {
> -		wait->fence = dma_fence_get_stub();
> +		wait->fence = dma_fence_get_stub(false);
>   	} else {
>   		wait->fence = fence;
>   	}
> @@ -1067,7 +1067,7 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>   		if (fence)
>   			entries[i].fence = fence;
>   		else
> -			entries[i].fence = dma_fence_get_stub();
> +			entries[i].fence = dma_fence_get_stub(false);
>   
>   		if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE) ||
>   		    dma_fence_is_signaled(entries[i].fence)) {
> @@ -1472,7 +1472,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>   	}
>   
>   	for (i = 0; i < args->count_handles; i++) {
> -		struct dma_fence *fence = dma_fence_get_stub();
> +		struct dma_fence *fence = dma_fence_get_stub(false);
>   
>   		drm_syncobj_add_point(syncobjs[i], chains[i],
>   				      fence, points[i]);
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index afea82ec5946..be96687d31d8 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,16 @@ enum dma_fence_flag_bits {
>   	DMA_FENCE_FLAG_SIGNALED_BIT,
>   	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>   	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +
> +	/**
> +	 * @DMA_FENCE_FLAG_USER:
> +	 *
> +	 * Indicates a user fence. User fences are not guaranteed to signal in
> +	 * a finite amount of time. Because of this it is not allowed to wait for
> +	 * user fences with any lock held, nor to make the signaling of non-user
> +	 * fences depend on them.
> +	 */
> +	DMA_FENCE_FLAG_USER,
>   	DMA_FENCE_FLAG_DRIVER, /* must always be last member */
>   };
>   
> @@ -398,6 +408,8 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence);
>   static inline bool
>   dma_fence_is_signaled_locked(struct dma_fence *fence)
>   {
> +	WARN_ON(test_bit(DMA_FENCE_FLAG_USER, &fence->flags));
> +
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return true;
>   
> @@ -428,6 +440,9 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>   static inline bool
>   dma_fence_is_signaled(struct dma_fence *fence)
>   {
> +	if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
> +		might_fault();

Why can this one fault while dma_fence_is_signaled_locked() cannot? I 
mean obviously it must not in the locked section, but how do signalling 
callbacks know the context of the caller?

This maybe ties back to the issue I don't think I ever found an 
explanation for - why do the "is signaled" helpers actually need (or 
want?) to do the signalling themselves, and are therefore also hit with 
the signalling annotations?

Regards,

Tvrtko

> +
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return true;
>   
> @@ -583,7 +598,7 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
>   	return ret < 0 ? ret : 0;
>   }
>   
> -struct dma_fence *dma_fence_get_stub(void);
> +struct dma_fence *dma_fence_get_stub(bool user);
>   struct dma_fence *dma_fence_allocate_private_stub(void);
>   u64 dma_fence_context_alloc(unsigned num);
>   

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/15] dma-buf: introduce user fence support
  2022-05-04  7:53   ` Tvrtko Ursulin
@ 2022-05-04  9:15     ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-04  9:15 UTC (permalink / raw)
  To: Tvrtko Ursulin, Christian König, daniel, jason, daniels,
	skhawaja, maad.aldabagh, sergemetral, sumit.semwal, gustavo,
	Felix.Kuehling, alexander.deucher, tzimmermann, linux-media,
	dri-devel, linaro-mm-sig



On 04.05.22 at 09:53, Tvrtko Ursulin wrote:
>
> On 02/05/2022 17:37, Christian König wrote:
>> [SNIP]
>>   @@ -398,6 +408,8 @@ void dma_fence_enable_sw_signaling(struct 
>> dma_fence *fence);
>>   static inline bool
>>   dma_fence_is_signaled_locked(struct dma_fence *fence)
>>   {
>> +    WARN_ON(test_bit(DMA_FENCE_FLAG_USER, &fence->flags));
>> +
>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>           return true;
>>   @@ -428,6 +440,9 @@ dma_fence_is_signaled_locked(struct dma_fence 
>> *fence)
>>   static inline bool
>>   dma_fence_is_signaled(struct dma_fence *fence)
>>   {
>> +    if (test_bit(DMA_FENCE_FLAG_USER, &fence->flags))
>> +        might_fault();
>
> Why can this one fault while dma_fence_is_signaled_locked() cannot? I 
> mean obviously it must not in the locked section, but how do signalling 
> callbacks know the context of the caller?

Well, dma_fence_is_signaled_locked() shouldn't exist in the first place.

The locking can only be done by either the framework itself or the 
driver exporting the fence. And if you take a look at the actual users 
you indeed see that in the only two cases where this is used the driver 
knows what fence it has and should probably call its internal function 
itself.
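
A hedged sketch of that point (exporter-internal code, names
illustrative and not from this series): a driver which already holds
its own fence lock can test the signaled bit directly instead of
re-entering the framework through dma_fence_is_signaled_locked().

	/* Illustrative exporter-internal check, fence lock already held. */
	static bool my_fence_done_locked(struct dma_fence *fence)
	{
		lockdep_assert_held(fence->lock);

		return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
	}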

>
> This maybe ties back to the issue I don't think I ever found an 
> explanation for - why do the "is signaled" helpers actually need (or 
> want?) to do the signalling themselves, and are therefore also hit with 
> the signalling annotations?

Yeah, exactly, that's what I could never figure out either. And I think 
it isn't good design to do this, because it results in the dma_fence 
being signaled from different sources.

Without that the fence would just be consistently signaled from the 
interrupt (or whatever context) the exporter uses to signal it.

Thanks,
Christian.

>
> Regards,
>
> Tvrtko
>
>> +
>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>           return true;
>>   @@ -583,7 +598,7 @@ static inline signed long dma_fence_wait(struct 
>> dma_fence *fence, bool intr)
>>       return ret < 0 ? ret : 0;
>>   }
>>   -struct dma_fence *dma_fence_get_stub(void);
>> +struct dma_fence *dma_fence_get_stub(bool user);
>>   struct dma_fence *dma_fence_allocate_private_stub(void);
>>   u64 dma_fence_context_alloc(unsigned num);


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
@ 2022-05-04 10:08   ` Daniel Vetter
  2022-05-02 16:37 ` [PATCH 02/15] dma-buf: introduce user fence support Christian König
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-04 10:08 UTC (permalink / raw)
  To: Christian König
  Cc: daniel, jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig

On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
> Hello everyone,
> 
> it's a well known problem that the DMA-buf subsystem mixed
> synchronization and memory management requirements into the same
> dma_fence and dma_resv objects. Because of this dma_fence objects need
> to guarantee that they complete within a finite amount of time or
> otherwise the system can easily deadlock.
> 
> One of the few good things about this problem is that it is really good
> understood by now.
> 
> Daniel and others came up with some documentation:
> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
> 
> And Jason did an excellent presentation about that problem on last years
> LPC: https://lpc.events/event/11/contributions/1115/
> 
> Based on that we had been able to reject new implementations of
> infinite/user DMA fences and mitigate the effect of the few existing
> ones.
> 
> The still remaining down side is that we don't have a way of using user
> fences as dependency in both the explicit (sync_file, drm_syncobj) as
> well as the implicit (dma_resv) synchronization objects, resulting in
> numerous problems and limitations for things like HMM, user queues
> etc....
> 
> This patch set here now tries to tackle this problem by untangling the
> synchronization from the memory management. What it does *not* try to do
> is to fix the existing kernel fences, because I think we now can all
> agree on that this isn't really possible.
> 
> To archive this goal what I do in this patch set is to add some parallel
> infrastructure to cleanly separate normal kernel dma_fence objects from
> indefinite/user fences:
> 
> 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
> existing driver defines). To note that a certain dma_fence is an user
> fence and *must* be ignore by memory management and never used as
> dependency for normal none user dma_fence objects.
> 
> 2. The dma_fence_array and dma_fence_chain containers are modified so
> that they are marked as user fences whenever any of their contained
> fences are an user fence.
> 
> 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
> used with indefinite/user fences and separates those into it's own
> synchronization domain.
> 
> 4. The existing dma_buf_poll_add_cb() function is modified so that
> indefinite/user fences are included in the polling.
> 
> 5. The sync_file synchronization object is modified so that we
> essentially have two fence streams instead of just one.
> 
> 6. The drm_syncobj is modified in a similar way. User fences are just
> ignored unless the driver explicitly states support to wait for them.
> 
> 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
> can use to indicate the need for user fences. If user fences are used
> the atomic mode setting starts to support user fences as IN/OUT fences.
> 
> 8. Lockdep is used at various critical locations to ensure that nobody
> ever tries to mix user fences with non user fences.
> 
> The general approach is to just ignore user fences unless a driver
> stated explicitely support for them.
> 
> On top of all of this I've hacked amdgpu so that we add the resulting CS
> fence only as kernel dependency to the dma_resv object and an additional
> wrapped up with a dma_fence_array and a stub user fence.
> 
> The result is that the newly added atomic modeset functions now
> correctly wait for the user fence to complete before doing the flip. And
> dependent CS don't pipeline any more, but rather block on the CPU before
> submitting work.
> 
> After tons of debugging and testing everything now seems to not go up in
> flames immediately and even lockdep is happy with the annotations.
> 
> I'm perfectly aware that this is probably by far the most controversial
> patch set I've ever created and I really wish we wouldn't need it. But
> we certainly have the requirement for this and I don't see much other
> chance to get that working in an UAPI compatible way.
> 
> Thoughts/comments?

I think you need to type up the goal or exact problem statement you're
trying to solve first. What you typed up is a solution along the lines of
"try to stuff userspace memory fences into dma_fence and see how horrible
it all is", and that's certainly an interesting experiment, but what are
you trying to solve with it?

Like if the issue is to enable opencl or whatever, then that's no problem
(rocm on amdkfd is a thing, same maybe without the kfd part can be done
anywhere else). If the goal is to enable userspace memory fences for vk,
then we really don't need these everywhere, but really only in drm_syncobj
(and maybe sync_file).

If the goal is specifically atomic kms, then there's an entire can of
worms there that I really don't want to think about, but it exists: We
have dma_fence as out-fences from atomic commit, and that's already
massively broken since most drivers allocate some memory or at least take
locks which can allocate memory in their commit path. Like i2c. Putting a
userspace memory fence as in-fence in there makes that problem
substantially worse, since at least in theory you're just not allowed to
might_fault in atomic_commit_tail.
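
A compact illustration of that constraint (a sketch, using the fence
signalling annotations that already exist in dma-fence): everything
between these two annotations is on the path to signalling a dma_fence
and must therefore neither fault nor wait on anything that can fault.

	bool cookie = dma_fence_begin_signalling();
	/* commit tail work: no might_fault(), no user fence waits */
	dma_fence_end_signalling(cookie);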

If the goal is to keep the uapi perfectly compatible then your patch set
doesn't look like a solution, since as soon as another driver is involved
which doesn't understand userspace memory fences it all falls apart. So
works great for a quick demo with amd+amd sharing, but not much further.
And I don't think it's feasible to just rev the entire ecosystem, since
that kinda defeats the point of keeping uapi stable - if we rev everything
we might as well also rev the uapi and make this a bit more incremental
again :-)

There's probably more to ponder here ...

I'm not sure what exactly the problem statement is that matches your
solution here though, so that seems to be missing.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-04 10:08   ` Daniel Vetter
@ 2022-05-09  6:56     ` Christian König
  -1 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-09  6:56 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, sergemetral, tzimmermann, gustavo,
	Felix.Kuehling, linux-media, dri-devel, linaro-mm-sig,
	maad.aldabagh, jason, alexander.deucher, skhawaja, sumit.semwal,
	daniels

On 04.05.22 at 12:08, Daniel Vetter wrote:
> On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
>> Hello everyone,
>>
>> it's a well known problem that the DMA-buf subsystem mixed
>> synchronization and memory management requirements into the same
>> dma_fence and dma_resv objects. Because of this dma_fence objects need
>> to guarantee that they complete within a finite amount of time or
>> otherwise the system can easily deadlock.
>>
>> One of the few good things about this problem is that it is really good
>> understood by now.
>>
>> Daniel and others came up with some documentation:
>> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
>>
>> And Jason did an excellent presentation about that problem on last years
>> LPC: https://lpc.events/event/11/contributions/1115/
>>
>> Based on that we had been able to reject new implementations of
>> infinite/user DMA fences and mitigate the effect of the few existing
>> ones.
>>
>> The still remaining down side is that we don't have a way of using user
>> fences as dependency in both the explicit (sync_file, drm_syncobj) as
>> well as the implicit (dma_resv) synchronization objects, resulting in
>> numerous problems and limitations for things like HMM, user queues
>> etc....
>>
>> This patch set here now tries to tackle this problem by untangling the
>> synchronization from the memory management. What it does *not* try to do
>> is to fix the existing kernel fences, because I think we now can all
>> agree on that this isn't really possible.
>>
>> To archive this goal what I do in this patch set is to add some parallel
>> infrastructure to cleanly separate normal kernel dma_fence objects from
>> indefinite/user fences:
>>
>> 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
>> existing driver defines). To note that a certain dma_fence is an user
>> fence and *must* be ignore by memory management and never used as
>> dependency for normal none user dma_fence objects.
>>
>> 2. The dma_fence_array and dma_fence_chain containers are modified so
>> that they are marked as user fences whenever any of their contained
>> fences are an user fence.
>>
>> 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
>> used with indefinite/user fences and separates those into it's own
>> synchronization domain.
>>
>> 4. The existing dma_buf_poll_add_cb() function is modified so that
>> indefinite/user fences are included in the polling.
>>
>> 5. The sync_file synchronization object is modified so that we
>> essentially have two fence streams instead of just one.
>>
>> 6. The drm_syncobj is modified in a similar way. User fences are just
>> ignored unless the driver explicitly states support to wait for them.
>>
>> 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
>> can use to indicate the need for user fences. If user fences are used
>> the atomic mode setting starts to support user fences as IN/OUT fences.
>>
>> 8. Lockdep is used at various critical locations to ensure that nobody
>> ever tries to mix user fences with non user fences.
>>
>> The general approach is to just ignore user fences unless a driver
>> stated explicitely support for them.
>>
>> On top of all of this I've hacked amdgpu so that we add the resulting CS
>> fence only as kernel dependency to the dma_resv object and an additional
>> wrapped up with a dma_fence_array and a stub user fence.
>>
>> The result is that the newly added atomic modeset functions now
>> correctly wait for the user fence to complete before doing the flip. And
>> dependent CS don't pipeline any more, but rather block on the CPU before
>> submitting work.
>>
>> After tons of debugging and testing everything now seems to not go up in
>> flames immediately and even lockdep is happy with the annotations.
>>
>> I'm perfectly aware that this is probably by far the most controversial
>> patch set I've ever created and I really wish we wouldn't need it. But
>> we certainly have the requirement for this and I don't see much other
>> chance to get that working in an UAPI compatible way.
>>
>> Thoughts/comments?
> I think you need to type up the goal or exact problem statement you're
> trying to solve first. What you typed up is a solution along the lines of
> "try to stuff userspace memory fences into dma_fence and see how horrible
> it all is", and that's certainly an interesting experiment, but what are
> you trying to solve with it?

Well, good point. I explained too much how it works, but not why.

In general I would describe the goal as: Providing a standard kernel 
infrastructure for user fences.

> Like if the issue is to enable opencl or whatever, then that's no problem
> (rocm on amdkfd is a thing, same maybe without the kfd part can be done
> anywhere else). If the goal is to enable userspace memory fences for vk,
> then we really don't need these everywhere, but really only in drm_syncobj
> (and maybe sync_file).

Yes, having an in-kernel representation for vk user space fences is one 
of the goals.

And I was going back and forth on whether I should rather come up with a 
new structure for this or use the existing dma_fence with a flag as well.

I've decided to go down the latter route because we have quite a lot of 
existing functionality which can be re-used. But if you have a good 
argument that it would be safer to come up with something completely 
new, I'm perfectly fine with that as well.
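
To make that trade-off concrete, a minimal sketch of what reusing
dma_fence with the DMA_FENCE_FLAG_USER bit buys (the helper name here
is hypothetical): common code can keep user fences out of memory
management paths with a single test instead of growing a second fence
type everywhere.

	/* Sketch: memory management paths reject user fences with one test. */
	static bool fence_usable_for_mm(struct dma_fence *fence)
	{
		return !test_bit(DMA_FENCE_FLAG_USER, &fence->flags);
	}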

> If the goal is specifically atomic kms, then there's an entire can of
> worms there that I really don't want to think about, but it exists: We
> have dma_fence as out-fences from atomic commit, and that's already
> massively broken since most drivers allocate some memory or at least take
> locks which can allocate memory in their commit path. Like i2c. Putting a
> userspace memory fence as in-fence in there makes that problem
> substantially worse, since at least in theory you're just not allowed to
> might_fault in atomic_commit_tail.

Yes, that's unfortunately one of the goals as well and yes I completely 
agree on the can of worms. But I think I've solved that.

What I do in the patch set is to enforce that the out fence is a user 
fence when the driver supports user in-fences as well.

Since user fences don't have the memory management dependency, drivers 
can actually allocate memory or call I2C functions which take locks 
that have memory allocation dependencies.

Or am I missing some other reason why you can't fault or allocate memory 
in atomic_commit_tail? At least lockdep seems to be happy about it now.
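
As a sketch of that rule (hypothetical snippet, assuming the
DRIVER_USER_FENCE and DMA_FENCE_FLAG_USER bits from this series; dev
and out_fence are stand-ins): drivers which opt into user in-fences get
their atomic out-fences marked as user fences too, so memory management
can never pick up a dependency on the commit work.

	/* Hypothetical: out-fences become user fences on opt-in drivers. */
	if (drm_core_check_feature(dev, DRIVER_USER_FENCE))
		set_bit(DMA_FENCE_FLAG_USER, &out_fence->flags);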

> If the goal is to keep the uapi perfectly compatible then your patch set
> doesn't look like a solution, since as soon as another driver is involved
> which doesn't understand userspace memory fences it all falls apart. So
> works great for a quick demo with amd+amd sharing, but not much further.
> And I don't think it's feasible to just rev the entire ecosystem, since
> that kinda defeats the point of keeping uapi stable - if we rev everything
> we might as well also rev the uapi and make this a bit more incremental
> again :-)

Yes, unfortunately the uapi needs to stay compatible as well and yes 
that means we need to deploy this to all drivers involved.

We at least need to be able to provide a stack on new hardware with (for 
example) Ubuntu 18.04 without replacing all the userspace components.

What we can replace is the OpenGL stack and if necessary libdrm, but not 
(for example) the X server and most likely not the DDX in some cases.

The same applies to surfaceflinger and to some extent Wayland as well.

Regards,
Christian.

>
> There's probably more to ponder here ...
>
> I'm not sure what exactly the problem statement is that matches your
> solution here though, so that seems to be missing.
> -Daniel


^ permalink raw reply	[flat|nested] 41+ messages in thread


> If the goal is to keep the uapi perfectly compatible then your patch set
> doesn't look like a solution, since as soon as another driver is involved
> which doesn't understand userspace memory fences it all falls apart. So
> works great for a quick demo with amd+amd sharing, but not much further.
> And I don't think it's feasible to just rev the entire ecosystem, since
> that kinda defeats the point of keeping uapi stable - if we rev everything
> we might as well also rev the uapi and make this a bit more incremental
> again :-)

Yes, unfortunately the uapi needs to stay compatible as well and yes 
that means we need to deploy this to all drivers involved.

We at least need to be able to provide a stack on new hardware with (for 
example) Ubuntu 18.04 without replacing all the userspace components.

What we can replace is the OpenGL stack and if necessary libdrm, but not 
(for example) the X server and most likely not the DDX in some cases.

The same applies with surfaceflinger and to some extend Wayland as well.

Regards,
Christian.

>
> There's probably more to ponder here ...
>
> I'm not sure what exactly the problem statement is that matches your
> solution here though, so that seems to be missing.
> -Daniel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-09  6:56     ` Christian König
@ 2022-05-09 14:10       ` Daniel Vetter
  -1 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-09 14:10 UTC (permalink / raw)
  To: Christian König
  Cc: tvrtko.ursulin, sergemetral, tzimmermann, gustavo,
	Felix.Kuehling, linux-media, dri-devel, linaro-mm-sig, jason,
	alexander.deucher, daniels, skhawaja, sumit.semwal,
	maad.aldabagh

On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> > On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
> > > Hello everyone,
> > > 
> > > it's a well known problem that the DMA-buf subsystem mixed
> > > synchronization and memory management requirements into the same
> > > dma_fence and dma_resv objects. Because of this dma_fence objects need
> > > to guarantee that they complete within a finite amount of time or
> > > otherwise the system can easily deadlock.
> > > 
> > > One of the few good things about this problem is that it is really good
> > > understood by now.
> > > 
> > > Daniel and others came up with some documentation:
> > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
> > > 
> > > And Jason did an excellent presentation about that problem on last years
> > > LPC: https://lpc.events/event/11/contributions/1115/
> > > 
> > > Based on that we had been able to reject new implementations of
> > > infinite/user DMA fences and mitigate the effect of the few existing
> > > ones.
> > > 
> > > The still remaining down side is that we don't have a way of using user
> > > fences as dependency in both the explicit (sync_file, drm_syncobj) as
> > > well as the implicit (dma_resv) synchronization objects, resulting in
> > > numerous problems and limitations for things like HMM, user queues
> > > etc....
> > > 
> > > This patch set here now tries to tackle this problem by untangling the
> > > synchronization from the memory management. What it does *not* try to do
> > > is to fix the existing kernel fences, because I think we now can all
> > > agree on that this isn't really possible.
> > > 
> > > To archive this goal what I do in this patch set is to add some parallel
> > > infrastructure to cleanly separate normal kernel dma_fence objects from
> > > indefinite/user fences:
> > > 
> > > 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
> > > existing driver defines). To note that a certain dma_fence is an user
> > > fence and *must* be ignore by memory management and never used as
> > > dependency for normal none user dma_fence objects.
> > > 
> > > 2. The dma_fence_array and dma_fence_chain containers are modified so
> > > that they are marked as user fences whenever any of their contained
> > > fences are an user fence.
> > > 
> > > 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
> > > used with indefinite/user fences and separates those into it's own
> > > synchronization domain.
> > > 
> > > 4. The existing dma_buf_poll_add_cb() function is modified so that
> > > indefinite/user fences are included in the polling.
> > > 
> > > 5. The sync_file synchronization object is modified so that we
> > > essentially have two fence streams instead of just one.
> > > 
> > > 6. The drm_syncobj is modified in a similar way. User fences are just
> > > ignored unless the driver explicitly states support to wait for them.
> > > 
> > > 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
> > > can use to indicate the need for user fences. If user fences are used
> > > the atomic mode setting starts to support user fences as IN/OUT fences.
> > > 
> > > 8. Lockdep is used at various critical locations to ensure that nobody
> > > ever tries to mix user fences with non user fences.
> > > 
> > > The general approach is to just ignore user fences unless a driver
> > > stated explicitely support for them.
> > > 
> > > On top of all of this I've hacked amdgpu so that we add the resulting CS
> > > fence only as kernel dependency to the dma_resv object and an additional
> > > wrapped up with a dma_fence_array and a stub user fence.
> > > 
> > > The result is that the newly added atomic modeset functions now
> > > correctly wait for the user fence to complete before doing the flip. And
> > > dependent CS don't pipeline any more, but rather block on the CPU before
> > > submitting work.
> > > 
> > > After tons of debugging and testing everything now seems to not go up in
> > > flames immediately and even lockdep is happy with the annotations.
> > > 
> > > I'm perfectly aware that this is probably by far the most controversial
> > > patch set I've ever created and I really wish we wouldn't need it. But
> > > we certainly have the requirement for this and I don't see much other
> > > chance to get that working in an UAPI compatible way.
> > > 
> > > Thoughts/comments?
> > I think you need to type up the goal or exact problem statement you're
> > trying to solve first. What you typed up is a solution along the lines of
> > "try to stuff userspace memory fences into dma_fence and see how horrible
> > it all is", and that's certainly an interesting experiment, but what are
> > you trying to solve with it?
> 
> Well, good point. I explained to much how it works, but now why.
> 
> In general I would describe the goal as: Providing a standard kernel
> infrastructure for user fences.

So on that goal the part I fully agree on is that drm_syncobj can (and
should imo) be able to contain userspace memory fences. The uapi semantics
and everything is already fully set up to support that, but maybe with
reduced performance: Non-aware userspace (or when you don't trust the
supplier of the umf) needs to block when looking up the fence, and the
dma_fence returned will always be signalled already. But that's just a
mild performance issue (and vk drivers paper over that already with
threading) and not a correctness issue.

> > Like if the issue is to enable opencl or whatever, then that's no problem
> > (rocm on amdkfd is a thing, same maybe without the kfd part can be done
> > anywhere else). If the goal is to enable userspace memory fences for vk,
> > then we really don't need these everywhere, but really only in drm_syncobj
> > (and maybe sync_file).
> 
> Yes, having an in kernel representation for vk user space fences is one of
> the goals.
> 
> And I was going back and forth if I should rather come up with a new
> structure for this or use the existing dma_fence with a flag as well.
> 
> I've decided to go down the later router because we have quite a lot of
> existing functionality which can be re-used. But if you have a good argument
> that it would be more defensive to come up with something completely new,
> I'm perfectly fine with that as well.

Yeah so stuffing that into dma_fence already freaks me out a bit. It is
quite fundamentally a different thing, and it would be really nice to make
that very apparent at the type level too.

E.g. to make sure you never ever end up with an umf fence in mmu notifier
invalidate callback. You can enforce that with runtime checks too, but imo
compile time fail is better than runtime fail.

> > If the goal is specifically atomic kms, then there's an entire can of
> > worms there that I really don't want to think about, but it exists: We
> > have dma_fence as out-fences from atomic commit, and that's already
> > massively broken since most drivers allocate some memory or at least take
> > locks which can allocate memory in their commit path. Like i2c. Putting a
> > userspace memory fence as in-fence in there makes that problem
> > substantially worse, since at least in theory you're just not allowed to
> > might_faul in atomic_commit_tail.
> 
> Yes, that's unfortunately one of the goals as well and yes I completely
> agree on the can of worms. But I think I've solved that.
> 
> What I do in the patch set is to enforce that the out fence is an user fence
> when the driver supports user in fences as well.
> 
> Since user fences doesn't have the memory management dependency drivers can
> actually allocate memory or call I2C functions which takes locks which have
> memory allocation dependencies.
> 
> Or do I miss some other reason why you can't fault or allocate memory in
> atomic_commit_tail? At least lockdep seems to be happy about that now.

The problem is a bit that this breaks the uapi already. At least if the
goal is to have this all be perfectly transparent for userspace - as you
as you have multi-gpu setups going on at least.

> > If the goal is to keep the uapi perfectly compatible then your patch set
> > doesn't look like a solution, since as soon as another driver is involved
> > which doesn't understand userspace memory fences it all falls apart. So
> > works great for a quick demo with amd+amd sharing, but not much further.
> > And I don't think it's feasible to just rev the entire ecosystem, since
> > that kinda defeats the point of keeping uapi stable - if we rev everything
> > we might as well also rev the uapi and make this a bit more incremental
> > again :-)
> 
> Yes, unfortunately the uapi needs to stay compatible as well and yes that
> means we need to deploy this to all drivers involved.
> 
> We at least need to be able to provide a stack on new hardware with (for
> example) Ubuntu 18.04 without replacing all the userspace components.
> 
> What we can replace is the OpenGL stack and if necessary libdrm, but not
> (for example) the X server and most likely not the DDX in some cases.
> 
> The same applies with surfaceflinger and to some extend Wayland as well.

So for perfect uapi compat for existing compositor I really don't think
stuffing umf into the kernel is the right approach. Too many little
corners that break:

- the in/out fence mismatch every
- cross gpu with different userspace that doesn't understand umf and then
  just ignores them
- compositors which currently assume implicit sync finishes eventually,
  and with umf that gets complicated at best
- same with sync_file, the uapi atm does not have a concept of future
  fence

So you can kinda make this work, but it falls apart all over the place.
And I also don't think smashing umf into all these old concepts helps us
in any way to get towards a desktop which is umf-native.

My take is still that for backwards compat the simplest way is if a
umf-native driver simply provides dma-fence backwards compat as an opt-in,
which userspace chooses when it's necessary. There's really only two
things you need for that to work:

- a timeout of some sort on the dma_fence, which might or might not kill
  the entire context. This is entirey up to how your userspace does or
  does not implement stuff like arb robustness or vk_error_device_lost

- pre-pinned memory management to block out the all the inversions. This
  is a bit more nasty, but since we do have all the code for this already
  it really shouldn't be too tricky to make that happen for the fancy new
  umf world.

You do not need a kernel scheduler or anything like that at all, you can
do full userspace direct submit to hw and all that fun. Maybe do a
drm/sched frontend (and then your submit code does exactly what userspace
would do too).

Importantly the things you really don't need:

- special hw support, even if the only mode your hw supports is with page
  faults and all that: You can make sure all the pages are present
  upfront, and then simply kill the entire context is a page fault
  happens.

- special fw scheduler support: Once the memory management inversions are
  taken care of with pre-pinning under dma_fences, then the only other
  thing you need is a timeout for the dma_fence to signal. And maybe some
  kind of guaranteed ordering if you want to use a dma_fence timeline
  since that one can't go backwards.

Trying to shoehorn umf into all the old concepts like implicit sync or
sync_file which really don't support umf works for a demo, but imo just
isn't solid enough for shipping everywhere.

And long term I really don't think we ever want umf anywhere else than
drm_syncobj, at least for a 100% umf-native stack.

So maybe this all goes back to the old discussion with had, where you
argued for the need for special fw and hw and all that to make the old
dma_fence stuff work. Why is that needed? I still don't get that part ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-09 14:10       ` Daniel Vetter
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-09 14:10 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, jason, daniels, skhawaja, maad.aldabagh,
	sergemetral, sumit.semwal, gustavo, Felix.Kuehling,
	alexander.deucher, tzimmermann, tvrtko.ursulin, linux-media,
	dri-devel, linaro-mm-sig

On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> > On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
> > > Hello everyone,
> > > 
> > > it's a well known problem that the DMA-buf subsystem mixed
> > > synchronization and memory management requirements into the same
> > > dma_fence and dma_resv objects. Because of this dma_fence objects need
> > > to guarantee that they complete within a finite amount of time or
> > > otherwise the system can easily deadlock.
> > > 
> > > One of the few good things about this problem is that it is really good
> > > understood by now.
> > > 
> > > Daniel and others came up with some documentation:
> > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
> > > 
> > > And Jason did an excellent presentation about that problem on last years
> > > LPC: https://lpc.events/event/11/contributions/1115/
> > > 
> > > Based on that we had been able to reject new implementations of
> > > infinite/user DMA fences and mitigate the effect of the few existing
> > > ones.
> > > 
> > > The still remaining down side is that we don't have a way of using user
> > > fences as dependency in both the explicit (sync_file, drm_syncobj) as
> > > well as the implicit (dma_resv) synchronization objects, resulting in
> > > numerous problems and limitations for things like HMM, user queues
> > > etc....
> > > 
> > > This patch set here now tries to tackle this problem by untangling the
> > > synchronization from the memory management. What it does *not* try to do
> > > is to fix the existing kernel fences, because I think we now can all
> > > agree on that this isn't really possible.
> > > 
> > > To archive this goal what I do in this patch set is to add some parallel
> > > infrastructure to cleanly separate normal kernel dma_fence objects from
> > > indefinite/user fences:
> > > 
> > > 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
> > > existing driver defines). To note that a certain dma_fence is an user
> > > fence and *must* be ignore by memory management and never used as
> > > dependency for normal none user dma_fence objects.
> > > 
> > > 2. The dma_fence_array and dma_fence_chain containers are modified so
> > > that they are marked as user fences whenever any of their contained
> > > fences are an user fence.
> > > 
> > > 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
> > > used with indefinite/user fences and separates those into it's own
> > > synchronization domain.
> > > 
> > > 4. The existing dma_buf_poll_add_cb() function is modified so that
> > > indefinite/user fences are included in the polling.
> > > 
> > > 5. The sync_file synchronization object is modified so that we
> > > essentially have two fence streams instead of just one.
> > > 
> > > 6. The drm_syncobj is modified in a similar way. User fences are just
> > > ignored unless the driver explicitly states support to wait for them.
> > > 
> > > 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
> > > can use to indicate the need for user fences. If user fences are used
> > > the atomic mode setting starts to support user fences as IN/OUT fences.
> > > 
> > > 8. Lockdep is used at various critical locations to ensure that nobody
> > > ever tries to mix user fences with non user fences.
> > > 
> > > The general approach is to just ignore user fences unless a driver
> > > stated explicitely support for them.
> > > 
> > > On top of all of this I've hacked amdgpu so that we add the resulting CS
> > > fence only as kernel dependency to the dma_resv object and an additional
> > > wrapped up with a dma_fence_array and a stub user fence.
> > > 
> > > The result is that the newly added atomic modeset functions now
> > > correctly wait for the user fence to complete before doing the flip. And
> > > dependent CS don't pipeline any more, but rather block on the CPU before
> > > submitting work.
> > > 
> > > After tons of debugging and testing everything now seems to not go up in
> > > flames immediately and even lockdep is happy with the annotations.
> > > 
> > > I'm perfectly aware that this is probably by far the most controversial
> > > patch set I've ever created and I really wish we wouldn't need it. But
> > > we certainly have the requirement for this and I don't see much other
> > > chance to get that working in an UAPI compatible way.
> > > 
> > > Thoughts/comments?
> > I think you need to type up the goal or exact problem statement you're
> > trying to solve first. What you typed up is a solution along the lines of
> > "try to stuff userspace memory fences into dma_fence and see how horrible
> > it all is", and that's certainly an interesting experiment, but what are
> > you trying to solve with it?
> 
> Well, good point. I explained to much how it works, but now why.
> 
> In general I would describe the goal as: Providing a standard kernel
> infrastructure for user fences.

So on that goal the part I fully agree on is that drm_syncobj can (and
should imo) be able to contain userspace memory fences. The uapi semantics
and everything is already fully set up to support that, but maybe with
reduced performance: Non-aware userspace (or when you don't trust the
supplier of the umf) needs to block when looking up the fence, and the
dma_fence returned will always be signalled already. But that's just a
mild performance issue (and vk drivers paper over that already with
threading) and not a correctness issue.

> > Like if the issue is to enable opencl or whatever, then that's no problem
> > (rocm on amdkfd is a thing, same maybe without the kfd part can be done
> > anywhere else). If the goal is to enable userspace memory fences for vk,
> > then we really don't need these everywhere, but really only in drm_syncobj
> > (and maybe sync_file).
> 
> Yes, having an in kernel representation for vk user space fences is one of
> the goals.
> 
> And I was going back and forth if I should rather come up with a new
> structure for this or use the existing dma_fence with a flag as well.
> 
> I've decided to go down the later router because we have quite a lot of
> existing functionality which can be re-used. But if you have a good argument
> that it would be more defensive to come up with something completely new,
> I'm perfectly fine with that as well.

Yeah so stuffing that into dma_fence already freaks me out a bit. It is
quite fundamentally a different thing, and it would be really nice to make
that very apparent at the type level too.

E.g. to make sure you never ever end up with an umf fence in mmu notifier
invalidate callback. You can enforce that with runtime checks too, but imo
compile time fail is better than runtime fail.

> > If the goal is specifically atomic kms, then there's an entire can of
> > worms there that I really don't want to think about, but it exists: We
> > have dma_fence as out-fences from atomic commit, and that's already
> > massively broken since most drivers allocate some memory or at least take
> > locks which can allocate memory in their commit path. Like i2c. Putting a
> > userspace memory fence as in-fence in there makes that problem
> > substantially worse, since at least in theory you're just not allowed to
> > might_faul in atomic_commit_tail.
> 
> Yes, that's unfortunately one of the goals as well and yes I completely
> agree on the can of worms. But I think I've solved that.
> 
> What I do in the patch set is to enforce that the out fence is an user fence
> when the driver supports user in fences as well.
> 
> Since user fences doesn't have the memory management dependency drivers can
> actually allocate memory or call I2C functions which takes locks which have
> memory allocation dependencies.
> 
> Or do I miss some other reason why you can't fault or allocate memory in
> atomic_commit_tail? At least lockdep seems to be happy about that now.

The problem is a bit that this breaks the uapi already. At least if the
goal is to have this all be perfectly transparent for userspace - as you
as you have multi-gpu setups going on at least.

> > If the goal is to keep the uapi perfectly compatible then your patch set
> > doesn't look like a solution, since as soon as another driver is involved
> > which doesn't understand userspace memory fences it all falls apart. So
> > works great for a quick demo with amd+amd sharing, but not much further.
> > And I don't think it's feasible to just rev the entire ecosystem, since
> > that kinda defeats the point of keeping uapi stable - if we rev everything
> > we might as well also rev the uapi and make this a bit more incremental
> > again :-)
> 
> Yes, unfortunately the uapi needs to stay compatible as well and yes that
> means we need to deploy this to all drivers involved.
> 
> We at least need to be able to provide a stack on new hardware with (for
> example) Ubuntu 18.04 without replacing all the userspace components.
> 
> What we can replace is the OpenGL stack and if necessary libdrm, but not
> (for example) the X server and most likely not the DDX in some cases.
> 
> The same applies with surfaceflinger and to some extend Wayland as well.

So for perfect uapi compat for existing compositor I really don't think
stuffing umf into the kernel is the right approach. Too many little
corners that break:

- the in/out fence mismatch every
- cross gpu with different userspace that doesn't understand umf and then
  just ignores them
- compositors which currently assume implicit sync finishes eventually,
  and with umf that gets complicated at best
- same with sync_file, the uapi atm does not have a concept of future
  fence

So you can kinda make this work, but it falls apart all over the place.
And I also don't think smashing umf into all these old concepts helps us
in any way to get towards a desktop which is umf-native.

My take is still that for backwards compat the simplest way is if a
umf-native driver simply provides dma-fence backwards compat as an opt-in,
which userspace chooses when it's necessary. There's really only two
things you need for that to work:

- a timeout of some sort on the dma_fence, which might or might not kill
  the entire context. This is entirey up to how your userspace does or
  does not implement stuff like arb robustness or vk_error_device_lost

- pre-pinned memory management to block out the all the inversions. This
  is a bit more nasty, but since we do have all the code for this already
  it really shouldn't be too tricky to make that happen for the fancy new
  umf world.

You do not need a kernel scheduler or anything like that at all, you can
do full userspace direct submit to hw and all that fun. Maybe do a
drm/sched frontend (and then your submit code does exactly what userspace
would do too).

Importantly the things you really don't need:

- special hw support, even if the only mode your hw supports is with page
  faults and all that: You can make sure all the pages are present
  upfront, and then simply kill the entire context is a page fault
  happens.

- special fw scheduler support: Once the memory management inversions are
  taken care of with pre-pinning under dma_fences, then the only other
  thing you need is a timeout for the dma_fence to signal. And maybe some
  kind of guaranteed ordering if you want to use a dma_fence timeline
  since that one can't go backwards.

Trying to shoehorn umf into all the old concepts like implicit sync or
sync_file which really don't support umf works for a demo, but imo just
isn't solid enough for shipping everywhere.

And long term I really don't think we ever want umf anywhere else than
drm_syncobj, at least for a 100% umf-native stack.

So maybe this all goes back to the old discussion with had, where you
argued for the need for special fw and hw and all that to make the old
dma_fence stuff work. Why is that needed? I still don't get that part ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-09 14:10       ` Daniel Vetter
@ 2022-05-17 10:28         ` Christian König
  -1 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-17 10:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, sergemetral, tzimmermann, gustavo,
	Felix.Kuehling, linux-media, dri-devel, linaro-mm-sig,
	maad.aldabagh, jason, alexander.deucher, skhawaja, sumit.semwal,
	daniels

Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
>>> On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
>>>> Hello everyone,
>>>>
>>>> it's a well known problem that the DMA-buf subsystem mixed
>>>> synchronization and memory management requirements into the same
>>>> dma_fence and dma_resv objects. Because of this dma_fence objects need
>>>> to guarantee that they complete within a finite amount of time or
>>>> otherwise the system can easily deadlock.
>>>>
>>>> One of the few good things about this problem is that it is really good
>>>> understood by now.
>>>>
>>>> Daniel and others came up with some documentation:
>>>> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
>>>>
>>>> And Jason did an excellent presentation about that problem on last years
>>>> LPC: https://lpc.events/event/11/contributions/1115/
>>>>
>>>> Based on that we had been able to reject new implementations of
>>>> infinite/user DMA fences and mitigate the effect of the few existing
>>>> ones.
>>>>
>>>> The still remaining down side is that we don't have a way of using user
>>>> fences as dependency in both the explicit (sync_file, drm_syncobj) as
>>>> well as the implicit (dma_resv) synchronization objects, resulting in
>>>> numerous problems and limitations for things like HMM, user queues
>>>> etc....
>>>>
>>>> This patch set here now tries to tackle this problem by untangling the
>>>> synchronization from the memory management. What it does *not* try to do
>>>> is to fix the existing kernel fences, because I think we now can all
>>>> agree on that this isn't really possible.
>>>>
>>>> To archive this goal what I do in this patch set is to add some parallel
>>>> infrastructure to cleanly separate normal kernel dma_fence objects from
>>>> indefinite/user fences:
>>>>
>>>> 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
>>>> existing driver defines). To note that a certain dma_fence is an user
>>>> fence and *must* be ignore by memory management and never used as
>>>> dependency for normal none user dma_fence objects.
>>>>
>>>> 2. The dma_fence_array and dma_fence_chain containers are modified so
>>>> that they are marked as user fences whenever any of their contained
>>>> fences are an user fence.
>>>>
>>>> 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
>>>> used with indefinite/user fences and separates those into it's own
>>>> synchronization domain.
>>>>
>>>> 4. The existing dma_buf_poll_add_cb() function is modified so that
>>>> indefinite/user fences are included in the polling.
>>>>
>>>> 5. The sync_file synchronization object is modified so that we
>>>> essentially have two fence streams instead of just one.
>>>>
>>>> 6. The drm_syncobj is modified in a similar way. User fences are just
>>>> ignored unless the driver explicitly states support to wait for them.
>>>>
>>>> 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
>>>> can use to indicate the need for user fences. If user fences are used
>>>> the atomic mode setting starts to support user fences as IN/OUT fences.
>>>>
>>>> 8. Lockdep is used at various critical locations to ensure that nobody
>>>> ever tries to mix user fences with non user fences.
>>>>
>>>> The general approach is to just ignore user fences unless a driver
>>>> stated explicitely support for them.
>>>>
>>>> On top of all of this I've hacked amdgpu so that we add the resulting CS
>>>> fence only as kernel dependency to the dma_resv object and an additional
>>>> wrapped up with a dma_fence_array and a stub user fence.
>>>>
>>>> The result is that the newly added atomic modeset functions now
>>>> correctly wait for the user fence to complete before doing the flip. And
>>>> dependent CS don't pipeline any more, but rather block on the CPU before
>>>> submitting work.
>>>>
>>>> After tons of debugging and testing everything now seems to not go up in
>>>> flames immediately and even lockdep is happy with the annotations.
>>>>
>>>> I'm perfectly aware that this is probably by far the most controversial
>>>> patch set I've ever created and I really wish we wouldn't need it. But
>>>> we certainly have the requirement for this and I don't see much other
>>>> chance to get that working in an UAPI compatible way.
>>>>
>>>> Thoughts/comments?
>>> I think you need to type up the goal or exact problem statement you're
>>> trying to solve first. What you typed up is a solution along the lines of
>>> "try to stuff userspace memory fences into dma_fence and see how horrible
>>> it all is", and that's certainly an interesting experiment, but what are
>>> you trying to solve with it?
>> Well, good point. I explained to much how it works, but now why.
>>
>> In general I would describe the goal as: Providing a standard kernel
>> infrastructure for user fences.
> So on that goal the part I fully agree on is that drm_syncobj can (and
> should imo) be able to contain userspace memory fences. The uapi semantics
> and everything is already fully set up to support that, but maybe with
> reduced performance: Non-aware userspace (or when you don't trust the
> supplier of the umf) needs to block when looking up the fence, and the
> dma_fence returned will always be signalled already. But that's just a
> mild performance issue (and vk drivers paper over that already with
> threading) and not a correctness issue.

Exactly that, yes.

>>> Like if the issue is to enable opencl or whatever, then that's no problem
>>> (rocm on amdkfd is a thing, same maybe without the kfd part can be done
>>> anywhere else). If the goal is to enable userspace memory fences for vk,
>>> then we really don't need these everywhere, but really only in drm_syncobj
>>> (and maybe sync_file).
>> Yes, having an in kernel representation for vk user space fences is one of
>> the goals.
>>
>> And I was going back and forth if I should rather come up with a new
>> structure for this or use the existing dma_fence with a flag as well.
>>
>> I've decided to go down the later router because we have quite a lot of
>> existing functionality which can be re-used. But if you have a good argument
>> that it would be more defensive to come up with something completely new,
>> I'm perfectly fine with that as well.
> Yeah so stuffing that into dma_fence already freaks me out a bit. It is
> quite fundamentally a different thing, and it would be really nice to make
> that very apparent at the type level too.
>
> E.g. to make sure you never ever end up with an umf fence in mmu notifier
> invalidate callback. You can enforce that with runtime checks too, but imo
> compile time fail is better than runtime fail.

Well, I see arguments for both sides.

There is certainly the danger that we have an umf wait in the mmu 
notifier, but then lockdep will scream "bloody hell" immediately.

On the other hand when I make this a separate structure we need to 
maintain containers for both variants, especially a chain implementation 
for drm_syncobj. And here I don't have lockdep to keep an eye that 
nobody does anything strange.

It's only a gut feeling with no clear evidence for one side. If you 
insists on a separate structure I will go down that route.

>>> If the goal is specifically atomic kms, then there's an entire can of
>>> worms there that I really don't want to think about, but it exists: We
>>> have dma_fence as out-fences from atomic commit, and that's already
>>> massively broken since most drivers allocate some memory or at least take
>>> locks which can allocate memory in their commit path. Like i2c. Putting a
>>> userspace memory fence as in-fence in there makes that problem
>>> substantially worse, since at least in theory you're just not allowed to
>>> might_faul in atomic_commit_tail.
>> Yes, that's unfortunately one of the goals as well and yes I completely
>> agree on the can of worms. But I think I've solved that.
>>
>> What I do in the patch set is to enforce that the out fence is an user fence
>> when the driver supports user in fences as well.
>>
>> Since user fences doesn't have the memory management dependency drivers can
>> actually allocate memory or call I2C functions which takes locks which have
>> memory allocation dependencies.
>>
>> Or do I miss some other reason why you can't fault or allocate memory in
>> atomic_commit_tail? At least lockdep seems to be happy about that now.
> The problem is a bit that this breaks the uapi already. At least if the
> goal is to have this all be perfectly transparent for userspace - as you
> as you have multi-gpu setups going on at least.

Question here is why do you think there is an UAPI break? We currently 
wait in a work item already, so where exactly is the problem?

>>> If the goal is to keep the uapi perfectly compatible then your patch set
>>> doesn't look like a solution, since as soon as another driver is involved
>>> which doesn't understand userspace memory fences it all falls apart. So
>>> works great for a quick demo with amd+amd sharing, but not much further.
>>> And I don't think it's feasible to just rev the entire ecosystem, since
>>> that kinda defeats the point of keeping uapi stable - if we rev everything
>>> we might as well also rev the uapi and make this a bit more incremental
>>> again :-)
>> Yes, unfortunately the uapi needs to stay compatible as well and yes that
>> means we need to deploy this to all drivers involved.
>>
>> We at least need to be able to provide a stack on new hardware with (for
>> example) Ubuntu 18.04 without replacing all the userspace components.
>>
>> What we can replace is the OpenGL stack and if necessary libdrm, but not
>> (for example) the X server and most likely not the DDX in some cases.
>>
>> The same applies with surfaceflinger and to some extend Wayland as well.
> So for perfect uapi compat for existing compositor I really don't think
> stuffing umf into the kernel is the right approach. Too many little
> corners that break:
>
> - the in/out fence mismatch every

On that part I need further explanation, cause I hoped that this is solved.

> - cross gpu with different userspace that doesn't understand umf and then
>    just ignores them

Well by stuffing umf into the kernel the whole thing becomes transparent 
for userspace.

So it won't matter that you have a new amdgpu stack which wants to use 
umf and an older i915 stack which knows nothing about umfs.

The kernel will block on command submission as soon as a buffer is used 
by i915. And as you said above as well that might cause performance 
trouble, but is not a correctness problem.

> - compositors which currently assume implicit sync finishes eventually,
>    and with umf that gets complicated at best

But for userspace compositors there is no difference between an umf 
which times out and a dma_fence which timesout? Or am I missing 
something here?

> - same with sync_file, the uapi atm does not have a concept of future
>    fence
>
> So you can kinda make this work, but it falls apart all over the place.
> And I also don't think smashing umf into all these old concepts helps us
> in any way to get towards a desktop which is umf-native.

Yeah, but having an umf compatibility memory management doesn't help 
either to get away from pre-pinned pages.

> My take is still that for backwards compat the simplest way is if a
> umf-native driver simply provides dma-fence backwards compat as an opt-in,
> which userspace chooses when it's necessary. There's really only two
> things you need for that to work:
>
> - a timeout of some sort on the dma_fence, which might or might not kill
>    the entire context. This is entirey up to how your userspace does or
>    does not implement stuff like arb robustness or vk_error_device_lost
>
> - pre-pinned memory management to block out the all the inversions. This
>    is a bit more nasty, but since we do have all the code for this already
>    it really shouldn't be too tricky to make that happen for the fancy new
>    umf world.

Well, exactly that's what we want to get away from.

> You do not need a kernel scheduler or anything like that at all, you can
> do full userspace direct submit to hw and all that fun. Maybe do a
> drm/sched frontend (and then your submit code does exactly what userspace
> would do too).
>
> Importantly the things you really don't need:
>
> - special hw support, even if the only mode your hw supports is with page
>    faults and all that: You can make sure all the pages are present
>    upfront, and then simply kill the entire context is a page fault
>    happens.

Well, that's only like 90% correct.

You can make that work without special hardware support, but from the 
experience with ROCm and very extensive talks with out hardware folks we 
have seriously problems making sure that the hw can't access freed up 
memory any more.

Except for the solution of never freeing up memory the only other 
possibility is to wait between 1 and 6 seconds until a shoot down made 
sure that there is really nobody accessing old page tables entries any more.

In the case of an user space queue with hardware scheduler support and 
HMM the memory would just still be referenced until userspace 
cooperatively inserted a barrier, but that again breaks some dma_fence 
assumptions as far as I can see.

> - special fw scheduler support: Once the memory management inversions are
>    taken care of with pre-pinning under dma_fences, then the only other
>    thing you need is a timeout for the dma_fence to signal. And maybe some
>    kind of guaranteed ordering if you want to use a dma_fence timeline
>    since that one can't go backwards.

Yeah, that not going backward thing turned out to be massively more 
tricky than I thought initially as well.

Alex, Marek and I worked quite hard on relaying those requirements to 
our internal teams, but I'm still not quite sure if that will turn out 
working or not.

> Trying to shoehorn umf into all the old concepts like implicit sync or
> sync_file which really don't support umf works for a demo, but imo just
> isn't solid enough for shipping everywhere.
>
> And long term I really don't think we ever want umf anywhere else than
> drm_syncobj, at least for a 100% umf-native stack.

Ok then I will concentrate on drm_syncobj for now.

What about in driver backward compatibility? E.g. blocking wait in the 
multimedia driver CS IOCTL until umf signals?

Thanks,
Christian.

>
> So maybe this all goes back to the old discussion with had, where you
> argued for the need for special fw and hw and all that to make the old
> dma_fence stuff work. Why is that needed? I still don't get that part ...
> -Daniel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-17 10:28         ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-17 10:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: jason, daniels, skhawaja, maad.aldabagh, sergemetral,
	sumit.semwal, gustavo, Felix.Kuehling, alexander.deucher,
	tzimmermann, tvrtko.ursulin, linux-media, dri-devel,
	linaro-mm-sig

Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
>>> On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
>>>> Hello everyone,
>>>>
>>>> it's a well known problem that the DMA-buf subsystem mixed
>>>> synchronization and memory management requirements into the same
>>>> dma_fence and dma_resv objects. Because of this dma_fence objects need
>>>> to guarantee that they complete within a finite amount of time or
>>>> otherwise the system can easily deadlock.
>>>>
>>>> One of the few good things about this problem is that it is really good
>>>> understood by now.
>>>>
>>>> Daniel and others came up with some documentation:
>>>> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
>>>>
>>>> And Jason did an excellent presentation about that problem on last years
>>>> LPC: https://lpc.events/event/11/contributions/1115/
>>>>
>>>> Based on that we had been able to reject new implementations of
>>>> infinite/user DMA fences and mitigate the effect of the few existing
>>>> ones.
>>>>
>>>> The still remaining down side is that we don't have a way of using user
>>>> fences as dependency in both the explicit (sync_file, drm_syncobj) as
>>>> well as the implicit (dma_resv) synchronization objects, resulting in
>>>> numerous problems and limitations for things like HMM, user queues
>>>> etc....
>>>>
>>>> This patch set here now tries to tackle this problem by untangling the
>>>> synchronization from the memory management. What it does *not* try to do
>>>> is to fix the existing kernel fences, because I think we now can all
>>>> agree on that this isn't really possible.
>>>>
>>>> To archive this goal what I do in this patch set is to add some parallel
>>>> infrastructure to cleanly separate normal kernel dma_fence objects from
>>>> indefinite/user fences:
>>>>
>>>> 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
>>>> existing driver defines). To note that a certain dma_fence is an user
>>>> fence and *must* be ignore by memory management and never used as
>>>> dependency for normal none user dma_fence objects.
>>>>
>>>> 2. The dma_fence_array and dma_fence_chain containers are modified so
>>>> that they are marked as user fences whenever any of their contained
>>>> fences are an user fence.
>>>>
>>>> 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
>>>> used with indefinite/user fences and separates those into it's own
>>>> synchronization domain.
>>>>
>>>> 4. The existing dma_buf_poll_add_cb() function is modified so that
>>>> indefinite/user fences are included in the polling.
>>>>
>>>> 5. The sync_file synchronization object is modified so that we
>>>> essentially have two fence streams instead of just one.
>>>>
>>>> 6. The drm_syncobj is modified in a similar way. User fences are just
>>>> ignored unless the driver explicitly states support to wait for them.
>>>>
>>>> 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
>>>> can use to indicate the need for user fences. If user fences are used
>>>> the atomic mode setting starts to support user fences as IN/OUT fences.
>>>>
>>>> 8. Lockdep is used at various critical locations to ensure that nobody
>>>> ever tries to mix user fences with non user fences.
>>>>
>>>> The general approach is to just ignore user fences unless a driver
>>>> stated explicitely support for them.
>>>>
>>>> On top of all of this I've hacked amdgpu so that we add the resulting CS
>>>> fence only as kernel dependency to the dma_resv object and an additional
>>>> wrapped up with a dma_fence_array and a stub user fence.
>>>>
>>>> The result is that the newly added atomic modeset functions now
>>>> correctly wait for the user fence to complete before doing the flip. And
>>>> dependent CS don't pipeline any more, but rather block on the CPU before
>>>> submitting work.
>>>>
>>>> After tons of debugging and testing everything now seems to not go up in
>>>> flames immediately and even lockdep is happy with the annotations.
>>>>
>>>> I'm perfectly aware that this is probably by far the most controversial
>>>> patch set I've ever created and I really wish we wouldn't need it. But
>>>> we certainly have the requirement for this and I don't see much other
>>>> chance to get that working in an UAPI compatible way.
>>>>
>>>> Thoughts/comments?
>>> I think you need to type up the goal or exact problem statement you're
>>> trying to solve first. What you typed up is a solution along the lines of
>>> "try to stuff userspace memory fences into dma_fence and see how horrible
>>> it all is", and that's certainly an interesting experiment, but what are
>>> you trying to solve with it?
>> Well, good point. I explained to much how it works, but now why.
>>
>> In general I would describe the goal as: Providing a standard kernel
>> infrastructure for user fences.
> So on that goal the part I fully agree on is that drm_syncobj can (and
> should imo) be able to contain userspace memory fences. The uapi semantics
> and everything is already fully set up to support that, but maybe with
> reduced performance: Non-aware userspace (or when you don't trust the
> supplier of the umf) needs to block when looking up the fence, and the
> dma_fence returned will always be signalled already. But that's just a
> mild performance issue (and vk drivers paper over that already with
> threading) and not a correctness issue.

Exactly that, yes.

>>> Like if the issue is to enable opencl or whatever, then that's no problem
>>> (rocm on amdkfd is a thing, same maybe without the kfd part can be done
>>> anywhere else). If the goal is to enable userspace memory fences for vk,
>>> then we really don't need these everywhere, but really only in drm_syncobj
>>> (and maybe sync_file).
>> Yes, having an in kernel representation for vk user space fences is one of
>> the goals.
>>
>> And I was going back and forth if I should rather come up with a new
>> structure for this or use the existing dma_fence with a flag as well.
>>
>> I've decided to go down the later router because we have quite a lot of
>> existing functionality which can be re-used. But if you have a good argument
>> that it would be more defensive to come up with something completely new,
>> I'm perfectly fine with that as well.
> Yeah so stuffing that into dma_fence already freaks me out a bit. It is
> quite fundamentally a different thing, and it would be really nice to make
> that very apparent at the type level too.
>
> E.g. to make sure you never ever end up with an umf fence in mmu notifier
> invalidate callback. You can enforce that with runtime checks too, but imo
> compile time fail is better than runtime fail.

Well, I see arguments for both sides.

There is certainly the danger that we have an umf wait in the mmu 
notifier, but then lockdep will scream "bloody hell" immediately.

On the other hand when I make this a separate structure we need to 
maintain containers for both variants, especially a chain implementation 
for drm_syncobj. And here I don't have lockdep to keep an eye that 
nobody does anything strange.

It's only a gut feeling with no clear evidence for one side. If you 
insists on a separate structure I will go down that route.

>>> If the goal is specifically atomic kms, then there's an entire can of
>>> worms there that I really don't want to think about, but it exists: We
>>> have dma_fence as out-fences from atomic commit, and that's already
>>> massively broken since most drivers allocate some memory or at least take
>>> locks which can allocate memory in their commit path. Like i2c. Putting a
>>> userspace memory fence as in-fence in there makes that problem
>>> substantially worse, since at least in theory you're just not allowed to
>>> might_faul in atomic_commit_tail.
>> Yes, that's unfortunately one of the goals as well and yes I completely
>> agree on the can of worms. But I think I've solved that.
>>
>> What I do in the patch set is to enforce that the out fence is an user fence
>> when the driver supports user in fences as well.
>>
>> Since user fences doesn't have the memory management dependency drivers can
>> actually allocate memory or call I2C functions which takes locks which have
>> memory allocation dependencies.
>>
>> Or do I miss some other reason why you can't fault or allocate memory in
>> atomic_commit_tail? At least lockdep seems to be happy about that now.
> The problem is a bit that this breaks the uapi already. At least if the
> goal is to have this all be perfectly transparent for userspace - as you
> as you have multi-gpu setups going on at least.

Question here is why do you think there is an UAPI break? We currently 
wait in a work item already, so where exactly is the problem?

>>> If the goal is to keep the uapi perfectly compatible then your patch set
>>> doesn't look like a solution, since as soon as another driver is involved
>>> which doesn't understand userspace memory fences it all falls apart. So
>>> works great for a quick demo with amd+amd sharing, but not much further.
>>> And I don't think it's feasible to just rev the entire ecosystem, since
>>> that kinda defeats the point of keeping uapi stable - if we rev everything
>>> we might as well also rev the uapi and make this a bit more incremental
>>> again :-)
>> Yes, unfortunately the uapi needs to stay compatible as well and yes that
>> means we need to deploy this to all drivers involved.
>>
>> We at least need to be able to provide a stack on new hardware with (for
>> example) Ubuntu 18.04 without replacing all the userspace components.
>>
>> What we can replace is the OpenGL stack and if necessary libdrm, but not
>> (for example) the X server and most likely not the DDX in some cases.
>>
>> The same applies with surfaceflinger and to some extend Wayland as well.
> So for perfect uapi compat for existing compositor I really don't think
> stuffing umf into the kernel is the right approach. Too many little
> corners that break:
>
> - the in/out fence mismatch every

On that part I need further explanation, cause I hoped that this is solved.

> - cross gpu with different userspace that doesn't understand umf and then
>    just ignores them

Well by stuffing umf into the kernel the whole thing becomes transparent 
for userspace.

So it won't matter that you have a new amdgpu stack which wants to use 
umf and an older i915 stack which knows nothing about umfs.

The kernel will block on command submission as soon as a buffer is used 
by i915. And as you said above as well that might cause performance 
trouble, but is not a correctness problem.

> - compositors which currently assume implicit sync finishes eventually,
>    and with umf that gets complicated at best

But for userspace compositors there is no difference between an umf 
which times out and a dma_fence which timesout? Or am I missing 
something here?

> - same with sync_file, the uapi atm does not have a concept of future
>    fence
>
> So you can kinda make this work, but it falls apart all over the place.
> And I also don't think smashing umf into all these old concepts helps us
> in any way to get towards a desktop which is umf-native.

Yeah, but having an umf compatibility memory management doesn't help 
either to get away from pre-pinned pages.

> My take is still that for backwards compat the simplest way is if a
> umf-native driver simply provides dma-fence backwards compat as an opt-in,
> which userspace chooses when it's necessary. There's really only two
> things you need for that to work:
>
> - a timeout of some sort on the dma_fence, which might or might not kill
>    the entire context. This is entirey up to how your userspace does or
>    does not implement stuff like arb robustness or vk_error_device_lost
>
> - pre-pinned memory management to block out the all the inversions. This
>    is a bit more nasty, but since we do have all the code for this already
>    it really shouldn't be too tricky to make that happen for the fancy new
>    umf world.

Well, exactly that's what we want to get away from.

> You do not need a kernel scheduler or anything like that at all, you can
> do full userspace direct submit to hw and all that fun. Maybe do a
> drm/sched frontend (and then your submit code does exactly what userspace
> would do too).
>
> Importantly the things you really don't need:
>
> - special hw support, even if the only mode your hw supports is with page
>    faults and all that: You can make sure all the pages are present
>    upfront, and then simply kill the entire context is a page fault
>    happens.

Well, that's only like 90% correct.

You can make that work without special hardware support, but from the 
experience with ROCm and very extensive talks with out hardware folks we 
have seriously problems making sure that the hw can't access freed up 
memory any more.

Except for the solution of never freeing up memory the only other 
possibility is to wait between 1 and 6 seconds until a shoot down made 
sure that there is really nobody accessing old page tables entries any more.

In the case of an user space queue with hardware scheduler support and 
HMM the memory would just still be referenced until userspace 
cooperatively inserted a barrier, but that again breaks some dma_fence 
assumptions as far as I can see.

> - special fw scheduler support: Once the memory management inversions are
>    taken care of with pre-pinning under dma_fences, then the only other
>    thing you need is a timeout for the dma_fence to signal. And maybe some
>    kind of guaranteed ordering if you want to use a dma_fence timeline
>    since that one can't go backwards.

Yeah, that not going backward thing turned out to be massively more 
tricky than I thought initially as well.

Alex, Marek and I worked quite hard on relaying those requirements to 
our internal teams, but I'm still not quite sure if that will turn out 
working or not.

> Trying to shoehorn umf into all the old concepts like implicit sync or
> sync_file which really don't support umf works for a demo, but imo just
> isn't solid enough for shipping everywhere.
>
> And long term I really don't think we ever want umf anywhere else than
> drm_syncobj, at least for a 100% umf-native stack.

Ok then I will concentrate on drm_syncobj for now.

What about in driver backward compatibility? E.g. blocking wait in the 
multimedia driver CS IOCTL until umf signals?

Thanks,
Christian.

>
> So maybe this all goes back to the old discussion with had, where you
> argued for the need for special fw and hw and all that to make the old
> dma_fence stuff work. Why is that needed? I still don't get that part ...
> -Daniel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-17 10:28         ` Christian König
@ 2022-05-25 13:05           ` Daniel Vetter
  -1 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-25 13:05 UTC (permalink / raw)
  To: Christian König
  Cc: tvrtko.ursulin, sergemetral, tzimmermann, gustavo,
	Felix.Kuehling, linux-media, dri-devel, linaro-mm-sig, jason,
	alexander.deucher, daniels, skhawaja, sumit.semwal,
	maad.aldabagh

Apologies I'm constantly behind on m-l discussions :-/

On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> > On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> > > Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> > > > On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
> > > > > Hello everyone,
> > > > > 
> > > > > it's a well known problem that the DMA-buf subsystem mixed
> > > > > synchronization and memory management requirements into the same
> > > > > dma_fence and dma_resv objects. Because of this dma_fence objects need
> > > > > to guarantee that they complete within a finite amount of time or
> > > > > otherwise the system can easily deadlock.
> > > > > 
> > > > > One of the few good things about this problem is that it is really good
> > > > > understood by now.
> > > > > 
> > > > > Daniel and others came up with some documentation:
> > > > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
> > > > > 
> > > > > And Jason did an excellent presentation about that problem on last years
> > > > > LPC: https://lpc.events/event/11/contributions/1115/
> > > > > 
> > > > > Based on that we had been able to reject new implementations of
> > > > > infinite/user DMA fences and mitigate the effect of the few existing
> > > > > ones.
> > > > > 
> > > > > The still remaining down side is that we don't have a way of using user
> > > > > fences as dependency in both the explicit (sync_file, drm_syncobj) as
> > > > > well as the implicit (dma_resv) synchronization objects, resulting in
> > > > > numerous problems and limitations for things like HMM, user queues
> > > > > etc....
> > > > > 
> > > > > This patch set here now tries to tackle this problem by untangling the
> > > > > synchronization from the memory management. What it does *not* try to do
> > > > > is to fix the existing kernel fences, because I think we now can all
> > > > > agree on that this isn't really possible.
> > > > > 
> > > > > To archive this goal what I do in this patch set is to add some parallel
> > > > > infrastructure to cleanly separate normal kernel dma_fence objects from
> > > > > indefinite/user fences:
> > > > > 
> > > > > 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
> > > > > existing driver defines). To note that a certain dma_fence is an user
> > > > > fence and *must* be ignore by memory management and never used as
> > > > > dependency for normal none user dma_fence objects.
> > > > > 
> > > > > 2. The dma_fence_array and dma_fence_chain containers are modified so
> > > > > that they are marked as user fences whenever any of their contained
> > > > > fences are an user fence.
> > > > > 
> > > > > 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
> > > > > used with indefinite/user fences and separates those into it's own
> > > > > synchronization domain.
> > > > > 
> > > > > 4. The existing dma_buf_poll_add_cb() function is modified so that
> > > > > indefinite/user fences are included in the polling.
> > > > > 
> > > > > 5. The sync_file synchronization object is modified so that we
> > > > > essentially have two fence streams instead of just one.
> > > > > 
> > > > > 6. The drm_syncobj is modified in a similar way. User fences are just
> > > > > ignored unless the driver explicitly states support to wait for them.
> > > > > 
> > > > > 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
> > > > > can use to indicate the need for user fences. If user fences are used
> > > > > the atomic mode setting starts to support user fences as IN/OUT fences.
> > > > > 
> > > > > 8. Lockdep is used at various critical locations to ensure that nobody
> > > > > ever tries to mix user fences with non user fences.
> > > > > 
> > > > > The general approach is to just ignore user fences unless a driver
> > > > > stated explicitely support for them.
> > > > > 
> > > > > On top of all of this I've hacked amdgpu so that we add the resulting CS
> > > > > fence only as kernel dependency to the dma_resv object and an additional
> > > > > wrapped up with a dma_fence_array and a stub user fence.
> > > > > 
> > > > > The result is that the newly added atomic modeset functions now
> > > > > correctly wait for the user fence to complete before doing the flip. And
> > > > > dependent CS don't pipeline any more, but rather block on the CPU before
> > > > > submitting work.
> > > > > 
> > > > > After tons of debugging and testing everything now seems to not go up in
> > > > > flames immediately and even lockdep is happy with the annotations.
> > > > > 
> > > > > I'm perfectly aware that this is probably by far the most controversial
> > > > > patch set I've ever created and I really wish we wouldn't need it. But
> > > > > we certainly have the requirement for this and I don't see much other
> > > > > chance to get that working in an UAPI compatible way.
> > > > > 
> > > > > Thoughts/comments?
> > > > I think you need to type up the goal or exact problem statement you're
> > > > trying to solve first. What you typed up is a solution along the lines of
> > > > "try to stuff userspace memory fences into dma_fence and see how horrible
> > > > it all is", and that's certainly an interesting experiment, but what are
> > > > you trying to solve with it?
> > > Well, good point. I explained to much how it works, but now why.
> > > 
> > > In general I would describe the goal as: Providing a standard kernel
> > > infrastructure for user fences.
> > So on that goal the part I fully agree on is that drm_syncobj can (and
> > should imo) be able to contain userspace memory fences. The uapi semantics
> > and everything is already fully set up to support that, but maybe with
> > reduced performance: Non-aware userspace (or when you don't trust the
> > supplier of the umf) needs to block when looking up the fence, and the
> > dma_fence returned will always be signalled already. But that's just a
> > mild performance issue (and vk drivers paper over that already with
> > threading) and not a correctness issue.
> 
> Exactly that, yes.
> 
> > > > Like if the issue is to enable opencl or whatever, then that's no problem
> > > > (rocm on amdkfd is a thing, same maybe without the kfd part can be done
> > > > anywhere else). If the goal is to enable userspace memory fences for vk,
> > > > then we really don't need these everywhere, but really only in drm_syncobj
> > > > (and maybe sync_file).
> > > Yes, having an in kernel representation for vk user space fences is one of
> > > the goals.
> > > 
> > > And I was going back and forth if I should rather come up with a new
> > > structure for this or use the existing dma_fence with a flag as well.
> > > 
> > > I've decided to go down the later router because we have quite a lot of
> > > existing functionality which can be re-used. But if you have a good argument
> > > that it would be more defensive to come up with something completely new,
> > > I'm perfectly fine with that as well.
> > Yeah so stuffing that into dma_fence already freaks me out a bit. It is
> > quite fundamentally a different thing, and it would be really nice to make
> > that very apparent at the type level too.
> > 
> > E.g. to make sure you never ever end up with an umf fence in mmu notifier
> > invalidate callback. You can enforce that with runtime checks too, but imo
> > compile time fail is better than runtime fail.
> 
> Well, I see arguments for both sides.
> 
> There is certainly the danger that we have an umf wait in the mmu notifier,
> but then lockdep will scream "bloody hell" immediately.
> 
> On the other hand when I make this a separate structure we need to maintain
> containers for both variants, especially a chain implementation for
> drm_syncobj. And here I don't have lockdep to keep an eye that nobody does
> anything strange.
> 
> It's only a gut feeling with no clear evidence for one side. If you insists
> on a separate structure I will go down that route.
> 
> > > > If the goal is specifically atomic kms, then there's an entire can of
> > > > worms there that I really don't want to think about, but it exists: We
> > > > have dma_fence as out-fences from atomic commit, and that's already
> > > > massively broken since most drivers allocate some memory or at least take
> > > > locks which can allocate memory in their commit path. Like i2c. Putting a
> > > > userspace memory fence as in-fence in there makes that problem
> > > > substantially worse, since at least in theory you're just not allowed to
> > > > might_faul in atomic_commit_tail.
> > > Yes, that's unfortunately one of the goals as well and yes I completely
> > > agree on the can of worms. But I think I've solved that.
> > > 
> > > What I do in the patch set is to enforce that the out fence is an user fence
> > > when the driver supports user in fences as well.
> > > 
> > > Since user fences doesn't have the memory management dependency drivers can
> > > actually allocate memory or call I2C functions which takes locks which have
> > > memory allocation dependencies.
> > > 
> > > Or do I miss some other reason why you can't fault or allocate memory in
> > > atomic_commit_tail? At least lockdep seems to be happy about that now.
> > The problem is a bit that this breaks the uapi already. At least if the
> > goal is to have this all be perfectly transparent for userspace - as you
> > as you have multi-gpu setups going on at least.
> 
> Question here is why do you think there is an UAPI break? We currently wait
> in a work item already, so where exactly is the problem?

It's a bit washy, but dma_fence and hence implicit sync is supposed to
finish in finite time. umf just doesn't.

Ofc in reality you can still flood your compositor and they're not very
robust, but with umf it's trivial to just hang your compositor forever and
nothing happens. There was a least quite a bit of screaming from
compositor folks 2 years ago when we've gone through the i915-gem fun of
"hey we redefined dma_fence to have indefinite fence semantcis, cool isn't
it". And this is kinda similar - you don't hang the kernel with all the
lockdep stuff, meaning ^C can get you out of any hang. But userspace is
still dead, and the user is still potentially rather unhappy.

So either we force a timeout in the kernel or in userspace or somewhere,
and at that point I'm not entirely sure this is really worth it.

Same holds for sync_file. drm_syncobj is different and does have the
entire "this might never signal" concept encoded in it's uapi already, so
that's fine.

> > > > If the goal is to keep the uapi perfectly compatible then your patch set
> > > > doesn't look like a solution, since as soon as another driver is involved
> > > > which doesn't understand userspace memory fences it all falls apart. So
> > > > works great for a quick demo with amd+amd sharing, but not much further.
> > > > And I don't think it's feasible to just rev the entire ecosystem, since
> > > > that kinda defeats the point of keeping uapi stable - if we rev everything
> > > > we might as well also rev the uapi and make this a bit more incremental
> > > > again :-)
> > > Yes, unfortunately the uapi needs to stay compatible as well and yes that
> > > means we need to deploy this to all drivers involved.
> > > 
> > > We at least need to be able to provide a stack on new hardware with (for
> > > example) Ubuntu 18.04 without replacing all the userspace components.
> > > 
> > > What we can replace is the OpenGL stack and if necessary libdrm, but not
> > > (for example) the X server and most likely not the DDX in some cases.
> > > 
> > > The same applies with surfaceflinger and to some extend Wayland as well.
> > So for perfect uapi compat for existing compositor I really don't think
> > stuffing umf into the kernel is the right approach. Too many little
> > corners that break:
> > 
> > - the in/out fence mismatch every
> 
> On that part I need further explanation, cause I hoped that this is solved.

The thing is even if you force the out fence to be an umf too if the
in-fence is, they're chained. So as soon as you have an unsignalled in
fence all subsequent out fences need to be forced to be umf semantics.
Possible, but definitely messy.

> > - cross gpu with different userspace that doesn't understand umf and then
> >    just ignores them
> 
> Well by stuffing umf into the kernel the whole thing becomes transparent for
> userspace.
> 
> So it won't matter that you have a new amdgpu stack which wants to use umf
> and an older i915 stack which knows nothing about umfs.
> 
> The kernel will block on command submission as soon as a buffer is used by
> i915. And as you said above as well that might cause performance trouble,
> but is not a correctness problem.

The problem is that you still have to adjust all drivers. Which yes
eventually we have to do, but if the compat code is just in the driver
which has switched to umf already, then the cost is a lot more localized.
And with the dma_fence trick below it's all good.

> > - compositors which currently assume implicit sync finishes eventually,
> >    and with umf that gets complicated at best
> 
> But for userspace compositors there is no difference between an umf which
> times out and a dma_fence which timesout? Or am I missing something here?
> 
> > - same with sync_file, the uapi atm does not have a concept of future
> >    fence
> > 
> > So you can kinda make this work, but it falls apart all over the place.
> > And I also don't think smashing umf into all these old concepts helps us
> > in any way to get towards a desktop which is umf-native.
> 
> Yeah, but having an umf compatibility memory management doesn't help either
> to get away from pre-pinned pages.

I think it's really two worlds:

a) dma_fence syncing (sync_file, implicit sync in dma_resv, drm_syncobj
with dma_fence) and pre-pinned everything memory

b) umf mode, which opens the door for page faults and clever tricks and
avoids the need to pre-pin everything

It sucks, but I really haven't found a way to mix these. And I feel like
that it's pretty dangerous to pretend that we can do it - it freaks me out
at least :-)

> > My take is still that for backwards compat the simplest way is if a
> > umf-native driver simply provides dma-fence backwards compat as an opt-in,
> > which userspace chooses when it's necessary. There's really only two
> > things you need for that to work:
> > 
> > - a timeout of some sort on the dma_fence, which might or might not kill
> >    the entire context. This is entirey up to how your userspace does or
> >    does not implement stuff like arb robustness or vk_error_device_lost
> > 
> > - pre-pinned memory management to block out the all the inversions. This
> >    is a bit more nasty, but since we do have all the code for this already
> >    it really shouldn't be too tricky to make that happen for the fancy new
> >    umf world.
> 
> Well, exactly that's what we want to get away from.
> 
> > You do not need a kernel scheduler or anything like that at all, you can
> > do full userspace direct submit to hw and all that fun. Maybe do a
> > drm/sched frontend (and then your submit code does exactly what userspace
> > would do too).
> > 
> > Importantly the things you really don't need:
> > 
> > - special hw support, even if the only mode your hw supports is with page
> >    faults and all that: You can make sure all the pages are present
> >    upfront, and then simply kill the entire context is a page fault
> >    happens.
> 
> Well, that's only like 90% correct.
> 
> You can make that work without special hardware support, but from the
> experience with ROCm and very extensive talks with out hardware folks we
> have seriously problems making sure that the hw can't access freed up memory
> any more.
> 
> Except for the solution of never freeing up memory the only other
> possibility is to wait between 1 and 6 seconds until a shoot down made sure
> that there is really nobody accessing old page tables entries any more.
> 
> In the case of an user space queue with hardware scheduler support and HMM
> the memory would just still be referenced until userspace cooperatively
> inserted a barrier, but that again breaks some dma_fence assumptions as far
> as I can see.

So two things:

- Don't use legacy semantics for compute. They're completely different
  worlds, and i915 plan is to have completely different mode between
  legacy and compute mode creation, because they really don't mesh, at
  context creation time. So kinda like amdgpu renderD* vs amdkfd, except
  all int he renderD* node.

  You _never_ want to run ROCm or level0 or cuda with legacy mode, because
  it sucks too much. The trouble is a bit vulkan since if you run it
  headless, you want to run in compute mode, but if want wayland/x11
  winsys support, then you need legacy support. But we've come up with
  some supremely nasty tricks for that too in the discussion at plumbers
  last year.

  The other trouble is interop. I think the best path here is the vk
  approach of "you only get drm_syncobj, no implicit sync ever". But that
  might not be workable for rocm/opencl/whatever. I'm not sure what's the
  best option there, if absolutely everything breaks I guess we could look
  into adding a umf implicit sync slot to dma_resv, but I really think we
  should exhaust all other options first before we step into that dragon's
  lair :-/

  If we do go with the umf implicit sync I think we should really do it as
  a timeline thing, i.e. you attach the umf container once, the other side
  gets it once, an after that point the only thing that keeps happening is
  that writers increment the syncpt seqno, and readers just sample that
  every time they use the buffer. So essentially:

  struct dma_buf_umf_implict_sync {
  	struct drm_syncobj *sync;
	atomic_t seqno;
  };

  Plus a few ioctls to increment the seqno (and receive the value) and
  read the current seqno. That still means interop isn't perfectly
  transparent, but maybe we could at least reuse some of the codepaths we
  have for drm_syncobj in both kernel and userspace.

  Maybe we could make it backwards compat but just blocking rendering
  until the umf signals if a buffer is shared with any importer that
  doesn't understand umf implicit sync.

  But that's all rather horrible imo.

- For memory management I think the right long term approach is to punt
  this all to userspace, for both legacy and compute mode, with exactly
  the same semantics as mmap()/munmap() on the cpu side. i915 calls this
  vm_bind (the rfc is floating around somewhere), but essentially i915.ko
  will execute vm_bind and vm_unbind directly, completely decoupled from
  any in-flight command submission (we just punch out the ptes and force a
  tlb flush and done). If userspace unmaps memory it still uses, it gets
  to keep all the pieces. This is how vk and all the new stuff works. It
  does mean there's more work for gl and that kind of stuff, but this
  really shouldn't be the kernel's problem.

  Aside: We're aiming for the same for gpu context. If userspace closes a
  gpu context we just preempt and destroy it right away, userspace gets to
  keep he pieces if they wanted to let it finish first.

  This is exactly like on the cpu side where userspace can create/destroy
  threads and mmap/munmap as it sees fit, and it just blows up if it gets
  it all wrong.

  Ofc if you're hw is busted and you can't do reasonable efficient tlb
  flushing (even with pipeling and batching) then you're screwed, so I'm
  hoping your 1-6 seconds of waiting isn't that. If it is then you kinda
  have an unfixable security bug :-/ But iirc amdkfd already has batched
  up tlb flushes as dma_fence in USAGE_OTHER now, so this should all work.

  Anything else, i.e. userpsace fails to insert the required cache flushes
  or barriers or anything else before it calls gpu_unamp (or vm_unbind or
  whatever you want to call it) is a userspace bug.

  Ofc if you preempt a context for memory eviction due to memory pressure
  then the kernel has to do all the cache flushing and everything. Same
  for evicting pages due to memory pressure in the hmm use case. If you're
  preemption takes a few seconds I think there's bigger problems no matter
  what :-)

  Btw on vm_bind it'd be great if you or someone from amd's rocm/kfd side
  can take a look and drop your input. I expect that we'll have the same
  discussion with nouveau/nvidia and maybe also some of the armsoc chips.
  So it would be great if we can collect upstream wisdom on this topic a
  bit.

Neither of these are a reason to stuff umf into implicit sync or sync file
imo.

> > - special fw scheduler support: Once the memory management inversions are
> >    taken care of with pre-pinning under dma_fences, then the only other
> >    thing you need is a timeout for the dma_fence to signal. And maybe some
> >    kind of guaranteed ordering if you want to use a dma_fence timeline
> >    since that one can't go backwards.
> 
> Yeah, that not going backward thing turned out to be massively more tricky
> than I thought initially as well.
> 
> Alex, Marek and I worked quite hard on relaying those requirements to our
> internal teams, but I'm still not quite sure if that will turn out working
> or not.

I really don't think you need to teach this to your fw scheduler, that
sounds backwards. At least for i915+guc we don't have any plans to inflict
this on our fw scheduler (otoh our fw scheduler sucks at resolving
dependencies, but that's a different topic - currently it just busy-spins
if a context is stuck, there's some plans to preempt contexts instead and
round-robin to the next one):

All our fw scheduler is doing is scheduling gpu context and round-robing
them with preemption if they use up their time slice before they go idle
again. Everything else is meaning and semantics we put on top either in
the kernel (for dma_fence based sync) or in userspace (for direct submit
userspace galore, which currently is done in the most hilarious way in a
downstream hack since the hw support isn't there yet). Hilarious here
means we submit a batchbuffer to the kernel which jumps back to it's
start, and then use conditional rendering CS commands to latch in the next
real userspace batch as userspace queues them up. There's a ring of these
to keep it all nicely pipelined and the gpu busy. It's terrible, but for
certain niche use-cases it gets the job done :-)

> > Trying to shoehorn umf into all the old concepts like implicit sync or
> > sync_file which really don't support umf works for a demo, but imo just
> > isn't solid enough for shipping everywhere.
> > 
> > And long term I really don't think we ever want umf anywhere else than
> > drm_syncobj, at least for a 100% umf-native stack.
> 
> Ok then I will concentrate on drm_syncobj for now.
> 
> What about in driver backward compatibility? E.g. blocking wait in the
> multimedia driver CS IOCTL until umf signals?

Yeah I think for that we can keep on using drm_syncobj exactly as-is, i.e.
if it's a umf you just block until signalled when the driver tries to look
up the dma_fence. Ofc userspace can be a bit more clever (and they all
should do so) by doing that waiting in a userspace thread, and eventually
we want them to get at the umf behind it all if they trust the other side,
so that the wait can happen on the gpu.

But that's kinda details, in general I think drm_syncobj should be ready
for cross-driver/process/whatever umf sharing.

Cheers, Daniel

> Thanks,
> Christian.
> 
> > 
> > So maybe this all goes back to the old discussion with had, where you
> > argued for the need for special fw and hw and all that to make the old
> > dma_fence stuff work. Why is that needed? I still don't get that part ...
> > -Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 13:05           ` Daniel Vetter
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-25 13:05 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, jason, daniels, skhawaja, maad.aldabagh,
	sergemetral, sumit.semwal, gustavo, Felix.Kuehling,
	alexander.deucher, tzimmermann, tvrtko.ursulin, linux-media,
	dri-devel, linaro-mm-sig

Apologies I'm constantly behind on m-l discussions :-/

On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> > On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> > > Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> > > > On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
> > > > > Hello everyone,
> > > > > 
> > > > > it's a well known problem that the DMA-buf subsystem mixed
> > > > > synchronization and memory management requirements into the same
> > > > > dma_fence and dma_resv objects. Because of this dma_fence objects need
> > > > > to guarantee that they complete within a finite amount of time or
> > > > > otherwise the system can easily deadlock.
> > > > > 
> > > > > One of the few good things about this problem is that it is really good
> > > > > understood by now.
> > > > > 
> > > > > Daniel and others came up with some documentation:
> > > > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
> > > > > 
> > > > > And Jason did an excellent presentation about that problem on last years
> > > > > LPC: https://lpc.events/event/11/contributions/1115/
> > > > > 
> > > > > Based on that we had been able to reject new implementations of
> > > > > infinite/user DMA fences and mitigate the effect of the few existing
> > > > > ones.
> > > > > 
> > > > > The still remaining down side is that we don't have a way of using user
> > > > > fences as dependency in both the explicit (sync_file, drm_syncobj) as
> > > > > well as the implicit (dma_resv) synchronization objects, resulting in
> > > > > numerous problems and limitations for things like HMM, user queues
> > > > > etc....
> > > > > 
> > > > > This patch set here now tries to tackle this problem by untangling the
> > > > > synchronization from the memory management. What it does *not* try to do
> > > > > is to fix the existing kernel fences, because I think we now can all
> > > > > agree on that this isn't really possible.
> > > > > 
> > > > > To archive this goal what I do in this patch set is to add some parallel
> > > > > infrastructure to cleanly separate normal kernel dma_fence objects from
> > > > > indefinite/user fences:
> > > > > 
> > > > > 1. It introduce a DMA_FENCE_FLAG_USER define (after renaming some
> > > > > existing driver defines). To note that a certain dma_fence is an user
> > > > > fence and *must* be ignore by memory management and never used as
> > > > > dependency for normal none user dma_fence objects.
> > > > > 
> > > > > 2. The dma_fence_array and dma_fence_chain containers are modified so
> > > > > that they are marked as user fences whenever any of their contained
> > > > > fences are an user fence.
> > > > > 
> > > > > 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
> > > > > used with indefinite/user fences and separates those into it's own
> > > > > synchronization domain.
> > > > > 
> > > > > 4. The existing dma_buf_poll_add_cb() function is modified so that
> > > > > indefinite/user fences are included in the polling.
> > > > > 
> > > > > 5. The sync_file synchronization object is modified so that we
> > > > > essentially have two fence streams instead of just one.
> > > > > 
> > > > > 6. The drm_syncobj is modified in a similar way. User fences are just
> > > > > ignored unless the driver explicitly states support to wait for them.
> > > > > 
> > > > > 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
> > > > > can use to indicate the need for user fences. If user fences are used
> > > > > the atomic mode setting starts to support user fences as IN/OUT fences.
> > > > > 
> > > > > 8. Lockdep is used at various critical locations to ensure that nobody
> > > > > ever tries to mix user fences with non user fences.
> > > > > 
> > > > > The general approach is to just ignore user fences unless a driver
> > > > > stated explicitely support for them.
> > > > > 
> > > > > On top of all of this I've hacked amdgpu so that we add the resulting CS
> > > > > fence only as kernel dependency to the dma_resv object and an additional
> > > > > wrapped up with a dma_fence_array and a stub user fence.
> > > > > 
> > > > > The result is that the newly added atomic modeset functions now
> > > > > correctly wait for the user fence to complete before doing the flip. And
> > > > > dependent CS don't pipeline any more, but rather block on the CPU before
> > > > > submitting work.
> > > > > 
> > > > > After tons of debugging and testing everything now seems to not go up in
> > > > > flames immediately and even lockdep is happy with the annotations.
> > > > > 
> > > > > I'm perfectly aware that this is probably by far the most controversial
> > > > > patch set I've ever created and I really wish we wouldn't need it. But
> > > > > we certainly have the requirement for this and I don't see much other
> > > > > chance to get that working in an UAPI compatible way.
> > > > > 
> > > > > Thoughts/comments?
> > > > I think you need to type up the goal or exact problem statement you're
> > > > trying to solve first. What you typed up is a solution along the lines of
> > > > "try to stuff userspace memory fences into dma_fence and see how horrible
> > > > it all is", and that's certainly an interesting experiment, but what are
> > > > you trying to solve with it?
> > > Well, good point. I explained to much how it works, but now why.
> > > 
> > > In general I would describe the goal as: Providing a standard kernel
> > > infrastructure for user fences.
> > So on that goal the part I fully agree on is that drm_syncobj can (and
> > should imo) be able to contain userspace memory fences. The uapi semantics
> > and everything is already fully set up to support that, but maybe with
> > reduced performance: Non-aware userspace (or when you don't trust the
> > supplier of the umf) needs to block when looking up the fence, and the
> > dma_fence returned will always be signalled already. But that's just a
> > mild performance issue (and vk drivers paper over that already with
> > threading) and not a correctness issue.
> 
> Exactly that, yes.
> 
> > > > Like if the issue is to enable opencl or whatever, then that's no problem
> > > > (rocm on amdkfd is a thing, same maybe without the kfd part can be done
> > > > anywhere else). If the goal is to enable userspace memory fences for vk,
> > > > then we really don't need these everywhere, but really only in drm_syncobj
> > > > (and maybe sync_file).
> > > Yes, having an in kernel representation for vk user space fences is one of
> > > the goals.
> > > 
> > > And I was going back and forth if I should rather come up with a new
> > > structure for this or use the existing dma_fence with a flag as well.
> > > 
> > > I've decided to go down the later router because we have quite a lot of
> > > existing functionality which can be re-used. But if you have a good argument
> > > that it would be more defensive to come up with something completely new,
> > > I'm perfectly fine with that as well.
> > Yeah so stuffing that into dma_fence already freaks me out a bit. It is
> > quite fundamentally a different thing, and it would be really nice to make
> > that very apparent at the type level too.
> > 
> > E.g. to make sure you never ever end up with an umf fence in mmu notifier
> > invalidate callback. You can enforce that with runtime checks too, but imo
> > compile time fail is better than runtime fail.
> 
> Well, I see arguments for both sides.
> 
> There is certainly the danger that we have an umf wait in the mmu notifier,
> but then lockdep will scream "bloody hell" immediately.
> 
> On the other hand when I make this a separate structure we need to maintain
> containers for both variants, especially a chain implementation for
> drm_syncobj. And here I don't have lockdep to keep an eye that nobody does
> anything strange.
> 
> It's only a gut feeling with no clear evidence for one side. If you insists
> on a separate structure I will go down that route.
> 
> > > > If the goal is specifically atomic kms, then there's an entire can of
> > > > worms there that I really don't want to think about, but it exists: We
> > > > have dma_fence as out-fences from atomic commit, and that's already
> > > > massively broken since most drivers allocate some memory or at least take
> > > > locks which can allocate memory in their commit path. Like i2c. Putting a
> > > > userspace memory fence as in-fence in there makes that problem
> > > > substantially worse, since at least in theory you're just not allowed to
> > > > might_faul in atomic_commit_tail.
> > > Yes, that's unfortunately one of the goals as well and yes I completely
> > > agree on the can of worms. But I think I've solved that.
> > > 
> > > What I do in the patch set is to enforce that the out fence is an user fence
> > > when the driver supports user in fences as well.
> > > 
> > > Since user fences doesn't have the memory management dependency drivers can
> > > actually allocate memory or call I2C functions which takes locks which have
> > > memory allocation dependencies.
> > > 
> > > Or do I miss some other reason why you can't fault or allocate memory in
> > > atomic_commit_tail? At least lockdep seems to be happy about that now.
> > The problem is a bit that this breaks the uapi already. At least if the
> > goal is to have this all be perfectly transparent for userspace - as you
> > as you have multi-gpu setups going on at least.
> 
> Question here is why do you think there is an UAPI break? We currently wait
> in a work item already, so where exactly is the problem?

It's a bit washy, but dma_fence and hence implicit sync is supposed to
finish in finite time. umf just doesn't.

Ofc in reality you can still flood your compositor and they're not very
robust, but with umf it's trivial to just hang your compositor forever and
nothing happens. There was a least quite a bit of screaming from
compositor folks 2 years ago when we've gone through the i915-gem fun of
"hey we redefined dma_fence to have indefinite fence semantcis, cool isn't
it". And this is kinda similar - you don't hang the kernel with all the
lockdep stuff, meaning ^C can get you out of any hang. But userspace is
still dead, and the user is still potentially rather unhappy.

So either we force a timeout in the kernel or in userspace or somewhere,
and at that point I'm not entirely sure this is really worth it.

Same holds for sync_file. drm_syncobj is different and does have the
entire "this might never signal" concept encoded in it's uapi already, so
that's fine.

> > > > If the goal is to keep the uapi perfectly compatible then your patch set
> > > > doesn't look like a solution, since as soon as another driver is involved
> > > > which doesn't understand userspace memory fences it all falls apart. So
> > > > works great for a quick demo with amd+amd sharing, but not much further.
> > > > And I don't think it's feasible to just rev the entire ecosystem, since
> > > > that kinda defeats the point of keeping uapi stable - if we rev everything
> > > > we might as well also rev the uapi and make this a bit more incremental
> > > > again :-)
> > > Yes, unfortunately the uapi needs to stay compatible as well and yes that
> > > means we need to deploy this to all drivers involved.
> > > 
> > > We at least need to be able to provide a stack on new hardware with (for
> > > example) Ubuntu 18.04 without replacing all the userspace components.
> > > 
> > > What we can replace is the OpenGL stack and if necessary libdrm, but not
> > > (for example) the X server and most likely not the DDX in some cases.
> > > 
> > > The same applies with surfaceflinger and to some extend Wayland as well.
> > So for perfect uapi compat for existing compositor I really don't think
> > stuffing umf into the kernel is the right approach. Too many little
> > corners that break:
> > 
> > - the in/out fence mismatch every
> 
> On that part I need further explanation, cause I hoped that this is solved.

The thing is even if you force the out fence to be an umf too if the
in-fence is, they're chained. So as soon as you have an unsignalled in
fence all subsequent out fences need to be forced to be umf semantics.
Possible, but definitely messy.

> > - cross gpu with different userspace that doesn't understand umf and then
> >    just ignores them
> 
> Well by stuffing umf into the kernel the whole thing becomes transparent for
> userspace.
> 
> So it won't matter that you have a new amdgpu stack which wants to use umf
> and an older i915 stack which knows nothing about umfs.
> 
> The kernel will block on command submission as soon as a buffer is used by
> i915. And as you said above as well that might cause performance trouble,
> but is not a correctness problem.

The problem is that you still have to adjust all drivers. Which yes
eventually we have to do, but if the compat code is just in the driver
which has switched to umf already, then the cost is a lot more localized.
And with the dma_fence trick below it's all good.

> > - compositors which currently assume implicit sync finishes eventually,
> >    and with umf that gets complicated at best
> 
> But for userspace compositors there is no difference between an umf which
> times out and a dma_fence which timesout? Or am I missing something here?
> 
> > - same with sync_file, the uapi atm does not have a concept of future
> >    fence
> > 
> > So you can kinda make this work, but it falls apart all over the place.
> > And I also don't think smashing umf into all these old concepts helps us
> > in any way to get towards a desktop which is umf-native.
> 
> Yeah, but having an umf compatibility memory management doesn't help either
> to get away from pre-pinned pages.

I think it's really two worlds:

a) dma_fence syncing (sync_file, implicit sync in dma_resv, drm_syncobj
with dma_fence) and pre-pinned everything memory

b) umf mode, which opens the door for page faults and clever tricks and
avoids the need to pre-pin everything

It sucks, but I really haven't found a way to mix these. And I feel like
that it's pretty dangerous to pretend that we can do it - it freaks me out
at least :-)

> > My take is still that for backwards compat the simplest way is if a
> > umf-native driver simply provides dma-fence backwards compat as an opt-in,
> > which userspace chooses when it's necessary. There's really only two
> > things you need for that to work:
> > 
> > - a timeout of some sort on the dma_fence, which might or might not kill
> >    the entire context. This is entirey up to how your userspace does or
> >    does not implement stuff like arb robustness or vk_error_device_lost
> > 
> > - pre-pinned memory management to block out the all the inversions. This
> >    is a bit more nasty, but since we do have all the code for this already
> >    it really shouldn't be too tricky to make that happen for the fancy new
> >    umf world.
> 
> Well, exactly that's what we want to get away from.
> 
> > You do not need a kernel scheduler or anything like that at all, you can
> > do full userspace direct submit to hw and all that fun. Maybe do a
> > drm/sched frontend (and then your submit code does exactly what userspace
> > would do too).
> > 
> > Importantly the things you really don't need:
> > 
> > - special hw support, even if the only mode your hw supports is with page
> >    faults and all that: You can make sure all the pages are present
> >    upfront, and then simply kill the entire context is a page fault
> >    happens.
> 
> Well, that's only like 90% correct.
> 
> You can make that work without special hardware support, but from the
> experience with ROCm and very extensive talks with out hardware folks we
> have seriously problems making sure that the hw can't access freed up memory
> any more.
> 
> Except for the solution of never freeing up memory the only other
> possibility is to wait between 1 and 6 seconds until a shoot down made sure
> that there is really nobody accessing old page tables entries any more.
> 
> In the case of an user space queue with hardware scheduler support and HMM
> the memory would just still be referenced until userspace cooperatively
> inserted a barrier, but that again breaks some dma_fence assumptions as far
> as I can see.

So two things:

- Don't use legacy semantics for compute. They're completely different
  worlds, and i915 plan is to have completely different mode between
  legacy and compute mode creation, because they really don't mesh, at
  context creation time. So kinda like amdgpu renderD* vs amdkfd, except
  all int he renderD* node.

  You _never_ want to run ROCm or level0 or cuda with legacy mode, because
  it sucks too much. The trouble is a bit vulkan since if you run it
  headless, you want to run in compute mode, but if want wayland/x11
  winsys support, then you need legacy support. But we've come up with
  some supremely nasty tricks for that too in the discussion at plumbers
  last year.

  The other trouble is interop. I think the best path here is the vk
  approach of "you only get drm_syncobj, no implicit sync ever". But that
  might not be workable for rocm/opencl/whatever. I'm not sure what's the
  best option there, if absolutely everything breaks I guess we could look
  into adding a umf implicit sync slot to dma_resv, but I really think we
  should exhaust all other options first before we step into that dragon's
  lair :-/

  If we do go with the umf implicit sync I think we should really do it as
  a timeline thing, i.e. you attach the umf container once, the other side
  gets it once, an after that point the only thing that keeps happening is
  that writers increment the syncpt seqno, and readers just sample that
  every time they use the buffer. So essentially:

  struct dma_buf_umf_implict_sync {
  	struct drm_syncobj *sync;
	atomic_t seqno;
  };

  Plus a few ioctls to increment the seqno (and receive the value) and
  read the current seqno. That still means interop isn't perfectly
  transparent, but maybe we could at least reuse some of the codepaths we
  have for drm_syncobj in both kernel and userspace.

  Maybe we could make it backwards compat but just blocking rendering
  until the umf signals if a buffer is shared with any importer that
  doesn't understand umf implicit sync.

  But that's all rather horrible imo.

- For memory management I think the right long term approach is to punt
  this all to userspace, for both legacy and compute mode, with exactly
  the same semantics as mmap()/munmap() on the cpu side. i915 calls this
  vm_bind (the rfc is floating around somewhere), but essentially i915.ko
  will execute vm_bind and vm_unbind directly, completely decoupled from
  any in-flight command submission (we just punch out the ptes and force a
  tlb flush and done). If userspace unmaps memory it still uses, it gets
  to keep all the pieces. This is how vk and all the new stuff works. It
  does mean there's more work for gl and that kind of stuff, but this
  really shouldn't be the kernel's problem.

  Aside: We're aiming for the same for gpu context. If userspace closes a
  gpu context we just preempt and destroy it right away, userspace gets to
  keep he pieces if they wanted to let it finish first.

  This is exactly like on the cpu side where userspace can create/destroy
  threads and mmap/munmap as it sees fit, and it just blows up if it gets
  it all wrong.

  Ofc if you're hw is busted and you can't do reasonable efficient tlb
  flushing (even with pipeling and batching) then you're screwed, so I'm
  hoping your 1-6 seconds of waiting isn't that. If it is then you kinda
  have an unfixable security bug :-/ But iirc amdkfd already has batched
  up tlb flushes as dma_fence in USAGE_OTHER now, so this should all work.

  Anything else, i.e. userpsace fails to insert the required cache flushes
  or barriers or anything else before it calls gpu_unamp (or vm_unbind or
  whatever you want to call it) is a userspace bug.

  Ofc if you preempt a context for memory eviction due to memory pressure
  then the kernel has to do all the cache flushing and everything. Same
  for evicting pages due to memory pressure in the hmm use case. If you're
  preemption takes a few seconds I think there's bigger problems no matter
  what :-)

  Btw on vm_bind it'd be great if you or someone from amd's rocm/kfd side
  can take a look and drop your input. I expect that we'll have the same
  discussion with nouveau/nvidia and maybe also some of the armsoc chips.
  So it would be great if we can collect upstream wisdom on this topic a
  bit.

Neither of these are a reason to stuff umf into implicit sync or sync file
imo.

> > - special fw scheduler support: Once the memory management inversions are
> >    taken care of with pre-pinning under dma_fences, then the only other
> >    thing you need is a timeout for the dma_fence to signal. And maybe some
> >    kind of guaranteed ordering if you want to use a dma_fence timeline
> >    since that one can't go backwards.
> 
> Yeah, that not going backward thing turned out to be massively more tricky
> than I thought initially as well.
> 
> Alex, Marek and I worked quite hard on relaying those requirements to our
> internal teams, but I'm still not quite sure if that will turn out working
> or not.

I really don't think you need to teach this to your fw scheduler, that
sounds backwards. At least for i915+guc we don't have any plans to inflict
this on our fw scheduler (otoh our fw scheduler sucks at resolving
dependencies, but that's a different topic - currently it just busy-spins
if a context is stuck, there's some plans to preempt contexts instead and
round-robin to the next one):

All our fw scheduler is doing is scheduling gpu context and round-robing
them with preemption if they use up their time slice before they go idle
again. Everything else is meaning and semantics we put on top either in
the kernel (for dma_fence based sync) or in userspace (for direct submit
userspace galore, which currently is done in the most hilarious way in a
downstream hack since the hw support isn't there yet). Hilarious here
means we submit a batchbuffer to the kernel which jumps back to it's
start, and then use conditional rendering CS commands to latch in the next
real userspace batch as userspace queues them up. There's a ring of these
to keep it all nicely pipelined and the gpu busy. It's terrible, but for
certain niche use-cases it gets the job done :-)

> > Trying to shoehorn umf into all the old concepts like implicit sync or
> > sync_file which really don't support umf works for a demo, but imo just
> > isn't solid enough for shipping everywhere.
> > 
> > And long term I really don't think we ever want umf anywhere else than
> > drm_syncobj, at least for a 100% umf-native stack.
> 
> Ok then I will concentrate on drm_syncobj for now.
> 
> What about in driver backward compatibility? E.g. blocking wait in the
> multimedia driver CS IOCTL until umf signals?

Yeah I think for that we can keep on using drm_syncobj exactly as-is, i.e.
if it's a umf you just block until signalled when the driver tries to look
up the dma_fence. Ofc userspace can be a bit more clever (and they all
should do so) by doing that waiting in a userspace thread, and eventually
we want them to get at the umf behind it all if they trust the other side,
so that the wait can happen on the gpu.

But that's kinda details, in general I think drm_syncobj should be ready
for cross-driver/process/whatever umf sharing.

Cheers, Daniel

> Thanks,
> Christian.
> 
> > 
> > So maybe this all goes back to the old discussion with had, where you
> > argued for the need for special fw and hw and all that to make the old
> > dma_fence stuff work. Why is that needed? I still don't get that part ...
> > -Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-25 13:05           ` Daniel Vetter
@ 2022-05-25 13:28             ` Michel Dänzer
  -1 siblings, 0 replies; 41+ messages in thread
From: Michel Dänzer @ 2022-05-25 13:28 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: tvrtko.ursulin, sergemetral, gustavo, Felix.Kuehling, dri-devel,
	linaro-mm-sig, maad.aldabagh, tzimmermann, alexander.deucher,
	daniels, skhawaja, sumit.semwal, jason, linux-media

On 2022-05-25 15:05, Daniel Vetter wrote:
> On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
>> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
>>> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
>>>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
>>>>>
>>>>> If the goal is specifically atomic kms, then there's an entire can of
>>>>> worms there that I really don't want to think about, but it exists: We
>>>>> have dma_fence as out-fences from atomic commit, and that's already
>>>>> massively broken since most drivers allocate some memory or at least take
>>>>> locks which can allocate memory in their commit path. Like i2c. Putting a
>>>>> userspace memory fence as in-fence in there makes that problem
>>>>> substantially worse, since at least in theory you're just not allowed to
>>>>> might_faul in atomic_commit_tail.
>>>> Yes, that's unfortunately one of the goals as well and yes I completely
>>>> agree on the can of worms. But I think I've solved that.
>>>>
>>>> What I do in the patch set is to enforce that the out fence is an user fence
>>>> when the driver supports user in fences as well.
>>>>
>>>> Since user fences doesn't have the memory management dependency drivers can
>>>> actually allocate memory or call I2C functions which takes locks which have
>>>> memory allocation dependencies.
>>>>
>>>> Or do I miss some other reason why you can't fault or allocate memory in
>>>> atomic_commit_tail? At least lockdep seems to be happy about that now.
>>> The problem is a bit that this breaks the uapi already. At least if the
>>> goal is to have this all be perfectly transparent for userspace - as you
>>> as you have multi-gpu setups going on at least.
>>
>> Question here is why do you think there is an UAPI break? We currently wait
>> in a work item already, so where exactly is the problem?
> 
> It's a bit washy, but dma_fence and hence implicit sync is supposed to
> finish in finite time. umf just doesn't.
> 
> Ofc in reality you can still flood your compositor and they're not very
> robust, but with umf it's trivial to just hang your compositor forever and
> nothing happens.

You can add that to the list of reasons why compositors need to stop using buffers with unsignaled fences. There's plenty of other reasons there already (the big one being that otherwise slow clients can slow down the compositor, even if the compositor uses a high priority context and the HW supports preemption).


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 13:28             ` Michel Dänzer
  0 siblings, 0 replies; 41+ messages in thread
From: Michel Dänzer @ 2022-05-25 13:28 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: tvrtko.ursulin, sergemetral, tzimmermann, gustavo,
	Felix.Kuehling, linux-media, dri-devel, linaro-mm-sig, jason,
	alexander.deucher, daniels, skhawaja, sumit.semwal,
	maad.aldabagh

On 2022-05-25 15:05, Daniel Vetter wrote:
> On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
>> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
>>> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
>>>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
>>>>>
>>>>> If the goal is specifically atomic kms, then there's an entire can of
>>>>> worms there that I really don't want to think about, but it exists: We
>>>>> have dma_fence as out-fences from atomic commit, and that's already
>>>>> massively broken since most drivers allocate some memory or at least take
>>>>> locks which can allocate memory in their commit path. Like i2c. Putting a
>>>>> userspace memory fence as in-fence in there makes that problem
>>>>> substantially worse, since at least in theory you're just not allowed to
>>>>> might_faul in atomic_commit_tail.
>>>> Yes, that's unfortunately one of the goals as well and yes I completely
>>>> agree on the can of worms. But I think I've solved that.
>>>>
>>>> What I do in the patch set is to enforce that the out fence is an user fence
>>>> when the driver supports user in fences as well.
>>>>
>>>> Since user fences doesn't have the memory management dependency drivers can
>>>> actually allocate memory or call I2C functions which takes locks which have
>>>> memory allocation dependencies.
>>>>
>>>> Or do I miss some other reason why you can't fault or allocate memory in
>>>> atomic_commit_tail? At least lockdep seems to be happy about that now.
>>> The problem is a bit that this breaks the uapi already. At least if the
>>> goal is to have this all be perfectly transparent for userspace - as you
>>> as you have multi-gpu setups going on at least.
>>
>> Question here is why do you think there is an UAPI break? We currently wait
>> in a work item already, so where exactly is the problem?
> 
> It's a bit washy, but dma_fence and hence implicit sync is supposed to
> finish in finite time. umf just doesn't.
> 
> Ofc in reality you can still flood your compositor and they're not very
> robust, but with umf it's trivial to just hang your compositor forever and
> nothing happens.

You can add that to the list of reasons why compositors need to stop using buffers with unsignaled fences. There's plenty of other reasons there already (the big one being that otherwise slow clients can slow down the compositor, even if the compositor uses a high priority context and the HW supports preemption).


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-25 13:28             ` Michel Dänzer
@ 2022-05-25 13:51               ` Daniel Vetter
  -1 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-25 13:51 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Daniel Vetter, Christian König, tvrtko.ursulin, sergemetral,
	tzimmermann, gustavo, Felix.Kuehling, linux-media, dri-devel,
	linaro-mm-sig, jason, alexander.deucher, daniels, skhawaja,
	sumit.semwal, maad.aldabagh

On Wed, May 25, 2022 at 03:28:41PM +0200, Michel Dänzer wrote:
> On 2022-05-25 15:05, Daniel Vetter wrote:
> > On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
> >> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> >>> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> >>>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> >>>>>
> >>>>> If the goal is specifically atomic kms, then there's an entire can of
> >>>>> worms there that I really don't want to think about, but it exists: We
> >>>>> have dma_fence as out-fences from atomic commit, and that's already
> >>>>> massively broken since most drivers allocate some memory or at least take
> >>>>> locks which can allocate memory in their commit path. Like i2c. Putting a
> >>>>> userspace memory fence as in-fence in there makes that problem
> >>>>> substantially worse, since at least in theory you're just not allowed to
> >>>>> might_faul in atomic_commit_tail.
> >>>> Yes, that's unfortunately one of the goals as well and yes I completely
> >>>> agree on the can of worms. But I think I've solved that.
> >>>>
> >>>> What I do in the patch set is to enforce that the out fence is an user fence
> >>>> when the driver supports user in fences as well.
> >>>>
> >>>> Since user fences doesn't have the memory management dependency drivers can
> >>>> actually allocate memory or call I2C functions which takes locks which have
> >>>> memory allocation dependencies.
> >>>>
> >>>> Or do I miss some other reason why you can't fault or allocate memory in
> >>>> atomic_commit_tail? At least lockdep seems to be happy about that now.
> >>> The problem is a bit that this breaks the uapi already. At least if the
> >>> goal is to have this all be perfectly transparent for userspace - as you
> >>> as you have multi-gpu setups going on at least.
> >>
> >> Question here is why do you think there is an UAPI break? We currently wait
> >> in a work item already, so where exactly is the problem?
> > 
> > It's a bit washy, but dma_fence and hence implicit sync is supposed to
> > finish in finite time. umf just doesn't.
> > 
> > Ofc in reality you can still flood your compositor and they're not very
> > robust, but with umf it's trivial to just hang your compositor forever and
> > nothing happens.
> 
> You can add that to the list of reasons why compositors need to stop
> using buffers with unsignaled fences. There's plenty of other reasons
> there already (the big one being that otherwise slow clients can slow
> down the compositor, even if the compositor uses a high priority context
> and the HW supports preemption).

Yeah that's tbh another reason why I think we shouldn't do umf as a
transparent thing - compositors need to get better anyway, so we might as
well take this as a chance to do this right.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 13:51               ` Daniel Vetter
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-25 13:51 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: tvrtko.ursulin, skhawaja, gustavo, Felix.Kuehling, dri-devel,
	sumit.semwal, linaro-mm-sig, maad.aldabagh, tzimmermann,
	alexander.deucher, daniels, linux-media, Christian König,
	jason, sergemetral

On Wed, May 25, 2022 at 03:28:41PM +0200, Michel Dänzer wrote:
> On 2022-05-25 15:05, Daniel Vetter wrote:
> > On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
> >> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> >>> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> >>>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> >>>>>
> >>>>> If the goal is specifically atomic kms, then there's an entire can of
> >>>>> worms there that I really don't want to think about, but it exists: We
> >>>>> have dma_fence as out-fences from atomic commit, and that's already
> >>>>> massively broken since most drivers allocate some memory or at least take
> >>>>> locks which can allocate memory in their commit path. Like i2c. Putting a
> >>>>> userspace memory fence as in-fence in there makes that problem
> >>>>> substantially worse, since at least in theory you're just not allowed to
> >>>>> might_faul in atomic_commit_tail.
> >>>> Yes, that's unfortunately one of the goals as well and yes I completely
> >>>> agree on the can of worms. But I think I've solved that.
> >>>>
> >>>> What I do in the patch set is to enforce that the out fence is an user fence
> >>>> when the driver supports user in fences as well.
> >>>>
> >>>> Since user fences doesn't have the memory management dependency drivers can
> >>>> actually allocate memory or call I2C functions which takes locks which have
> >>>> memory allocation dependencies.
> >>>>
> >>>> Or do I miss some other reason why you can't fault or allocate memory in
> >>>> atomic_commit_tail? At least lockdep seems to be happy about that now.
> >>> The problem is a bit that this breaks the uapi already. At least if the
> >>> goal is to have this all be perfectly transparent for userspace - as you
> >>> as you have multi-gpu setups going on at least.
> >>
> >> Question here is why do you think there is an UAPI break? We currently wait
> >> in a work item already, so where exactly is the problem?
> > 
> > It's a bit washy, but dma_fence and hence implicit sync is supposed to
> > finish in finite time. umf just doesn't.
> > 
> > Ofc in reality you can still flood your compositor and they're not very
> > robust, but with umf it's trivial to just hang your compositor forever and
> > nothing happens.
> 
> You can add that to the list of reasons why compositors need to stop
> using buffers with unsignaled fences. There's plenty of other reasons
> there already (the big one being that otherwise slow clients can slow
> down the compositor, even if the compositor uses a high priority context
> and the HW supports preemption).

Yeah that's tbh another reason why I think we shouldn't do umf as a
transparent thing - compositors need to get better anyway, so we might as
well take this as a chance to do this right.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-25 13:51               ` Daniel Vetter
@ 2022-05-25 14:07                 ` Simon Ser
  -1 siblings, 0 replies; 41+ messages in thread
From: Simon Ser @ 2022-05-25 14:07 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: tvrtko.ursulin, daniels, Michel Dänzer, gustavo,
	Felix.Kuehling, linux-media, dri-devel, Christian König,
	linaro-mm-sig, sergemetral, tzimmermann, alexander.deucher,
	skhawaja, sumit.semwal, jason, maad.aldabagh

On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:

> > > Ofc in reality you can still flood your compositor and they're not very
> > > robust, but with umf it's trivial to just hang your compositor forever and
> > > nothing happens.
> >
> > You can add that to the list of reasons why compositors need to stop
> > using buffers with unsignaled fences. There's plenty of other reasons
> > there already (the big one being that otherwise slow clients can slow
> > down the compositor, even if the compositor uses a high priority context
> > and the HW supports preemption).
>
>
> Yeah that's tbh another reason why I think we shouldn't do umf as a
> transparent thing - compositors need to get better anyway, so we might as
> well take this as a chance to do this right.

As a compositor dev, I agree -- we should definitely be smarter about
this. Note, it would help a lot to have a good way to integrate the
waits into a poll(2) event loop.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 14:07                 ` Simon Ser
  0 siblings, 0 replies; 41+ messages in thread
From: Simon Ser @ 2022-05-25 14:07 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Michel Dänzer, tvrtko.ursulin, skhawaja, gustavo,
	Felix.Kuehling, dri-devel, sumit.semwal, linaro-mm-sig,
	maad.aldabagh, tzimmermann, alexander.deucher, daniels,
	linux-media, Christian König, jason, sergemetral

On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:

> > > Ofc in reality you can still flood your compositor and they're not very
> > > robust, but with umf it's trivial to just hang your compositor forever and
> > > nothing happens.
> >
> > You can add that to the list of reasons why compositors need to stop
> > using buffers with unsignaled fences. There's plenty of other reasons
> > there already (the big one being that otherwise slow clients can slow
> > down the compositor, even if the compositor uses a high priority context
> > and the HW supports preemption).
>
>
> Yeah that's tbh another reason why I think we shouldn't do umf as a
> transparent thing - compositors need to get better anyway, so we might as
> well take this as a chance to do this right.

As a compositor dev, I agree -- we should definitely be smarter about
this. Note, it would help a lot to have a good way to integrate the
waits into a poll(2) event loop.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-25 14:07                 ` Simon Ser
@ 2022-05-25 14:15                   ` Daniel Stone
  -1 siblings, 0 replies; 41+ messages in thread
From: Daniel Stone @ 2022-05-25 14:15 UTC (permalink / raw)
  To: Simon Ser
  Cc: Daniel Vetter, tvrtko.ursulin, daniels, Michel Dänzer,
	gustavo, Felix.Kuehling, linux-media, dri-devel,
	Christian König, linaro-mm-sig, sergemetral, tzimmermann,
	alexander.deucher, skhawaja, sumit.semwal, jason, maad.aldabagh

Hi,

On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
> On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > You can add that to the list of reasons why compositors need to stop
> > > using buffers with unsignaled fences. There's plenty of other reasons
> > > there already (the big one being that otherwise slow clients can slow
> > > down the compositor, even if the compositor uses a high priority context
> > > and the HW supports preemption).
> >
> >
> > Yeah that's tbh another reason why I think we shouldn't do umf as a
> > transparent thing - compositors need to get better anyway, so we might as
> > well take this as a chance to do this right.
>
> As a compositor dev, I agree -- we should definitely be smarter about
> this. Note, it would help a lot to have a good way to integrate the
> waits into a poll(2) event loop.

The same holds for Weston. We're currently working through a bunch of
internal infrastructure to be able to handle this. Mutter (aka GNOME)
is also really well-placed to be able to do this.

Having pollable waits would be really useful, but I don't think it's
essential. In my strawman I'm just waking up at the usual
just-before-repaint point and checking; if it doesn't make it for this
frame then we'll wait for the next frame. If someone submits buffers
which take 4 repaint periods to clear then we'll have 3 'unnecessary'
wakeups, but given the GPU is already slammed then it's not an
efficiency problem I don't think. (I don't know if all the other
compositor people share this view.)

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 14:15                   ` Daniel Stone
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Stone @ 2022-05-25 14:15 UTC (permalink / raw)
  To: Simon Ser
  Cc: tvrtko.ursulin, daniels, tzimmermann, Michel Dänzer,
	gustavo, Felix.Kuehling, dri-devel, sumit.semwal, linaro-mm-sig,
	sergemetral, maad.aldabagh, alexander.deucher, skhawaja,
	Christian König, jason, linux-media

Hi,

On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
> On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > You can add that to the list of reasons why compositors need to stop
> > > using buffers with unsignaled fences. There's plenty of other reasons
> > > there already (the big one being that otherwise slow clients can slow
> > > down the compositor, even if the compositor uses a high priority context
> > > and the HW supports preemption).
> >
> >
> > Yeah that's tbh another reason why I think we shouldn't do umf as a
> > transparent thing - compositors need to get better anyway, so we might as
> > well take this as a chance to do this right.
>
> As a compositor dev, I agree -- we should definitely be smarter about
> this. Note, it would help a lot to have a good way to integrate the
> waits into a poll(2) event loop.

The same holds for Weston. We're currently working through a bunch of
internal infrastructure to be able to handle this. Mutter (aka GNOME)
is also really well-placed to be able to do this.

Having pollable waits would be really useful, but I don't think it's
essential. In my strawman I'm just waking up at the usual
just-before-repaint point and checking; if it doesn't make it for this
frame then we'll wait for the next frame. If someone submits buffers
which take 4 repaint periods to clear then we'll have 3 'unnecessary'
wakeups, but given the GPU is already slammed then it's not an
efficiency problem I don't think. (I don't know if all the other
compositor people share this view.)

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-25 14:15                   ` Daniel Stone
@ 2022-05-25 14:22                     ` Christian König
  -1 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-25 14:22 UTC (permalink / raw)
  To: Daniel Stone, Simon Ser
  Cc: Daniel Vetter, tvrtko.ursulin, daniels, Michel Dänzer,
	gustavo, Felix.Kuehling, linux-media, dri-devel, linaro-mm-sig,
	sergemetral, tzimmermann, alexander.deucher, skhawaja,
	sumit.semwal, jason, maad.aldabagh

Am 25.05.22 um 16:15 schrieb Daniel Stone:
> Hi,
>
> On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
>> On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>> You can add that to the list of reasons why compositors need to stop
>>>> using buffers with unsignaled fences. There's plenty of other reasons
>>>> there already (the big one being that otherwise slow clients can slow
>>>> down the compositor, even if the compositor uses a high priority context
>>>> and the HW supports preemption).
>>>
>>> Yeah that's tbh another reason why I think we shouldn't do umf as a
>>> transparent thing - compositors need to get better anyway, so we might as
>>> well take this as a chance to do this right.
>> As a compositor dev, I agree -- we should definitely be smarter about
>> this. Note, it would help a lot to have a good way to integrate the
>> waits into a poll(2) event loop.
> The same holds for Weston. We're currently working through a bunch of
> internal infrastructure to be able to handle this. Mutter (aka GNOME)
> is also really well-placed to be able to do this.
>
> Having pollable waits would be really useful, but I don't think it's
> essential. In my strawman I'm just waking up at the usual
> just-before-repaint point and checking; if it doesn't make it for this
> frame then we'll wait for the next frame. If someone submits buffers
> which take 4 repaint periods to clear then we'll have 3 'unnecessary'
> wakeups, but given the GPU is already slammed then it's not an
> efficiency problem I don't think. (I don't know if all the other
> compositor people share this view.)

Oh, well you should already have pollable waits, at least on DMA-buf.

If you are saying you want to have that for drm_syncobj timelines as 
well then that's certainly something we could do.

Regards,
Christian.

>
> Cheers,
> Daniel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 14:22                     ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-05-25 14:22 UTC (permalink / raw)
  To: Daniel Stone, Simon Ser
  Cc: tvrtko.ursulin, daniels, tzimmermann, Michel Dänzer,
	gustavo, Felix.Kuehling, dri-devel, linaro-mm-sig, sergemetral,
	maad.aldabagh, alexander.deucher, skhawaja, sumit.semwal, jason,
	linux-media

Am 25.05.22 um 16:15 schrieb Daniel Stone:
> Hi,
>
> On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
>> On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>> You can add that to the list of reasons why compositors need to stop
>>>> using buffers with unsignaled fences. There's plenty of other reasons
>>>> there already (the big one being that otherwise slow clients can slow
>>>> down the compositor, even if the compositor uses a high priority context
>>>> and the HW supports preemption).
>>>
>>> Yeah that's tbh another reason why I think we shouldn't do umf as a
>>> transparent thing - compositors need to get better anyway, so we might as
>>> well take this as a chance to do this right.
>> As a compositor dev, I agree -- we should definitely be smarter about
>> this. Note, it would help a lot to have a good way to integrate the
>> waits into a poll(2) event loop.
> The same holds for Weston. We're currently working through a bunch of
> internal infrastructure to be able to handle this. Mutter (aka GNOME)
> is also really well-placed to be able to do this.
>
> Having pollable waits would be really useful, but I don't think it's
> essential. In my strawman I'm just waking up at the usual
> just-before-repaint point and checking; if it doesn't make it for this
> frame then we'll wait for the next frame. If someone submits buffers
> which take 4 repaint periods to clear then we'll have 3 'unnecessary'
> wakeups, but given the GPU is already slammed then it's not an
> efficiency problem I don't think. (I don't know if all the other
> compositor people share this view.)

Oh, well you should already have pollable waits, at least on DMA-buf.

If you are saying you want to have that for drm_syncobj timelines as 
well then that's certainly something we could do.

Regards,
Christian.

>
> Cheers,
> Daniel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
  2022-05-25 14:22                     ` Christian König
@ 2022-05-25 14:25                       ` Daniel Vetter
  -1 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-25 14:25 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Stone, Simon Ser, Daniel Vetter, tvrtko.ursulin, daniels,
	Michel Dänzer, gustavo, Felix.Kuehling, linux-media,
	dri-devel, linaro-mm-sig, sergemetral, tzimmermann,
	alexander.deucher, skhawaja, sumit.semwal, jason, maad.aldabagh

On Wed, May 25, 2022 at 04:22:48PM +0200, Christian König wrote:
> Am 25.05.22 um 16:15 schrieb Daniel Stone:
> > Hi,
> > 
> > On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
> > > On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > You can add that to the list of reasons why compositors need to stop
> > > > > using buffers with unsignaled fences. There's plenty of other reasons
> > > > > there already (the big one being that otherwise slow clients can slow
> > > > > down the compositor, even if the compositor uses a high priority context
> > > > > and the HW supports preemption).
> > > > 
> > > > Yeah that's tbh another reason why I think we shouldn't do umf as a
> > > > transparent thing - compositors need to get better anyway, so we might as
> > > > well take this as a chance to do this right.
> > > As a compositor dev, I agree -- we should definitely be smarter about
> > > this. Note, it would help a lot to have a good way to integrate the
> > > waits into a poll(2) event loop.
> > The same holds for Weston. We're currently working through a bunch of
> > internal infrastructure to be able to handle this. Mutter (aka GNOME)
> > is also really well-placed to be able to do this.
> > 
> > Having pollable waits would be really useful, but I don't think it's
> > essential. In my strawman I'm just waking up at the usual
> > just-before-repaint point and checking; if it doesn't make it for this
> > frame then we'll wait for the next frame. If someone submits buffers
> > which take 4 repaint periods to clear then we'll have 3 'unnecessary'
> > wakeups, but given the GPU is already slammed then it's not an
> > efficiency problem I don't think. (I don't know if all the other
> > compositor people share this view.)
> 
> Oh, well you should already have pollable waits, at least on DMA-buf.
> 
> If you are saying you want to have that for drm_syncobj timelines as well
> then that's certainly something we could do.

The pollable wait isn't enough because the client could keep issuing more
rendering. So what you actually want is
- dma-buf export to get a sync_file (should land any minute now)
- poll on that (it works, as it doesn on dma-buf)
- tell egl to no do implicit sync or (maybe better) just have a vk
  renderer in your compositor

But interim you can poll on the dma-buf and it'll dtrt for at least
cooperative clients already.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Tackling the indefinite/user DMA fence problem
@ 2022-05-25 14:25                       ` Daniel Vetter
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-05-25 14:25 UTC (permalink / raw)
  To: Christian König
  Cc: tvrtko.ursulin, daniels, tzimmermann, Michel Dänzer,
	Felix.Kuehling, linaro-mm-sig, sergemetral, maad.aldabagh,
	dri-devel, gustavo, alexander.deucher, skhawaja, sumit.semwal,
	jason, linux-media

On Wed, May 25, 2022 at 04:22:48PM +0200, Christian König wrote:
> Am 25.05.22 um 16:15 schrieb Daniel Stone:
> > Hi,
> > 
> > On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
> > > On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > You can add that to the list of reasons why compositors need to stop
> > > > > using buffers with unsignaled fences. There's plenty of other reasons
> > > > > there already (the big one being that otherwise slow clients can slow
> > > > > down the compositor, even if the compositor uses a high priority context
> > > > > and the HW supports preemption).
> > > > 
> > > > Yeah that's tbh another reason why I think we shouldn't do umf as a
> > > > transparent thing - compositors need to get better anyway, so we might as
> > > > well take this as a chance to do this right.
> > > As a compositor dev, I agree -- we should definitely be smarter about
> > > this. Note, it would help a lot to have a good way to integrate the
> > > waits into a poll(2) event loop.
> > The same holds for Weston. We're currently working through a bunch of
> > internal infrastructure to be able to handle this. Mutter (aka GNOME)
> > is also really well-placed to be able to do this.
> > 
> > Having pollable waits would be really useful, but I don't think it's
> > essential. In my strawman I'm just waking up at the usual
> > just-before-repaint point and checking; if it doesn't make it for this
> > frame then we'll wait for the next frame. If someone submits buffers
> > which take 4 repaint periods to clear then we'll have 3 'unnecessary'
> > wakeups, but given the GPU is already slammed then it's not an
> > efficiency problem I don't think. (I don't know if all the other
> > compositor people share this view.)
> 
> Oh, well you should already have pollable waits, at least on DMA-buf.
> 
> If you are saying you want to have that for drm_syncobj timelines as well
> then that's certainly something we could do.

The pollable wait isn't enough because the client could keep issuing more
rendering. So what you actually want is
- dma-buf export to get a sync_file (should land any minute now)
- poll on that (it works, as it doesn on dma-buf)
- tell egl to no do implicit sync or (maybe better) just have a vk
  renderer in your compositor

But interim you can poll on the dma-buf and it'll dtrt for at least
cooperative clients already.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2022-05-25 14:25 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-02 16:37 Tackling the indefinite/user DMA fence problem Christian König
2022-05-02 16:37 ` [PATCH 01/15] dma-buf: rename DMA_FENCE_FLAG_USER_BITS to _DEVICE Christian König
2022-05-02 16:37 ` [PATCH 02/15] dma-buf: introduce user fence support Christian König
2022-05-04  7:53   ` Tvrtko Ursulin
2022-05-04  9:15     ` Christian König
2022-05-02 16:37 ` [PATCH 03/15] dma-buf: add user fence support to dma_fence_array Christian König
2022-05-02 16:37 ` [PATCH 04/15] dma-buf: add user fence support to dma_fence_chain Christian König
2022-05-02 16:37 ` [PATCH 05/15] dma-buf: add user fence support to dma_resv Christian König
2022-05-02 16:37 ` [PATCH 06/15] dma-buf: add user fence support to dma_fence_merge() Christian König
2022-05-02 16:37 ` [PATCH 07/15] dma-buf: add user fence utility functions Christian König
2022-05-02 16:37 ` [PATCH 08/15] dma-buf: add support for polling on user fences Christian König
2022-05-02 16:37 ` [PATCH 09/15] dma-buf/sync_file: add user fence support Christian König
2022-05-02 16:37 ` [PATCH 10/15] drm: add user fence support for atomic out fences Christian König
2022-05-02 16:37 ` [PATCH 11/15] drm: add user fence support for atomic in fences Christian König
2022-05-02 16:37 ` [PATCH 12/15] drm: add user fence support to drm_gem_plane_helper_prepare_fb Christian König
2022-05-02 16:37 ` [PATCH 13/15] drm: add user fence support to drm_syncobj Christian König
2022-05-02 16:37 ` [PATCH 14/15] drm/amdgpu: switch DM to atomic fence helpers Christian König
2022-05-02 16:37   ` Christian König
2022-05-02 16:37 ` [PATCH 15/15] drm/amdgpu: user fence proof of concept Christian König
2022-05-04 10:08 ` Tackling the indefinite/user DMA fence problem Daniel Vetter
2022-05-04 10:08   ` Daniel Vetter
2022-05-09  6:56   ` Christian König
2022-05-09  6:56     ` Christian König
2022-05-09 14:10     ` Daniel Vetter
2022-05-09 14:10       ` Daniel Vetter
2022-05-17 10:28       ` Christian König
2022-05-17 10:28         ` Christian König
2022-05-25 13:05         ` Daniel Vetter
2022-05-25 13:05           ` Daniel Vetter
2022-05-25 13:28           ` Michel Dänzer
2022-05-25 13:28             ` Michel Dänzer
2022-05-25 13:51             ` Daniel Vetter
2022-05-25 13:51               ` Daniel Vetter
2022-05-25 14:07               ` Simon Ser
2022-05-25 14:07                 ` Simon Ser
2022-05-25 14:15                 ` Daniel Stone
2022-05-25 14:15                   ` Daniel Stone
2022-05-25 14:22                   ` Christian König
2022-05-25 14:22                     ` Christian König
2022-05-25 14:25                     ` Daniel Vetter
2022-05-25 14:25                       ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.