From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: daniel@ffwll.ch, jason@jlekstrand.net, daniels@collabora.com,
	skhawaja@google.com, maad.aldabagh@amd.com,
	sergemetral@google.com, sumit.semwal@linaro.org,
	gustavo@padovan.org, Felix.Kuehling@amd.com,
	alexander.deucher@amd.com, tzimmermann@suse.de,
	tvrtko.ursulin@linux.intel.com, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org
Subject: Re: Tackling the indefinite/user DMA fence problem
Date: Wed, 4 May 2022 12:08:51 +0200
Message-ID: <YnJQs1iusrBvpuMs@phenom.ffwll.local>
In-Reply-To: <20220502163722.3957-1-christian.koenig@amd.com>

On Mon, May 02, 2022 at 06:37:07PM +0200, Christian König wrote:
> Hello everyone,
> 
> it's a well known problem that the DMA-buf subsystem mixed
> synchronization and memory management requirements into the same
> dma_fence and dma_resv objects. Because of this dma_fence objects need
> to guarantee that they complete within a finite amount of time or
> otherwise the system can easily deadlock.
> 
> One of the few good things about this problem is that it is really well
> understood by now.
> 
> Daniel and others came up with some documentation:
> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf#indefinite-dma-fences
> 
> And Jason did an excellent presentation about that problem at last year's
> LPC: https://lpc.events/event/11/contributions/1115/
> 
> Based on that we have been able to reject new implementations of
> infinite/user DMA fences and mitigate the effect of the few existing
> ones.
> 
> The remaining downside is that we don't have a way of using user
> fences as dependencies in either the explicit (sync_file, drm_syncobj) or
> the implicit (dma_resv) synchronization objects, resulting in
> numerous problems and limitations for things like HMM, user queues
> etc.
> 
> This patch set here now tries to tackle this problem by untangling the
> synchronization from the memory management. What it does *not* try to do
> is to fix the existing kernel fences, because I think we can all agree
> by now that this isn't really possible.
> 
> To achieve this goal what I do in this patch set is to add some parallel
> infrastructure to cleanly separate normal kernel dma_fence objects from
> indefinite/user fences:
> 
> 1. It introduces a DMA_FENCE_FLAG_USER define (after renaming some
> existing driver defines) to note that a certain dma_fence is a user
> fence and *must* be ignored by memory management and never used as a
> dependency for normal non-user dma_fence objects.
> 
> 2. The dma_fence_array and dma_fence_chain containers are modified so
> that they are marked as user fences whenever any of their contained
> fences is a user fence.
> 
> 3. The dma_resv object gets a new DMA_RESV_USAGE_USER flag which must be
> used with indefinite/user fences and separates those into their own
> synchronization domain.
> 
> 4. The existing dma_buf_poll_add_cb() function is modified so that
> indefinite/user fences are included in the polling.
> 
> 5. The sync_file synchronization object is modified so that we
> essentially have two fence streams instead of just one.
> 
> 6. The drm_syncobj is modified in a similar way. User fences are just
> ignored unless the driver explicitly states support to wait for them.
> 
> 7. The DRM subsystem gains a new DRIVER_USER_FENCE flag which drivers
> can use to indicate the need for user fences. If user fences are used
> the atomic mode setting starts to support user fences as IN/OUT fences.
> 
> 8. Lockdep is used at various critical locations to ensure that nobody
> ever tries to mix user fences with non-user fences.
> 
> The general approach is to just ignore user fences unless a driver
> explicitly states support for them.
> 
> On top of all of this I've hacked amdgpu so that we add the resulting CS
> fence only as kernel dependency to the dma_resv object and additionally
> add it wrapped up in a dma_fence_array together with a stub user fence.
> 
> The result is that the newly added atomic modeset functions now
> correctly wait for the user fence to complete before doing the flip. And
> dependent CS don't pipeline any more, but rather block on the CPU before
> submitting work.
> 
> After tons of debugging and testing everything now seems to not go up in
> flames immediately and even lockdep is happy with the annotations.
> 
> I'm perfectly aware that this is probably by far the most controversial
> patch set I've ever created and I really wish we wouldn't need it. But
> we certainly have the requirement for this and I don't see many other
> options for getting that working in a UAPI-compatible way.
> 
> Thoughts/comments?

I think you need to type up the goal or exact problem statement you're
trying to solve first. What you typed up is a solution along the lines of
"try to stuff userspace memory fences into dma_fence and see how horrible
it all is", and that's certainly an interesting experiment, but what are
you trying to solve with it?
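
For readers following along, the two core rules from items 1 and 2 of the
quoted cover letter can be modelled in a few lines of plain userspace C.
This is only a toy sketch, not the kernel API: struct toy_fence,
TOY_FENCE_FLAG_USER and the toy_fence_*() helpers are hypothetical names
made up for illustration, and the real patches may well implement the
checks differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's dma_fence types. */
enum toy_fence_flag {
	/* Fence may take indefinitely long to signal (depends on userspace). */
	TOY_FENCE_FLAG_USER = 1u << 0,
};

struct toy_fence {
	unsigned int flags;
};

bool toy_fence_is_user(const struct toy_fence *f)
{
	return (f->flags & TOY_FENCE_FLAG_USER) != 0;
}

/*
 * Container rule (item 2): as soon as any contained fence is a user
 * fence, the container itself is marked as a user fence.
 */
void toy_fence_array_init(struct toy_fence *array,
			  struct toy_fence *const *fences, size_t count)
{
	assert(fences != NULL || count == 0);

	array->flags = 0;
	for (size_t i = 0; i < count; i++)
		if (toy_fence_is_user(fences[i]))
			array->flags |= TOY_FENCE_FLAG_USER;
}

/*
 * Dependency rule (item 1): a normal kernel fence must never wait on a
 * user fence. A user fence may wait on either kind.
 */
bool toy_fence_may_depend_on(const struct toy_fence *waiter,
			     const struct toy_fence *dep)
{
	return toy_fence_is_user(waiter) || !toy_fence_is_user(dep);
}
```

In this model, an array built from one kernel fence and one user fence is
itself a user fence, so toy_fence_may_depend_on() rejects it as a
dependency of any kernel fence.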

If the issue is to enable OpenCL or similar, then that's no problem
(ROCm on amdkfd is a thing, and the same, maybe without the kfd part, can
be done anywhere else). If the goal is to enable userspace memory fences
for vk, then we really don't need these everywhere, but only in
drm_syncobj (and maybe sync_file).

If the goal is specifically atomic kms, then there's an entire can of
worms there that I really don't want to think about, but it exists: We
have dma_fence as out-fences from atomic commit, and that's already
massively broken since most drivers allocate some memory or at least take
locks which can allocate memory in their commit path. Like i2c. Putting a
userspace memory fence as in-fence in there makes that problem
substantially worse, since at least in theory you're just not allowed to
might_fault in atomic_commit_tail.

If the goal is to keep the uapi perfectly compatible then your patch set
doesn't look like a solution, since as soon as another driver is involved
which doesn't understand userspace memory fences it all falls apart. So
it works great for a quick demo with amd+amd sharing, but not much further.
And I don't think it's feasible to just rev the entire ecosystem, since
that kinda defeats the point of keeping uapi stable - if we rev everything
we might as well also rev the uapi and make this a bit more incremental
again :-)

There's probably more to ponder here ...

I'm not sure what exactly the problem statement is that matches your
solution here though, so that seems to be missing.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
