From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [RFC] Implicit vs explicit user fence sync
Date: Tue, 11 May 2021 18:48:15 +0200
Message-ID: <YJq1T8yWXSW6TRjW@phenom.ffwll.local>
In-Reply-To: <a08a4b30-5ae5-49ac-bad0-c77a5cabbecd@gmail.com>

On Tue, May 11, 2021 at 05:32:29PM +0200, Christian König wrote:
> Am 11.05.21 um 16:23 schrieb Daniel Vetter:
> > On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> > > Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > > > > > more funny corner cases when we roll out explicit user fencing.
> > > > > I think we can just ignore sync_file. As far as I'm concerned that UAPI
> > > > > is pretty much dead.
> > > > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > > > built on it.
> > > To be honest I don't think we care about Android at all.
> > we = amd or we = upstream here?
> 
> we = amd, for everybody else that is certainly a different topic.
> 
> But for now AMD is the only one running into this problem.
> 
> Could be that Nouveau sees this as well with the next hw generation, but who
> knows?
> 
> > > > Why is this not much of a problem if it's just within one driver?
> > > Because inside the same driver I can easily add the waits before submitting
> > > the MM work as necessary.
> > What is MM work here now?
> 
> MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.
> 
> > > > > > > Adding implicit synchronization on top of that is then rather trivial.
> > > > > > Well that's what I disagree with, since I already see some problems that I
> > > > > > don't think we can overcome (the atomic ioctl is one). And that's with us
> > > > > > only having a fairly theoretical understanding of the overall situation.
> > > > > But how should we then ever support user fences with the atomic IOCTL?
> > > > > 
> > > > > We can't wait in user space since that will disable the support for waiting
> > > > > in the hardware.
> > > > Well, figure it out :-)
> > > > 
> > > > This is exactly why I'm not seeing anything solved with just rolling out a
> > > > function call to a bunch of places, because it's pretending all things are
> > > > solved when clearly that's not the case.
> > > > 
> > > > I really think what we need is to first figure out how to support
> > > > userspace fences as explicit entities across the stack, maybe with
> > > > something like this order:
> > > > 1. enable them purely within a single userspace driver (like vk with
> > > > winsys disabled, or something else like that except not amd because
> > > > there's this amdkfd split for "real" compute)
> > > > 1a. including atomic ioctl, e.g. for vk direct display support this can be
> > > > used without cross-process sharing, new winsys protocols and all that fun
> > > > 2. figure out how to transport these userspace fences with something like
> > > > drm_syncobj
> > > > 2a. figure out the compat story for drivers which don't do userspace fences
> > > > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> > > > support explicit sync
> > > > 3. maybe figure out how to make this all happen magically with implicit
> > > > sync, if we really, really care
> > > > 
> > > > If we do 3 before we've nailed all these problems, we're just guaranteeing
> > > > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > > > userspace fences
> > > > - the butchered implicit one that didn't quite work
> > > > - the explicit one
> > > > - the not-so-butchered implicit one with the lessons from the properly
> > > >     done explicit one
> > > > 
> > > > The thing is, if you have no idea how to integrate userspace fences
> > > > explicitly into atomic ioctl, then you definitely have no idea how to do
> > > > it implicitly :-)
> > > Well I agree on that. But the question is still how would you do explicit
> > > with atomic?
> > If you supply a userspace fence (is that what we call them now?) as
> > in-fence, then you're only allowed to get a userspace fence as out-fence.
> 
> Yeah, that part makes perfect sense. But I don't see the problem with
> that?
> 
> > That way we
> > - don't block anywhere we shouldn't
> > - don't create a dma_fence out of a userspace fence
> > 
> > The problem is this completely breaks your "magically make implicit
> > fencing with userspace fences" plan.
> 
> Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.

So you have to do an uapi change here. At that point we might as well do
it right.

Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.

> > So I have a plan here, what was yours?
> 
> As far as I see that should still work perfectly fine and I have the strong
> feeling I'm missing something here.
> 
> > > Transporting fences between processes is not the fundamental problem here,
> > > but rather the question how we represent all this in the kernel?
> > > 
> > > In other words I think what you outlined above is just approaching it from
> > > the wrong side again. Instead of looking what the kernel needs to support
> > > this you take a look at userspace and the requirements there.
> > Uh ... that was my idea here? That's why I put "build userspace fences in
> > userspace only" as the very first thing. Then extend to winsys and
> > atomic/display and all these cases where things get more tricky.
> > 
> > I agree that transporting the fences is easy, which is why it's not
> > interesting trying to solve that problem first. Which is kinda what you're
> > trying to do here by adding implicit userspace fences (well not even that,
> > just a bunch of function calls without any semantics attached to them).
> > 
> > So if there's more here, you need to flesh it out more or I just don't get
> > what you're actually trying to demonstrate.
> 
> Well I'm trying to figure out why you see it as such a problem to keep
> implicit sync around.
> 
> As far as I can tell it is completely orthogonal whether we use
> implicit/explicit and dma_fence/user_fence.
> 
> It's just a different implementation inside the kernel.

See above. It falls apart with the atomic ioctl.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > And "just block" might be good enough for a quick demo, it still breaks
> > > > the contract. Same holds for a bunch of the winsys problems we'll have to
> > > > deal with here.
> > > > -Daniel
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Like here at intel we have internal code for compute, and we're starting
> > > > > > to hit some interesting cases with interop with media already, but that's
> > > > > > it. Nothing even close to desktop/winsys/kms, and that's where I expect
> > > > > > all the pain will be.
> > > > > > 
> > > > > > Cheers, Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Thread overview: 26+ messages
2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
2021-05-04 13:27 ` [PATCH 01/12] dma-buf: add interface for user fence synchronization Christian König
2021-05-04 13:27 ` [PATCH 02/12] RDMA/mlx5: add DMA-buf user fence support Christian König
2021-05-04 13:27 ` [PATCH 03/12] drm/amdgpu: " Christian König
2021-05-04 13:27 ` [PATCH 04/12] drm/gem: add DMA-buf user fence support for the atomic helper Christian König
2021-05-04 13:27 ` [PATCH 05/12] drm/etnaviv: add DMA-buf user fence support Christian König
2021-05-04 13:27 ` [PATCH 06/12] drm/i915: " Christian König
2021-05-04 13:27 ` [PATCH 07/12] drm/lima: " Christian König
2021-05-04 13:27 ` [PATCH 08/12] drm/msm: " Christian König
2021-05-04 13:27 ` [PATCH 09/12] drm/nouveau: " Christian König
2021-05-04 13:27 ` [PATCH 10/12] drm/panfrost: " Christian König
2021-05-04 13:27 ` [PATCH 11/12] drm/radeon: " Christian König
2021-05-04 13:27 ` [PATCH 12/12] drm/v3d: " Christian König
2021-05-04 14:15 ` [RFC] Implicit vs explicit user fence sync Daniel Vetter
2021-05-04 14:26   ` Christian König
2021-05-04 15:11     ` Daniel Vetter
2021-05-10 18:12       ` Christian König
2021-05-11  7:31         ` Daniel Vetter
2021-05-11  7:47           ` Christian König
2021-05-11 14:23             ` Daniel Vetter
2021-05-11 15:32               ` Christian König
2021-05-11 16:48                 ` Daniel Vetter [this message]
2021-05-11 19:34                   ` Christian König
2021-05-12  8:13                     ` Daniel Vetter
2021-05-12  8:23                       ` Christian König
2021-05-12  8:50                         ` Daniel Vetter
