From: Jason Ekstrand <jason@jlekstrand.net>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Marek Olšák" <maraeo@gmail.com>,
	"ML Mesa-dev" <mesa-dev@lists.freedesktop.org>
Subject: Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
Date: Tue, 20 Apr 2021 11:24:58 -0500
Message-ID: <CAOFGe95HdA3+4ihCPHuf_8DSJeLsDfHqJZE0M84_9d6bbLhKGQ@mail.gmail.com>
In-Reply-To: <CAKMK7uHdfG94WxVnbXyTFL7iu6UNd1Tv3OY2xVO2HT_zV+Avmw@mail.gmail.com>

On Tue, Apr 20, 2021 at 9:10 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Apr 20, 2021 at 1:59 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > > the kernel's problem.
> >
> > Well, the path of inner peace begins with four words. “Not my fucking
> > problem.”
> >
> > But I'm not so much concerned about the kernel as about important
> > userspace processes like X, Wayland, SurfaceFlinger, etc.
> >
> > I mean, attaching a page to a sync object and allowing waits and
> > signals from both the CPU and the GPU side is not so much of a problem.
> >
> > > You have to somehow handle that, e.g. perhaps with conditional
> > > rendering and just using the old frame in compositing if the new one
> > > doesn't show up in time.
> >
> > Nice idea, but how would you handle that at the OpenGL/Glamor/Vulkan level?
>
> For OpenGL we provide all the same guarantees, so if you get one of
> these you just block until the fence is signalled. Doing that properly
> means a submit thread to support drm_syncobj, like for Vulkan.
>
> For Vulkan we probably want to represent these as proper vk timeline
> objects, and the Vulkan way is to just let the application (well,
> compositor) deal with it. If they import timelines from untrusted
> other parties, they need to handle the potential fallback of being
> lied to. The "how" is not Vulkan's fucking problem, because "with
> great power (well, performance) comes great responsibility" is the
> entire vk design paradigm.

The security aspects are currently an unsolved problem in Vulkan.  The
assumption is that everyone trusts everyone else to be careful with
the scissors.  It's a great model!

I think we can do something in Vulkan to allow apps to protect
themselves a bit, but it's tricky and non-obvious.
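
For example, with Vulkan 1.2 timeline semaphores, a compositor could
bound its wait on an imported timeline and fall back to the previous
frame on timeout. A minimal sketch only: device, importedTimeline and
expectedPoint are assumed to exist, and present_previous_frame() is a
hypothetical compositor helper:

    VkSemaphoreWaitInfo waitInfo = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
        .semaphoreCount = 1,
        .pSemaphores = &importedTimeline,
        .pValues = &expectedPoint,
    };
    /* Wait at most ~16 ms instead of trusting the producer to signal. */
    VkResult res = vkWaitSemaphores(device, &waitInfo, 16 * 1000 * 1000);
    if (res == VK_TIMEOUT) {
        /* The untrusted producer lied or hung: reuse the old frame. */
        present_previous_frame();
    }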

--Jason


> Glamor will just rely on GL providing a nice packaging of the harsh
> reality of GPUs, as usual.
>
> So I guess step 1 here for GL would be to provide some kind of
> import/export of timeline syncobjs, including properly handling the
> "future/indefinite fence" aspect of them with a submit thread and
> everything.
>
> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > On 20.04.21 13:16, Daniel Vetter wrote:
> > > On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote:
> > >> Daniel, are you suggesting that we should skip any deadlock prevention in
> > >> the kernel, and just let userspace wait for and signal any fence it has
> > >> access to?
> > > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > > the kernel's problem. The only criterion is that the kernel itself must
> > > never rely on these userspace fences, except for stuff like implementing
> > > optimized cpu waits. And in those we must always guarantee that the
> > > userspace process remains interruptible.
> > >
> > > It's a completely different world from dma_fence based kernel fences,
> > > whether those are implicit or explicit.
> > >
> > >> Do you have any concern with the deprecation/removal of BO fences in the
> > >> kernel assuming userspace is only using explicit fences? Any concern with
> > >> the submit and return fences for modesetting and other producer<->consumer
> > >> scenarios?
> > > Let me work on the full reply to your RFC first, because there's a lot
> > > of detail and nuance here.
> > > -Daniel
> > >
> > >> Thanks,
> > >> Marek
> > >>
> > >> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>
> > >>> On Tue, Apr 20, 2021 at 12:15 PM Christian König
> > >>> <ckoenig.leichtzumerken@gmail.com> wrote:
> > >>>> On 19.04.21 17:48, Jason Ekstrand wrote:
> > >>>>> Not going to comment on everything on the first pass...
> > >>>>>
> > >>>>> On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák <maraeo@gmail.com> wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> This is our initial proposal for explicit fences everywhere and new
> > >>> memory management that doesn't use BO fences. It's a redesign of how Linux
> > >>> graphics drivers work, and it can coexist with what we have now.
> > >>>>>>
> > >>>>>> 1. Introduction
> > >>>>>> (skip this if you are already sold on explicit fences)
> > >>>>>>
> > >>>>>> The current Linux graphics architecture was initially designed for
> > >>> GPUs with only one graphics queue where everything was executed in the
> > >>> submission order and per-BO fences were used for memory management and
> > >>> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> > >>> queues were added on top, which required the introduction of implicit
> > >>> GPU-GPU synchronization between queues of different processes using per-BO
> > >>> fences. Recently, even parallel execution within one queue was enabled
> > >>> where a command buffer starts draws and compute shaders, but doesn't wait
> > >>> for them, enabling parallelism between back-to-back command buffers.
> > >>> Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
> > >>> was created to enable all those use cases, and it's the only reason why the
> > >>> scheduler exists.
> > >>>>>> The GPU scheduler, implicit synchronization, BO-fence-based memory
> > >>> management, and the tracking of per-BO fences increase CPU overhead and
> > >>> latency, and reduce parallelism. There is a desire to replace all of them
> > >>> with something much simpler. Below is how we could do it.
> > >>>>>>
> > >>>>>> 2. Explicit synchronization for window systems and modesetting
> > >>>>>>
> > >>>>>> The producer is an application and the consumer is a compositor or a
> > >>> modesetting driver.
> > >>>>>> 2.1. The Present request
> > >>>>>>
> > >>>>>> As part of the Present request, the producer will pass 2 fences (sync
> > >>> objects) to the consumer alongside the presented DMABUF BO:
> > >>>>>> - The submit fence: Initially unsignalled, it will be signalled when
> > >>> the producer has finished drawing into the presented buffer.
> > >>>>>> - The return fence: Initially unsignalled, it will be signalled when
> > >>> the consumer has finished using the presented buffer.
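
For concreteness, the request could carry roughly the following; this
is a hypothetical sketch of the shape, not an existing uAPI:

    struct present_request {
        int dmabuf_fd;        /* the presented buffer */
        int submit_fence_fd;  /* producer signals: drawing finished */
        int return_fence_fd;  /* consumer signals: done using the buffer */
    };
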
> > >>>>> I'm not sure syncobj is what we want.  In the Intel world we're trying
> > >>>>> to go even further to something we're calling "userspace fences": a
> > >>>>> timeline implemented as a single 64-bit value in some CPU-mappable
> > >>>>> BO.  The client writes a higher value into the BO to signal the
> > >>>>> timeline.
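
A minimal sketch of such a userspace fence (all names hypothetical):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct userspace_fence {
        _Atomic uint64_t value;   /* lives in a CPU-mappable BO */
    };

    /* Signal: monotonically advance the timeline to 'point'. */
    static void fence_signal(struct userspace_fence *f, uint64_t point)
    {
        atomic_store_explicit(&f->value, point, memory_order_release);
    }

    /* Poll: has the timeline reached 'point' yet? */
    static bool fence_is_signalled(struct userspace_fence *f, uint64_t point)
    {
        return atomic_load_explicit(&f->value, memory_order_acquire) >= point;
    }
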
> > >>>> Well, that is exactly what our Windows guys have suggested as well,
> > >>>> but it strongly looks like this isn't sufficient.
> > >>>>
> > >>>> First of all, you run into security problems when any application
> > >>>> can just write any value to that memory location. Just imagine an
> > >>>> application setting the counter to zero so that X waits forever for
> > >>>> some rendering to finish.
> > >>> The thing is, with userspace fences the security boundary issue
> > >>> moves into userspace entirely. And it really doesn't matter whether
> > >>> the event you're waiting on doesn't complete because the other app
> > >>> crashed or was stupid or intentionally gave you a wrong fence point:
> > >>> You have to somehow handle that, e.g. perhaps with conditional
> > >>> rendering and just using the old frame in compositing if the new one
> > >>> doesn't show up in time. Or something like that. So trying to get the
> > >>> kernel involved but also not so much involved sounds like a bad design
> > >>> to me.
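
One way to express that fallback uses VK_EXT_conditional_rendering. A
sketch only: it assumes the 32-bit predicate in predicate_buf has been
written by a small dispatch comparing the timeline value against the
wait point, and the draw helpers are hypothetical:

    VkConditionalRenderingBeginInfoEXT cond = {
        .sType = VK_STRUCTURE_TYPE_CONDITIONAL_RENDERING_BEGIN_INFO_EXT,
        .buffer = predicate_buf,   /* 32-bit "fence arrived in time" flag */
        .offset = 0,
    };
    vkCmdBeginConditionalRenderingEXT(cmd, &cond);
    draw_new_frame(cmd);                 /* only runs if the flag is set */
    vkCmdEndConditionalRenderingEXT(cmd);

    cond.flags = VK_CONDITIONAL_RENDERING_INVERTED_BIT_EXT;
    vkCmdBeginConditionalRenderingEXT(cmd, &cond);
    draw_previous_frame(cmd);            /* fallback if it never arrived */
    vkCmdEndConditionalRenderingEXT(cmd);
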
> > >>>
> > >>>> In addition, in such a model you can't determine which queue is
> > >>>> guilty in case of a hang, and you can't reset the synchronization
> > >>>> primitives in case of an error.
> > >>>>
> > >>>> Apart from that, this is rather inefficient; e.g. we don't have any
> > >>>> way to prevent priority inversion when it's used as a synchronization
> > >>>> mechanism between different GPU queues.
> > >>> Yeah, but you can't have it both ways. Either all the scheduling and
> > >>> fence handling in the kernel is a problem, or you actually want to
> > >>> schedule in the kernel. Hardware definitely seems to be moving towards
> > >>> the more stupid spinlock-in-hw model (and direct submit from userspace
> > >>> and all that), priority inversions be damned. I'm really not sure we
> > >>> should fight that - if it's really that inefficient then maybe hw will
> > >>> add support for waiting on sync constructs in hardware, or at least be
> > >>> smarter about scheduling other stuff. E.g. on Intel hw both the kernel
> > >>> scheduler and the fw scheduler know when you're spinning on a hw fence
> > >>> (whether userspace or kernel doesn't matter) and plug in something
> > >>> else. Add in a bit of hw support to watch cachelines, and you have
> > >>> something which can handle both directions efficiently.
> > >>>
> > >>> Imo, given where hw is going, we shouldn't try to be too clever here.
> > >>> The only thing we do need to provide is the ability to do CPU-side
> > >>> waits without spinning. And that should probably still be done in a
> > >>> fairly GPU-specific way.
> > >>> -Daniel
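
For illustration, a non-spinning CPU-side wait on the 64-bit userspace
fence sketched above could look like this. It assumes little-endian
(the futex watches the low 32 bits) and, crucially, that whoever
advances the value also issues a FUTEX_WAKE; a GPU writing the memory
directly cannot do that, which is exactly the gap kernel wait helpers
would have to fill:

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Block until *value >= point without burning a CPU core. */
    static void fence_wait(_Atomic uint64_t *value, uint64_t point)
    {
        uint64_t cur;
        while ((cur = atomic_load_explicit(value, memory_order_acquire)) < point) {
            /* Sleep until the low 32 bits change, then re-check. */
            syscall(SYS_futex, (uint32_t *)value, FUTEX_WAIT,
                    (uint32_t)cur, NULL, NULL, 0);
        }
    }
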
> > >>>
> > >>>> Christian.
> > >>>>
> > >>>>>     The kernel then provides some helpers for
> > >>>>> waiting on them reliably and without spinning.  I don't expect
> > >>>>> everyone to support these right away, but if we're going to re-plumb
> > >>>>> userspace for explicit synchronization, I'd like to make sure we take
> > >>>>> this into account so we only have to do it once.
> > >>>>>
> > >>>>>
> > >>>>>> Deadlock mitigation to recover from segfaults:
> > >>>>>> - The kernel knows which process is obliged to signal which fence.
> > >>> This information is part of the Present request and supplied by userspace.
> > >>>>> This isn't clear to me.  Yes, if we're using anything dma-fence based
> > >>>>> like syncobj, this is true.  But it doesn't seem totally true as a
> > >>>>> general statement.
> > >>>>>
> > >>>>>
> > >>>>>> - If the producer crashes, the kernel signals the submit fence, so
> > >>> that the consumer can make forward progress.
> > >>>>>> - If the consumer crashes, the kernel signals the return fence, so
> > >>> that the producer can reclaim the buffer.
> > >>>>>> - A GPU hang signals all fences. Other deadlocks will be handled like
> > >>> GPU hangs.
> > >>>>> What do you mean by "all"?  All fences that were supposed to be
> > >>>>> signaled by the hung context?
> > >>>>>
> > >>>>>
> > >>>>>> Other window system requests can follow the same idea.
> > >>>>>>
> > >>>>>> Merged fences where one fence object contains multiple fences will be
> > >>> supported. A merged fence is signalled only when all of its fences are
> > >>> signalled. The consumer will have the option to redefine the unsignalled
> > >>> return fence to a merged fence.
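
In other words, a merged fence is just the AND of its components. A
sketch, reusing the hypothetical fence_is_signalled() from above:

    static bool merged_fence_signalled(struct userspace_fence **fences,
                                       const uint64_t *points, unsigned count)
    {
        for (unsigned i = 0; i < count; i++)
            if (!fence_is_signalled(fences[i], points[i]))
                return false;
        return true;
    }
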
> > >>>>>> 2.2. Modesetting
> > >>>>>>
> > >>>>>> Since a modesetting driver can also be the consumer, the present
> > >>> ioctl will contain a submit fence and a return fence too. One small problem
> > >>> with this is that userspace can hang the modesetting driver, but in theory,
> > >>> any later present ioctl can override the previous one, so the unsignalled
> > >>> presentation is never used.
> > >>>>>>
> > >>>>>> 3. New memory management
> > >>>>>>
> > >>>>>> The per-BO fences will be removed and the kernel will not know which
> > >>> buffers are busy. This will reduce CPU overhead and latency. The kernel
> > >>> will not need per-BO fences with explicit synchronization, so we just need
> > >>> to remove their last user: buffer evictions. It also resolves the current
> > >>> OOM deadlock.
> > >>>>> Is this even really possible?  I'm no kernel MM expert (trying to
> > >>>>> learn some) but my understanding is that the use of per-BO dma-fence
> > >>>>> runs deep.  I would like to stop using it for implicit synchronization
> > >>>>> to be sure, but I'm not sure I believe the claim that we can get rid
> > >>>>> of it entirely.  Happy to see someone try, though.
> > >>>>>
> > >>>>>
> > >>>>>> 3.1. Evictions
> > >>>>>>
> > >>>>>> If the kernel wants to move a buffer, it will have to wait for
> > >>> everything to go idle, halt all userspace command submissions, move the
> > >>> buffer, and resume everything. This is not expected to happen when memory
> > >>> is not exhausted. Other more efficient ways of synchronization are also
> > >>> possible (e.g. sync only one process), but are not discussed here.
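
As hypothetical kernel-side pseudocode (none of these helpers exist;
this only illustrates the proposed stop-the-world flow):

    /* Move a buffer without consulting per-BO fences. */
    halt_all_userspace_submission(dev);
    wait_for_gpu_idle(dev);
    move_buffer(bo, new_placement);
    update_mappings(bo);
    resume_userspace_submission(dev);
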
> > >>>>>> 3.2. Per-process VRAM usage quota
> > >>>>>>
> > >>>>>> Each process can optionally and periodically query its VRAM usage
> > >>> quota and change domains of its buffers to obey that quota. For example, a
> > >>> process allocated 2 GB of buffers in VRAM, but the kernel decreased the
> > >>> quota to 1 GB. The process can change the domains of the least important
> > >>> buffers to GTT to get the best outcome for itself. If the process doesn't
> > >>> do it, the kernel will choose which buffers to evict at random. (thanks to
> > >>> Christian Koenig for this idea)
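
On the Vulkan side this maps naturally onto the existing
VK_EXT_memory_budget query. A sketch; demote_least_important() is a
hypothetical application helper that moves buffers to GTT:

    VkPhysicalDeviceMemoryBudgetPropertiesEXT budget = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT,
    };
    VkPhysicalDeviceMemoryProperties2 props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2,
        .pNext = &budget,
    };
    vkGetPhysicalDeviceMemoryProperties2(physical_device, &props);

    for (uint32_t i = 0; i < props.memoryProperties.memoryHeapCount; i++) {
        if (budget.heapUsage[i] > budget.heapBudget[i])
            demote_least_important(i, budget.heapUsage[i] - budget.heapBudget[i]);
    }
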
> > >>>>> This is going to be difficult.  On Intel, we have some resources that
> > >>>>> have to be pinned to VRAM and can't be dynamically swapped out by the
> > >>>>> kernel.  In GL, we probably can deal with it somewhat dynamically.  In
> > >>>>> Vulkan, we'll be entirely dependent on the application to use the
> > >>>>> appropriate Vulkan memory budget APIs.
> > >>>>>
> > >>>>> --Jason
> > >>>>>
> > >>>>>
> > >>>>>> 3.3. Buffer destruction without per-BO fences
> > >>>>>>
> > >>>>>> When the buffer destroy ioctl is called, an optional fence list can
> > >>> be passed to the kernel to indicate when it's safe to deallocate the
> > >>> buffer. If the fence list is empty, the buffer will be deallocated
> > >>> immediately. Shared buffers will be handled by merging fence lists from all
> > >>> processes that destroy them. Mitigation of malicious behavior:
> > >>>>>> - If userspace destroys a busy buffer, it will get a GPU page fault.
> > >>>>>> - If userspace sends fences that never signal, the kernel will have a
> > >>> timeout period and then will proceed to deallocate the buffer anyway.
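
The ioctl shape could look roughly like this (hypothetical, for
illustration only):

    struct drm_bo_destroy {
        __u32 handle;       /* BO to destroy */
        __u32 num_fences;   /* 0 = deallocate immediately */
        __u64 fences;       /* user pointer to an array of fences */
    };
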
> > >>>>>> 3.4. Other notes on MM
> > >>>>>>
> > >>>>>> Overcommitment of GPU-accessible memory will cause an allocation
> > >>> failure or invoke the OOM killer. Evictions to GPU-inaccessible memory
> > >>> might not be supported.
> > >>>>>> Kernel drivers could move to this new memory management today. Only
> > >>> buffer residency and evictions would stop using per-BO fences.
> > >>>>>>
> > >>>>>> 4. Deprecating implicit synchronization
> > >>>>>>
> > >>>>>> It can be phased out by introducing a new generation of hardware
> > >>> where the driver doesn't add support for it (like a driver fork would do),
> > >>> assuming userspace has all the changes for explicit synchronization. This
> > >>> could potentially create an isolated part of the kernel DRM where all
> > >>> drivers only support explicit synchronization.
> > >>>>>> Marek
> > >>>
> > >>> --
> > >>> Daniel Vetter
> > >>> Software Engineer, Intel Corporation
> > >>> http://blog.ffwll.ch
> > >>>
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
