From: Marek Olšák <maraeo@gmail.com>
Date: Sun, 15 Mar 2020 23:50:29 -0400
Subject: Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
To: Jason Ekstrand <jason@jlekstrand.net>
Cc: Daniel Vetter, xorg-devel, Maling list - DRI developers,
    wayland-devel@lists.freedesktop.org,
    Discussion of the development of and with GStreamer, ML mesa-dev,
    linux-media@vger.kernel.org

The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers and there is only one graphics queue, so everything is ordered.

The GFX pipeline runs asynchronously to the command buffer, meaning the command buffer only starts draws and doesn't wait for completion. If the Mesa driver didn't wait at the end of the command buffer, the command buffer would finish and a different process could start execution of its own command buffer while shaders of the previous process are still running.

If the Mesa driver submits a command buffer internally (because it's full), it doesn't wait, so the GFX pipeline doesn't notice that a command buffer ended and a new one started.

The waiting at the end of command buffers happens only when the flush is external (SwapBuffers, glFlush).
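Marek's explanation can be illustrated with a toy timeline model. This is a hypothetical sketch, not Mesa or kernel code. `CommandBuffer`, `submit()`, `overlaps()` and the `drain_at_end` flag are invented for illustration, and parsing a command buffer is assumed to take one time unit. Draws keep running after the queue moves on, unless the buffer ends with a wait-for-idle. In this model, that drain waits for the whole pipeline.

```python
from dataclasses import dataclass


@dataclass
class CommandBuffer:
    owner: str          # process that built the command buffer
    draw_time: int      # how long its draws keep the GFX pipeline busy
    drain_at_end: bool  # True for external flushes (SwapBuffers, glFlush)


def submit(buffers):
    """Run command buffers on one in-order graphics queue (toy model).

    The queue only *starts* draws, so it moves on to the next command buffer
    after parsing (assumed: 1 time unit) unless the current one ends with a
    wait-for-idle, which blocks the queue until the whole pipeline is idle.
    Returns a list of (owner, start, end) spans of draw work.
    """
    timeline = []
    queue_free_at = 0      # when the queue can start the next command buffer
    pipeline_idle_at = 0   # when every draw started so far has finished
    for cb in buffers:
        start = queue_free_at
        end = start + cb.draw_time
        pipeline_idle_at = max(pipeline_idle_at, end)
        timeline.append((cb.owner, start, end))
        queue_free_at = pipeline_idle_at if cb.drain_at_end else start + 1
    return timeline


def overlaps(timeline):
    """Pairs of *different* processes whose draw work overlaps in time."""
    races = []
    for i, (a, s1, e1) in enumerate(timeline):
        for b, s2, e2 in timeline[i + 1:]:
            if a != b and s2 < e1 and s1 < e2:
                races.append((a, b))
    return races
```

Consider an app that flushes internally once (no drain) and then calls SwapBuffers (drain) before the compositor runs. In that case `overlaps()` finds nothing. If the app's last buffer does not drain, the compositor's draws start while the app's shaders are still running.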

It's a performance problem because the GFX queue is blocked until the GFX pipeline is drained, at least once at the end of every frame.

So explicit fences for SwapBuffers would help.

Marek

On Sun., Mar. 15, 2020, 22:49 Jason Ekstrand, <jason@jlekstrand.net> wrote:
Could you elaborate? If there's something missing from my mental model of how implicit sync works, I'd like to have it corrected. People continue claiming that AMD is somehow special but I have yet to grasp what makes it so. (Not that anyone has bothered to try all that hard to explain it.)

--Jason

On March 13, 2020 21:03:21 Marek Olšák <maraeo@gmail.com> wrote:

There is no synchronization between processes (e.g. 3D app and compositor) within X on AMD hw. It works because of some hacks in Mesa.

Marek

On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
All,

Sorry for casting such a broad net with this one. I'm sure most people
who reply will get at least one mailing list rejection. However, this
is an issue that affects a LOT of components and that's why it's
thorny to begin with. Please pardon the length of this e-mail as
well; I promise there's a concrete point/proposal at the end.


Explicit synchronization is the future of graphics and media. At
least, that seems to be the consensus among all the graphics people
I've talked to. I had a chat with one of the lead Android graphics
engineers recently who told me that doing explicit sync from the start
was one of the best engineering decisions Android ever made. It's
also the direction being taken by more modern APIs such as Vulkan.


## What are implicit and explicit synchronization?

For those that aren't familiar with this space, GPUs, media encoders,
etc. are massively parallel and synchronization of some form is
required to ensure that everything happens in the right order and to
avoid data races. Implicit synchronization is when bits of work (3D,
compute, video encode, etc.) are implicitly ordered based on the
absolute CPU-time order in which API calls occur. Explicit
synchronization is when the client (whatever that means in any given
context) provides the dependency graph explicitly via some sort of
synchronization primitives. If you're still confused, consider the
following examples:

With OpenGL and EGL, almost everything is implicit sync. Say you have
two OpenGL contexts sharing an image where one writes to it and the
other textures from it. The way the OpenGL spec works, the client has
to make the API calls to render to the image before (in CPU time) it
makes the API calls which texture from the image. As long as it does
this (and maybe inserts a glFlush?), the driver will ensure that the
rendering completes before the texturing happens and you get correct
contents.
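Here is a minimal sketch of what "implicit" means, assuming a toy driver that tracks only the last write to each image. `ImplicitDriver`, `render()` and `texture()` are hypothetical names, not an OpenGL or Mesa API. The driver is never given a dependency graph. It derives one purely from the order in which calls arrive.

```python
import itertools


class ImplicitDriver:
    """Toy model: the driver infers GPU dependencies from CPU-time call order."""

    def __init__(self):
        self.last_write = {}            # image -> job id of the latest render to it
        self.jobs = {}                  # job id -> (kind, image, job ids it waits on)
        self._ids = itertools.count()

    def _submit(self, kind, image):
        job = next(self._ids)
        # Each job waits on the most recent *earlier* render to the same image.
        # (Read-after-write only; a real driver also orders a write after
        # earlier reads.)
        deps = {self.last_write[image]} if image in self.last_write else set()
        self.jobs[job] = (kind, image, deps)
        if kind == "render":
            self.last_write[image] = job
        return job

    def render(self, image):
        """Context A draws into the shared image."""
        return self._submit("render", image)

    def texture(self, image):
        """Context B samples from the shared image."""
        return self._submit("texture", image)
```

If the client calls `texture()` before `render()`, the texturing job gets no dependency at all. This is why, under implicit sync, the client itself must get the CPU-side call order right.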

Implicit synchronization can also happen across processes. Wayland,
for instance, is currently built on implicit sync where the client
does its rendering and then does a hand-off (via wl_surface::commit)
to tell the compositor it's done, at which point the compositor can now
texture from the surface. The hand-off ensures that the client's
OpenGL API calls happen before the server's OpenGL API calls.

A good example of explicit synchronization is the Vulkan API. There,
a client (or multiple clients) can simultaneously build command
buffers in different threads where one of those command buffers
renders to an image and the other textures from it and then submit
both of them at the same time with instructions to the driver for
which order to execute them in. The execution order is described via
the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore
extension, you can even submit the work which does the texturing
BEFORE the work which does the rendering and the driver will sort it
out.
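The out-of-order submission that timeline semaphores allow can be sketched with a toy scheduler. `TimelineSemaphore` and `Queue` are hypothetical stand-ins, not the Vulkan API. The counter only moves forward. A job runs once every (semaphore, value) pair it waits on has been reached, regardless of the order in which jobs were submitted.

```python
class TimelineSemaphore:
    """Toy timeline semaphore: a counter that only ever increases."""

    def __init__(self, initial=0):
        self.value = initial


class Queue:
    """Toy scheduler: runs any submitted job whose waits are satisfied."""

    def __init__(self):
        self.pending = []
        self.executed = []

    def submit(self, name, waits=(), signals=()):
        # waits/signals are sequences of (semaphore, value) pairs.
        self.pending.append((name, tuple(waits), tuple(signals)))
        self._run_ready()

    def _run_ready(self):
        progress = True
        while progress:
            progress = False
            for job in list(self.pending):
                name, waits, signals = job
                if all(sem.value >= value for sem, value in waits):
                    self.pending.remove(job)
                    self.executed.append(name)
                    for sem, value in signals:
                        sem.value = max(sem.value, value)  # never goes backwards
                    progress = True
```

If the texturing job is submitted first, waiting for value 1, it stays pending. It runs as soon as the rendering job signals value 1.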

The #1 problem with implicit synchronization (which explicit solves)
is that it leads to a lot of over-synchronization both in client space
and in driver/device space. The client has to synchronize a lot more
because it has to ensure that the API calls happen in a particular
order. The driver/device have to synchronize a lot more because they
never know what is going to end up being a synchronization point as an
API call on another thread/process may occur at any time. As we move
to more and more multi-threaded programming this synchronization (on
the client-side especially) becomes more and more painful.
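To put a number on the over-synchronization, here is a toy critical-path calculation. It rests on stated assumptions: an idealized device that can run any number of jobs in parallel, three hypothetical jobs with made-up durations, and dependency sets chosen to mirror the two models. It illustrates the argument and is not a measurement.

```python
def finish_times(durations, deps):
    """Earliest finish time of each job on an idealized, fully parallel device.

    `durations` maps job -> run time and must list jobs in submission order
    (dicts keep insertion order), so every dependency is computed first.
    `deps` maps job -> jobs it must wait for.
    """
    done = {}
    for job, duration in durations.items():
        start = max((done[d] for d in deps.get(job, ())), default=0)
        done[job] = start + duration
    return done


# Hypothetical frame: A, B and C all write parts of one shared buffer, but
# only C actually reads what A wrote.
durations = {"A": 4, "B": 3, "C": 2}

# Implicit: the kernel only sees three writes to the same buffer, so each one
# waits for everything submitted before it.
implicit = {"B": ["A"], "C": ["A", "B"]}

# Explicit: the client states the one dependency that really exists.
explicit = {"C": ["A"]}
```

With these numbers the implicit chain finishes at t=9, while the explicit graph finishes at t=6, because B no longer has to wait for A.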


## Current status in Linux

Implicit synchronization in Linux works via the kernel's internal
dma_buf and dma_fence data structures. A dma_fence is a tiny object
which represents the "done" status for some bit of work. Typically,
dma_fences are created as a by-product of someone submitting some bit
of work (say, 3D rendering) to the kernel. The dma_buf object has a
set of dma_fences on it representing shared (read) and exclusive
(write) access to the object. When work is submitted which, for
instance, renders to the dma_buf, it's queued waiting on all the fences
on the dma_buf, and a dma_fence is created representing the end of
said rendering work and it's installed as the dma_buf's exclusive
fence. This way, the kernel can manage all its internal queues (3D
rendering, display, video encode, etc.) and know which things to
submit in what order.
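The shared/exclusive rule above can be modelled in a few lines. This is a simplified sketch, not the kernel's reservation-object code. `DmaFence`, `DmaBuf`, `submit_write()` and `submit_read()` are hypothetical names. The sketch also assumes that readers wait only on the exclusive fence, which is what lets display and encode read the same buffer concurrently.

```python
class DmaFence:
    """Stand-in for a dma_fence; only identity matters in this sketch."""

    def __init__(self, name):
        self.name = name


class DmaBuf:
    """Toy model of the implicit-sync fences attached to a dma_buf."""

    def __init__(self):
        self.exclusive = None   # fence of the most recent writer
        self.shared = []        # fences of readers since that write

    def submit_write(self, name):
        # A writer waits on all fences on the buffer, then becomes the new
        # exclusive fence; the old shared (read) fences are dropped.
        waits = self.shared + ([self.exclusive] if self.exclusive is not None else [])
        fence = DmaFence(name)
        self.exclusive, self.shared = fence, []
        return fence, waits

    def submit_read(self, name):
        # A reader only waits on the last writer (assumption of this sketch),
        # so several readers can overlap.
        waits = [self.exclusive] if self.exclusive is not None else []
        fence = DmaFence(name)
        self.shared.append(fence)
        return fence, waits
```

Suppose a render is followed by display and encode reads. Both reads wait only on the render. The next frame's write then waits on both readers and on the old render.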

For the last few years, we've had sync_file in the kernel and it's
plumbed into some drivers. A sync_file is just a wrapper around a
single dma_fence. A sync_file is typically created as a by-product of
submitting work (3D, compute, etc.) to the kernel and is signaled when
that work completes. When a sync_file is created, it is guaranteed by
the kernel that it will become signaled in finite time and, once it's
signaled, it remains signaled for the rest of time. A sync_file is
represented in UAPIs as a file descriptor and can be used with normal
file APIs such as dup(). It can be passed into another UAPI which
does some bit of queued work and the submitted work will wait for the
sync_file to be signaled before executing. A sync_file also supports
poll() if you want to wait on it manually.
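Here is a sketch of the sync_file properties just listed: one fence per file, signal-once, shared by duplicated handles, and pollable. `Fence` and `SyncFile` are hypothetical Python stand-ins for the kernel objects, not a binding to the real UAPI.

```python
class Fence:
    """Toy dma_fence: signals at most once and never becomes unsignaled."""

    def __init__(self):
        self._signaled = False

    def signal(self):
        # In this model a second signal is simply ignored.
        self._signaled = True

    @property
    def signaled(self):
        return self._signaled


class SyncFile:
    """Toy stand-in for a sync_file fd: a handle wrapping exactly one fence."""

    def __init__(self, fence):
        self._fence = fence

    def dup(self):
        # Like dup() on the real fd: a second handle to the same fence.
        return SyncFile(self._fence)

    def poll(self):
        # A real sync_file fd polls as readable once its fence has signaled.
        return self._fence.signaled
```

A duplicated handle observes the same fence. Both handles become ready together, and both stay ready after further signals.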

Unfortunately, sync_file is not broadly used and not all kernel GPU
drivers support it. Here's a very quick overview of my understanding
of the status of various components (I don't know the status of
anything in the media world):

 - Vulkan: Explicit synchronization all the way but we have to go
implicit as soon as we interact with a window-system. Vulkan has APIs
to import/export sync_files to/from its VkSemaphore and VkFence
synchronization primitives.
 - OpenGL: Implicit all the way. There are some EGL extensions to
enable some forms of explicit sync via sync_file but OpenGL itself is
still implicit.
 - Wayland: Currently depends on implicit sync in the kernel (accessed
via EGL/OpenGL). There is an unstable extension to allow passing
sync_files around but it's questionable how useful it is right now
(more on that later).
 - X11: With Present, it has these "explicit" fence objects but
they're always a shmfence which lets the X server and client do a
userspace CPU-side hand-off without going over the socket (and
round-tripping through the kernel). However, the only thing that
fence does is order the OpenGL API calls in the client and server and
the real synchronization is still implicit.
 - linux/i915/gem: Fully supports using sync_file or syncobj for explicit sync.
 - linux/amdgpu: Supports sync_file and syncobj but it still
implicitly syncs sometimes due to its internal memory residency
handling which can lead to over-synchronization.
 - KMS: Implicit sync all the way. There are no KMS APIs which take
explicit sync primitives.
 - v4l: ???
 - gstreamer: ???
 - Media APIs such as vaapi etc.: ???


## Chicken and egg problems

Ok, this is where it starts getting depressing. I made the claim
above that Wayland has an explicit synchronization protocol that's of
questionable usefulness. I would claim that basically any bit of
plumbing we do through window systems is currently of questionable
usefulness. Why?

From my perspective, as a Vulkan driver developer, I have to deal with
the fact that Vulkan is an explicit sync API but Wayland and X11
aren't. Unfortunately, the Wayland extension solves zero problems for
me because I can't really use it unless it's implemented in all of the
compositors. Until every Wayland compositor I care about my users
being able to use (which is basically all of them) supports the
extension, I have to continue to carry around my pile of hacks to keep
implicit sync and Vulkan working nicely together.

From the perspective of a Wayland compositor (I used to play in this
space), they'd love to implement the new explicit sync extension but
can't. Sure, they could wire up the extension, but the moment they go
to flip a client buffer to the screen directly, they discover that KMS
doesn't support any explicit sync APIs. So, yes, they can technically
implement the extension assuming the EGL stack they're running on has
the sync_file extensions but any client buffers which come in using
the explicit sync Wayland extension have to be composited and can't be
scanned out directly. As a 3D driver developer, I absolutely don't
want compositors doing that because my users will complain about
performance issues due to the extra blit.

Ok, so let's say we get KMS wired up with explicit sync. That solves
all our problems, right? It does, right up until someone decides that
they want to screen capture their Wayland session via some hardware
media encoder that doesn't support explicit sync. Now we have to
plumb it all the way through the media stack, gstreamer, etc. Great,
so let's do that! Oh, but gstreamer won't want to plumb it through
until they're guaranteed that they can use explicit sync when
displaying on X11 or Wayland. Are you seeing the problem?

To make matters worse, since most things are doing implicit
synchronization today, it's really easy to get your explicit
synchronization wrong and never notice. If you forget to pass a
sync_file into one place (say you never notice KMS doesn't support
them), it will probably work anyway thanks to all the implicit sync
that's going on elsewhere.
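The masking effect can be stated as a toy predicate. `display_waits_for_render()` is a hypothetical helper, and fences are plain strings. The only point is that a forgotten sync_file stays invisible while the kernel still attaches implicit fences to the buffer.

```python
def display_waits_for_render(render_fence, sync_file_passed, implicit_fences):
    """Does a display consumer wait for the render fence? (toy model)

    It waits on the sync_file it was handed, if any, plus whatever implicit
    fences the kernel attached to the buffer.
    """
    waited_on = set(implicit_fences)
    if sync_file_passed is not None:
        waited_on.add(sync_file_passed)
    return render_fence in waited_on


# Forgotten sync_file, but the kernel attached an implicit fence: looks fine.
hidden_bug = display_waits_for_render("render", None, {"render"})
# Same bug once implicit fences are gone: the display no longer waits.
exposed_bug = display_waits_for_render("render", None, set())
```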

So, clearly, we all need to go write piles of code that we can't
actually properly test until everyone else has written their piece and
then we use explicit sync if and only if all components support it.
Really? We're going to do multiple years of development and then just
hope it works when we finally flip the switch? That doesn't sound
like a good plan to me.


## A proposal: Implicit and explicit sync together

How to solve all these chicken-and-egg problems is something I've been
giving quite a bit of thought (and talking with many others about) in
the last couple of years. One motivation for this is that we have to
deal with a mismatch in Vulkan. Another motivation is that I'm
becoming increasingly unhappy with the way that synchronization,
memory residency, and command submission are inherently intertwined in
i915, and I would like to break things apart. Towards that end, I have
an actual proposal.

A couple weeks ago, I sent a series of patches to the dri-devel
mailing list which adds a pair of new ioctls to dma-buf which allow
userspace to manually import a sync_file into, or export a sync_file
from, a dma-buf. The idea is that something like a Wayland compositor
can switch to 100% explicit sync internally once the ioctls are
available. If it gets buffers in from a client that doesn't use the
explicit sync extension, it can pull a sync_file from the dma-buf and
use that exactly as it would a sync_file passed via the explicit sync
extension. When it goes to scan out a user buffer and discovers that
KMS doesn't accept sync_files (or if it tries to use that pesky media
encoder no one has converted), it can take its sync_file for display
and stuff it into the dma-buf before handing it to KMS.
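Both directions of the proposal can be sketched against a toy dma-buf. The functions below only mimic the behaviour described above. `export_sync_file()`, `import_sync_file()`, `MergedFence` and the `for_write` flag are hypothetical names, not the ioctls from the RFC patches. Installing the imported fence as the exclusive fence is an assumption of the sketch.

```python
class Fence:
    def __init__(self, name, signaled=False):
        self.name, self.signaled = name, signaled


class MergedFence:
    """Signals once every component fence has signaled, like a merged sync_file."""

    def __init__(self, fences):
        self.fences = tuple(fences)

    @property
    def signaled(self):
        return all(f.signaled for f in self.fences)


class DmaBuf:
    def __init__(self):
        self.exclusive = None   # implicit fence of the last writer
        self.shared = []        # implicit fences of readers since then


def export_sync_file(buf, for_write=False):
    """Toy version of the proposed export: snapshot the buffer's implicit fences.

    A reader (e.g. a compositor texturing from a client buffer) needs the last
    write; a writer must additionally wait for the current readers.
    """
    fences = [buf.exclusive] if buf.exclusive is not None else []
    if for_write:
        fences += buf.shared
    return MergedFence(fences)


def import_sync_file(buf, fence):
    """Toy version of the proposed import: install an explicit fence as the
    buffer's exclusive fence so implicit-sync consumers such as KMS wait on it.
    Assumes the imported work already waited for earlier readers."""
    buf.exclusive, buf.shared = fence, []
```

A compositor that receives an implicitly synced client buffer exports a sync_file and treats it like one received through the explicit sync extension. Before a page flip, it imports its own rendering fence into the scanout buffer so KMS waits for that fence.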

Along with the kernel patches, I've also implemented support for this
in the Vulkan WSI code used by ANV and RADV. With those patches, the
only requirement on the Vulkan drivers is that you be able to export
any VkSemaphore as a sync_file and temporarily import a sync_file into
any VkFence or VkSemaphore. As long as that works, the core Vulkan
driver only ever sees explicit synchronization via sync_file. The WSI
code uses these new ioctls to translate the implicit sync of X11 and
Wayland to the explicit sync the Vulkan driver wants.
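The WSI translation can be sketched as two small hooks around present and acquire. Everything here is hypothetical: `wsi_present()`, `wsi_acquire()`, the toy `DmaBuf` and `Semaphore`, and the use of strings for fences. The only behaviours taken from the text are:
- exporting a sync_file from a semaphore,
- temporarily importing a sync_file into a semaphore,
- translating through the dma-buf's implicit fences.

```python
from dataclasses import dataclass, field


@dataclass
class DmaBuf:
    exclusive: object = None                    # implicit fence of the last writer
    shared: list = field(default_factory=list)  # implicit fences of readers


@dataclass
class Semaphore:
    """Toy VkSemaphore with a permanent payload and an optional temporary one."""
    permanent: object = None
    temporary: object = None

    def wait_payload(self):
        # A temporarily imported payload is consumed by the next wait, after
        # which the semaphore reverts to its permanent payload.
        payload = self.temporary if self.temporary is not None else self.permanent
        self.temporary = None
        return payload


def wsi_present(image, render_done_sync_file):
    """Explicit -> implicit: the sync_file exported from the app's render-done
    semaphore becomes the image's exclusive fence, so an implicit-sync
    compositor waits for the rendering before reading the image."""
    image.exclusive, image.shared = render_done_sync_file, []


def wsi_acquire(image, acquire_semaphore):
    """Implicit -> explicit: export every fence a new write must wait for
    (last write plus current readers) and import it temporarily into the
    semaphore the app waits on before rendering into the image again."""
    fences = [image.exclusive] if image.exclusive is not None else []
    acquire_semaphore.temporary = frozenset(fences + image.shared)
```

Because the import is temporary, the acquire semaphore waits on the compositor's fences exactly once. After that wait, it goes back to its own payload.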

I'm hoping (and here's where I want a sanity check) that a simple API
like this will allow us to finally start moving the Linux ecosystem
over to explicit synchronization one piece at a time in a way that's
actually correct. (No Wayland explicit sync with compositors hoping
KMS magically works even though it doesn't have a sync_file API.)
Once some pieces in the ecosystem start moving, there will be
motivation to start moving others and maybe we can actually build the
momentum to get most everything converted.

For reference, you can find the kernel RFC patches and mesa MR here:

https://lists.freedesktop.org/archives/dri-devel/2020-March/258833.html

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037

At this point, I welcome your thoughts, comments, objections, and
maybe even help/review. :-)

--Jason Ekstrand
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel