From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9515C2B9F7 for ; Wed, 26 May 2021 12:21:35 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7EB9661184 for ; Wed, 26 May 2021 12:21:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7EB9661184 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kde.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id AC6096ECC6; Wed, 26 May 2021 12:21:34 +0000 (UTC) Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) by gabe.freedesktop.org (Postfix) with ESMTPS id A0AF46ECC7 for ; Wed, 26 May 2021 12:21:33 +0000 (UTC) Received: by mail-pf1-f175.google.com with SMTP id d16so767837pfn.12 for ; Wed, 26 May 2021 05:21:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=X3eAA+e6rpspoPfU/+yeXERNgg2MJq/oOiZw0qV7Xm8=; b=XRZV2je2eT9TyDGMkEmJHYgWKEYcAowxMKJ8OrtqNNNTE9HAo/Fmt5kEmSiLwjywf2 Zz/WV1lXojQEWE0r57ki+GAouq/6XCW2yC4Usz2iPKeolvfinXhr8IKgmrW+0aBdwcw6 lLd76MUgG2RAFk0AZ26RriEnWZTRIBhS/r/FLaGxY/sE0lnUzCVd6JLVGcYVY/wttUnZ rbm8g0LoDkCMcX5Sv1AxRUWkAfy6L6Z0YND8m/XiaIEfOjStYoAMWRL1tzjjRr8lQQ6o f/947lqWqgewSBv8zq0Sc9HGoW/8c0SveXof14gLGynkTT9Ke+No6+XIuKZbk9cbtTHi aHPQ== X-Gm-Message-State: AOAM532f2UiB5/dD80CkOb96NmtwY1h40Qs4BeRiW/84LMcYvNppXUOB /h7pz1LBSIlIuttnWQlTsFLhU5STXIY= X-Google-Smtp-Source: ABdhPJyNfOUc5aE73lNtuj8U5l8nUKRgsP7vpgix48ptPwwBWHtqyLrJVjnb+Bk6bitTEqPg5KoHGg== X-Received: by 2002:a05:6a00:14cb:b029:2be:1466:5a28 with SMTP id w11-20020a056a0014cbb02902be14665a28mr34673289pfu.55.1622031691910; Wed, 26 May 2021 05:21:31 -0700 (PDT) Received: from [192.168.0.131] ([176.122.100.174]) by smtp.gmail.com with ESMTPSA id gg10sm14301614pjb.49.2021.05.26.05.21.28 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 26 May 2021 05:21:31 -0700 (PDT) Subject: Re: [PATCH 4/4] RFC: dma-buf: Add an API for importing sync files (v6) To: dri-devel@lists.freedesktop.org References: <20210520190007.534046-1-jason@jlekstrand.net> <20210520190007.534046-5-jason@jlekstrand.net> From: Vlad Zahorodnii Message-ID: <87260bb2-d85d-648c-58d4-c9c096bcf3c6@kde.org> Date: Wed, 26 May 2021 15:21:14 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On 5/24/21 8:11 PM, Jason Ekstrand wrote: > On Sat, May 22, 2021 at 3:05 PM Daniel Stone wrote: >> >> Hi, >> >> On Thu, 20 May 2021 at 20:00, Jason Ekstrand wrote: >>> In the Vulkan world, this is useful for dealing with the out-fence from >>> vkQueuePresent. Current Linux window-systems (X11, Wayland, etc.) all >>> rely on dma-buf implicit sync. Since Vulkan is an explicit sync API, we >>> get a set of fences (VkSemaphores) in vkQueuePresent and have to stash >>> those as an exclusive (write) fence on the dma-buf. We handle it in >>> Mesa today with the above mentioned dummy submit trick. This ioctl >>> would allow us to set it directly without the dummy submit. >>> >>> This may also open up possibilities for GPU drivers to move away from >>> implicit sync for their kernel driver uAPI and instead provide sync >>> files and rely on dma-buf import/export for communicating with other >>> implicit sync clients. >>> >>> We make the explicit choice here to only allow setting RW fences which >>> translates to an exclusive fence on the dma_resv. There's no use for >>> read-only fences for communicating with other implicit sync userspace >>> and any such attempts are likely to be racy at best. >> >> I think I almost follow, but I'm not quite seeing the race you allude >> to. Let me talk through my understanding so it's hopefully more clear >> for others as a summary. Please correct me on anything I've >> misunderstood or just missed completely. (I thought when I wrote this >> intro that the email might be relatively snappy, but it's really long >> and takes in a lot of breadth. Sorry.) >> >> So as far as I'm reading this patchset and the Mesa MR, this _only_ >> concerns the out-fence (i.e. compositor -> client AcquireNextImage >> semaphore/fence) so far. The in-fence (client->compositor QueuePresent >> wait-semaphores/fences) is handled by the driver ensuring that an >> exclusive resv is placed on the union of the signal semaphores passed >> to QueuePresent, either through flags on its CS ioctl, or amdgpu's BO >> flags, or ... either way, no problem as it should always be exclusive, >> and it seems pretty uncontroversial that we should pull the wait >> semaphore into an exclusive fence so that no downstream consumer will >> begin using it until the client ops have fully retired. >> >> For AcquireNextImage, your patchset exports all the fences (shared and >> exclusive both) on the dma_resv out into the semaphore/fence such that >> the client can't progress (CPU-side for VkFence, GPU-side for >> VkSemaphore) until all currently queued operations have completely >> retired. So, as long as the server ensures that all its kernel-side >> work is flushed before its IPC to unblock ANI (wl_buffer.release or >> DRI3 PresentIdle, both indicating that the client is free to reuse the >> buffer, subject to synchronising against implicit fences on the resv), >> all good: we snapshot the current resv state, wrap the relevant >> driver-side Vulkan primitive around it, and go back to explicit >> synchronisation. The client can't race generating new work against >> this, because it can't queue any work until ANI has returned and >> queued a signaling op on the semaphore/fence. >> >> So far, so good. I really like both your series up until this >> narrative point; as I was saying in the userspace-fence thread, the >> WSI<->client thread is certainly pulling a very big lever with a >> heavyweight transition between the two different worlds, and I like >> that the new export ioctl lets us be very clear about what exactly is >> happening under the hood. Good stuff. > > Glad to hear. If you turned that into RBs on the first three patches > in this series and all but the last patch on the Mesa MR, it'd make me > even happier. :-D > > At this point, I think everyone is pretty happy with the first three > patches and the export ioctl. In the Vulkan WSI code, it solves a > significant over-synchronization issue for ANV. Also, the uAPI > shouldn't be controversial at all because it's identical to poll() > except that it gives you a FD you can poll on later to get the result > of the poll as it would be now. I think if we get some Mesa reviews, > we should be able to land those. It's import that's trickier. > >> So, what gives with the import ioctl? Taking a guess at where you're >> going, the import ioctl is going to be used in QueuePresent just as >> the export ioctl is in ANI: instead of having CS flags to write into >> the resv exclusive slot or per-BO flags to always dump in exclusive, >> there's a forthcoming patch somewhere which lets drivers skip this and >> instead have common QueuePresent code dump the wait semaphore into the >> resv, so servers on the other side of an implicit-only protocol will >> synchronise their access against the fence imported as part of >> client-side QueuePresent? > > Correct. And the patch isn't forthcoming. It already exists as the > top patch in the Mesa MR (!4037). > >> That makes sense to me and is nicely symmetrical, plus it gets GPU >> drivers further out of the business of doing magic winsys/display >> synchronisation, which is good. But why enforce that imported fences >> have to go into the exclusive slot? It makes sense from the point of >> view of clients doing QueueSubmit, but I think it might preclude other >> uses for no particularly good reason. > > Mostly, I was trying to limit the scope a bit. Import is tricky and, > to be honest, I'm still not 100% sure that it's safe. I probably > should have left RFC on this patch. > > As long as we keep everything inside the kernel, there's a requirement > that any work which triggers any fence on a dma_resv waits on the > exclusive fence, if any. Work which sets the exclusive fence has to > wait on all fences. With the import ioctl, we can guarantee that the > fences fire in the right order by making an array fence containing the > new fence and all other fences and using that as the exclusive fence. > What we can't do, however, is ensure that the work which triggers the > fence actually blocks on ANY of the fences on the dma_resv. > > Hrm... I think I've now convinced myself that importing shared fences > is no more dangerous than importing an exclusive fence because they > share the same set of problems. Unfortunately, I've unconvinced > myself of the safety of either. I've got to think some more.... > > The most convincing argument I can make to myself is that this ioctl > is like having a one-shot little queue that contains tiny little work > jobs which wait on whatever sync_file you provide, do nothing, and > then signal. That should be safe, right? > >> Certainly on X11 there's no real use for the shared slot - you get >> into difficulties with mixed client/server rendering - but at least >> Wayland and GBM are always non-destructive, so conceptually have a >> good use for being able to import fences into the shared resv. This >> goes for media and KMS access as well, if you think beyond the usecase >> of an explicit Vulkan client sending through to a dumb implicit server >> which just wants to texture. >> >> Media clients are definitely a relevant usecase, both directly with >> V4L2/VA-API, neither of which have support for explicit >> synchronisation yet and (at least for V4L2) are unlikely to do so in >> the near future, but even more with pipeline frameworks like GStreamer >> and PipeWire, which don't have fundamental awareness (they're prepared >> to deal with deep pipelines which return at arbitrary times, but once >> anything is returned, it is available for immediate use). Maybe >> open-coding CPU-side waits in these users is the best way longer term, >> but for easier interop if nothing else, being able to pull shared >> fences in seems like it could be a win ('the compositor is still >> texturing from this for now, so feel free to read back from ref >> frames, but don't scribble over it until it's finished'). >> >> I'm slightly bemused that there's so much energy spent on designing >> around winsys being dumb and implicit. > > I'd like to address this one as it's a comment you've made several > times. Once you've fixed raw X11 (not just XWayland) and a new > release has been made (hah!) and is shipping in distros with said > support, then we can talk. Sorry if that comes off as overly snarky > but that's reality that we (driver devs) are living with. To make > things even worse, when we're in Vulkan land (as opposed to GL), we > can't tell up-front whether or not our window-system supports foo > fences and adjust accordingly. We have to start up, begin rendering, > and only later figure out "oops, this one goes to X11". We really > can't say things like "when running on modern Wayland, do things the > new way" because Vulkan doesn't have a concept of "running on" a > window system. > > FWIW, I do have a Mesa MR somewhere which adds explicit sync for > Vulkan+Wayland when the wayland protocols are there. I don't remember > why it didn't land. Maybe lack of acquire fence support? I think the > protocol issues have been fixed, so we should up-rev the requirements > as needed and land it. > >> We did take a long time to roll >> out sync_file support, but that was only because we didn't quite >> understand all the nuances of why implicit sync is so difficult for >> GPU drivers on modern architectures and how it was actually a win >> compared to doing nothing; maybe we should have some kind of >> conference where we all get together and talk to each other ... ? >> Anyway, by the time we got to the cusp of rolling out bi-directional >> sync_file support, suddenly we had drm_syncobj. By the time that had >> semi-settled down and started to be rolled out, then we suddenly had >> userspace fences on the horizon. And what do we do with those? > > You're not wrong.... And that's the second reason for the gymnastics > above. Ever since Vulkan launched, we've been fumbling around trying > to figure out how to best do synchronization. 'm reasonably certain > that userspace memory fences are the future but I'm much less certain > about the path to get there. It's been a process of patient > experimentation so far and I think we're getting closer. Syncobj, > timeline syncobj, etc. have all been steps along that path. I've been > hesitant to ask the window-system people to draft piles of extensions > until we're sure we've found "the one". It's bad enough iterating in > kernel-space and userspace without bringing Wayland protocol into each > iteration step. For that reason, one of my goals along this process > has been to try and decouple things as much as we can. > > So, in summary, none of this is because I think that window systems > are dumb and implicit or for any lack of respect for the people that > work on them. I assume that X11 will always be dumb and implicit. > (I'd love to be proven wrong!) For Wayland (and XWayland, probably), > I assume the people are very smart and active and will implement > whatever APIs we (the driver people) need as long as they're > reasonable. I just don't know what to ask for yet. > >> We've - certainly Weston, and I'm pretty confident I speak for Simon >> on the wlroots side and Jonas for Mutter - landed on accepting that >> we're going to have to deal with timeline semaphores and >> wait-before-signal; we have a credible plan (we think) for protocol to >> deal with at least syncobj, which should conceptually extend to >> userspace fences. The biggest blocker there is the syncobj uAPI. >> Compositors aren't Vulkan clients - we don't have one thread per group >> of work, because the inter-client synchronisation becomes nightmarish >> overhead for little gain. I don't think this will ever change, because >> the balance of work is totally different between client and >> compositor. >> >> Anyway, the problem with syncobj is that the ioctl to wait for a >> sync_file to materialise for a given timeline point only allows us to >> block with a timeout; this is a non-starter, because we need something >> which fits into epoll. The most optimal case is returning a new >> eventfd or similar which signals when a given timeline point becomes >> available or signaled, but in extremis a syncobj FD becoming readable >> when any activity which would change the result of any zero-timeout >> wait on that syncobj is more or less workable. > > Right. You could call this an oversight. Really, though, it was > because the use-cases we were interested in were ones where a wait > with a timeout was perfectly acceptable and where the extra overhead > of setting an epoll with ioctls wasn't. It shouldn't be too hard to > wire up if that helps (cross your fingers). But.... > > If we go the direction of userspace memory fences (which I think is > likely), there is no such thing as "wait for the fence to > materialize". The work is either done or it isn't. There is no > enforceable concept of "about to be done". The word "enforceable" is > important there. We could add such a concept but it'd be dependent on > the userspace client (not driver) to let us know when it's just about > ready and then we'd have to VK_ERROR_DEVICE_LOST on them or similar if > they break the contract. Maybe possible but there's some design work > required there. The other option is to make the compositors handle > this new way of thinking more thoroughly and eat the latency hit. > >> What we do want to do though, regardless of the primitive or its >> semantics, is to only have to go through this once, not rework it all >> in six months, and have to support a bunch of diverging codepaths. So >> what is the optimal synchronisation primitive we should be aiming for >> on the winsys side? Is sync_file a good enough lowest common >> denominator, or should we do timeline-syncobj for fancy clients, in >> tandem with sync_file bolted on the side for media and ancient GPUs? >> Or are userspace fences going to give us some new primitive? And if >> so, can we please talk about those semantics now, so we don't end up >> with three synchronisation mechanisms which are all sort of but not >> really alike? > > As I said above, I think we're getting close but I don't think we're there yet. > >> As far as I can tell, the three relevant vendors with (more or less) >> upstream drivers here are AMD, Arm, and Intel. Arm is obviously a bit >> more up in the air as the hardware and specs aren't currently >> available to the upstream development teams, but hopefully we can >> bring them into this conversation. I think it's looking like we're >> optimising for all of the following, without leaving anyone out in the >> cold: >> >> 1. Modern userspace-fencing hardware running on a >> userspace-fencing-aware winsys, i.e. new AMD/Arm/Intel on one of the >> big three Wayland compositors which handle enough synchronisation >> logic internally that the choice of underlying >> composition/presentation API is immaterial (for which any new API is >> important) >> 2. Modern userspace-fencing hardware running on Xwayland, for which >> we'll inevitably have to type out new DRI3 synchronsiation, but >> derived purely from the Wayland protocols so it can be passed through >> quite directly, and with any mixed X11<->user buffer client >> interaction (e.g. subwindows) being penalised by a long blocking wait >> in the X server >> 3. sync_file-oriented hardware, for which we need to do ~nothing >> 4. Modern userspace-fencing hardware and APIs which need to interop >> with media units, for all four combinations of client/server >> source/sink; for some APIs like Vulkan Video synchronisation is not a >> problem, for others like VA-API/V4L2/GStreamer we're probably need >> going to live with the implicit semantics for the foreseeable future, >> which means we should do the right thing for them up front, because >> why would you even be playing games if you're not streaming them to >> Twitch at the same time? (Personally I'm much happier that no-one >> other than my friends who already know I'm terrible can see me playing >> CoD, but y'know, kids these days ...) >> >> The fifth case I've left out is clients who want to smash Vulkan >> content directly into the display. For games and Kodi-like UI I'm just >> going to ignore this, because (maybe I'm biased, but) using KMS >> directly is difficult enough that you shouldn't do it and just use a >> winsys because we've spent a very long time dealing with these >> problems for you. The remaining usecase there is XR, but their uses >> are very different, and OpenXR already deals with a _lot_ of this for >> you, with the runtimes having sufficiently sophisticated >> synchronisation internally that whatever we come up with won't be >> onerous for them to implement. Either way, integration with KMS/GBM >> has the same problem as Wayland, in that unless you fully encapsulate >> it in something like VK_KHR_display, you don't get to throw a thread >> under the bus to do delayed submit, because you need to actually >> return something to the caller. > > You're missing a use-case: Modern userspace-fencing hardware running > on bare X11. I don't know that we should optimize for this case, so > to speak, but it has to work non-terribly. Also, as I alluded to > above, I really don't want to be maintaining two drivers forever: One > for X11 and ancient Wayland and one for modern Wayland. We need to be > able to modernize the driver and still support the old window-systems. > Unless you can promise me that X11 and KDE/Wayland will either go away > or else get modernized, I need a fall-back plan. And even if you make Hi, I don't know why you specifically mentioned KDE/Wayland, but we are on board with explicit sync. :) Cheers, Vlad > me said promise, I'm not going to believe you. :-P 8-9 years ago, I > was one of the people making those promises. Now I'm writing X11 > winsys code for drivers. And... now I'm re-thinking some of my life > choices.... > >> Ultimately, I think I'm leaning towards agreeing with Christian that I >> would like to see a good holistic model for how this works in a >> variety of usecases before we progress new uAPI, but also to agreeing >> with you and Dan in that how amdgpu currently implements things is >> currently objectively very wrong (despite the motivations for having >> implemented it that way). >> >> If there are any usecases I've missed then I'm all ears, but else I >> think we should take this forward by working with >> Vulkan/EGL/Wayland/X11/media/KMS and coming up with realistic >> skeletons for end-to-end synchronisation through those usecases. It's >> clear that this never happened for syncobj, because it's just not >> usable as a primitive in any compositor which exists today: let's make >> sure we don't get into the same trap again. If we can do this over >> email/GitLab then that's great, but if not, maybe we need to do a kind >> of mini-XDC with some kind of virtual whiteboard we can all scribble >> over. > > I think what we're looking at is roughly three steps, the first two of > which are mostly there on the Wayland side: > > 1. Implicit sync. We know what this one is. glFlush/vkQueueSubmit > on the one side, start texturing on the other, and it all works. > > 2. Immutable SW fences, i.e. sync_file. This is where we have a > fence object that gets returned from the flush/submit and can be > passed into the texture op to provide a dependency. Syncobj may be > useful here but only as a container. If a sync_file is a dma_fence*, > a syncobj should be thought of as a dma_fence**. This may be useful > if we want to retrofit sync_file into X11 where the current DRI3 > explicit sync stuff is based on creating an object and then re-using > it for every present. > > 3. Userspace memory fences. > > Note that timeline syncobj is NOT in that list. IMO, all the "wait > for submit" stuff is an implementation detail we needed in order to > get the timeline semantics on top of immutable SW fences. Under the > hood it's all dma_fence; this just gives us a shareable container so > we can implement VK_KHR_timeline_semaphore with sharing. I really > don't want to make Wayland protocol around it if memory fences are the > final solution. > > >> (As a coda to this, I'm pretty sure I understand the subtleties of the >> memory fences for relocation/shootdown, but it would be _really_ good >> to have a coherent description everyone agrees on written down >> somewhere, so people can understand the issues without having to read >> 250 emails with people at loggerheads.) > > Let me give it a try. I define a userspace memory fence as follows: > - The fence object is fundamentally a bit of mappable gpu-accessible > memory which stores a 64-bit counter value which is used as a > timeline. (On Windows, they require it to live in system memory.) > - For sharable memory fences, each one probably has to go in its own page. :-( > - The value is initialized to 0. > - To signal the fence, someone (GPU or CPU) writes a new 64-bit value. > - Waits on the fence are waits for it to be >= some value. > > Depending on circumstances, the wait may be a 32-bit comparison and > may be >= with wrap-around. For the purposes of window-system > protocol, I think we can assume 64-bit >= with no wrap-around. > > There are a few very important points worth noting about them, however: > - No one except the userspace client (not driver!) has any idea > when/if a value will be signaled > - The value may never be signaled > - Someone may submit a wait before someone submits a signal; we have > to deal with that > - There is no concept of "about to be signaled" > - For the kernel/firmware GPU scheduler handling waits, this means it > just keeps trying to run work and hopes the wait eventually unblocks. > It also means we need to totally decouple kernel synchronization for > memory management purposes from synchronization for work scheduling. > - For a compositor, I'm still not 100% sure what this means. TBD, I think. > > There are some ways to work around some of these issues. Windows has > a few tricks which we might be able to steal if we want. > > Why would anyone want such a horrid thing? Application developers > absolutely love them. They can write massively multi-threaded apps > with piles of work queues that require very little cross-thread > synchronization and the GPU scheduler sorts it all out for them in the > end. If you're a 3D game engine developer, this timeline model is > very powerful. If you're a driver or window-system developer, you > really have to just embrace the lack of knowledge and trust apps. > > Oof... That got long. I hope it was informative. > > --Jason > > >> Cheers, >> Daniel >> >> >> >> >> >> >> >> >>> When we got to >>> insert the RW fence, the actual fence we set as the new exclusive fence >>> is a combination of the sync_file provided by the user and all the other >>> fences on the dma_resv. This ensures that the newly added exclusive >>> fence will never signal before the old one would have and ensures that >>> we don't break any dma_resv contracts. We require userspace to specify >>> RW in the flags for symmetry with the export ioctl and in case we ever >>> want to support read fences in the future. >>> >>> There is one downside here that's worth documenting: If two clients >>> writing to the same dma-buf using this API race with each other, their >>> actions on the dma-buf may happen in parallel or in an undefined order. >>> Both with and without this API, the pattern is the same: Collect all >>> the fences on dma-buf, submit work which depends on said fences, and >>> then set a new exclusive (write) fence on the dma-buf which depends on >>> said work. The difference is that, when it's all handled by the GPU >>> driver's submit ioctl, the three operations happen atomically under the >>> dma_resv lock. If two userspace submits race, one will happen before >>> the other. You aren't guaranteed which but you are guaranteed that >>> they're strictly ordered. If userspace manages the fences itself, then >>> these three operations happen separately and the two render operations >>> may happen genuinely in parallel or get interleaved. However, this is a >>> case of userspace racing with itself. As long as we ensure userspace >>> can't back the kernel into a corner, it should be fine. >>> >>> v2 (Jason Ekstrand): >>> - Use a wrapper dma_fence_array of all fences including the new one >>> when importing an exclusive fence. >>> >>> v3 (Jason Ekstrand): >>> - Lock around setting shared fences as well as exclusive >>> - Mark SIGNAL_SYNC_FILE as a read-write ioctl. >>> - Initialize ret to 0 in dma_buf_wait_sync_file >>> >>> v4 (Jason Ekstrand): >>> - Use the new dma_resv_get_singleton helper >>> >>> v5 (Jason Ekstrand): >>> - Rename the IOCTLs to import/export rather than wait/signal >>> - Drop the WRITE flag and always get/set the exclusive fence >>> >>> v5 (Jason Ekstrand): >>> - Split import and export into separate patches >>> - New commit message >>> >>> Signed-off-by: Jason Ekstrand >>> --- >>> drivers/dma-buf/dma-buf.c | 32 ++++++++++++++++++++++++++++++++ >>> include/uapi/linux/dma-buf.h | 1 + >>> 2 files changed, 33 insertions(+) >>> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c >>> index 7679562b57bfc..c9d6b9195c00c 100644 >>> --- a/drivers/dma-buf/dma-buf.c >>> +++ b/drivers/dma-buf/dma-buf.c >>> @@ -417,6 +417,36 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf, >>> put_unused_fd(fd); >>> return ret; >>> } >>> + >>> +static long dma_buf_import_sync_file(struct dma_buf *dmabuf, >>> + const void __user *user_data) >>> +{ >>> + struct dma_buf_sync_file arg; >>> + struct dma_fence *fence, *singleton = NULL; >>> + int ret = 0; >>> + >>> + if (copy_from_user(&arg, user_data, sizeof(arg))) >>> + return -EFAULT; >>> + >>> + if (arg.flags != DMA_BUF_SYNC_RW) >>> + return -EINVAL; >>> + >>> + fence = sync_file_get_fence(arg.fd); >>> + if (!fence) >>> + return -EINVAL; >>> + >>> + dma_resv_lock(dmabuf->resv, NULL); >>> + >>> + ret = dma_resv_get_singleton_rcu(dmabuf->resv, fence, &singleton); >>> + if (!ret && singleton) >>> + dma_resv_add_excl_fence(dmabuf->resv, singleton); >>> + >>> + dma_resv_unlock(dmabuf->resv); >>> + >>> + dma_fence_put(fence); >>> + >>> + return ret; >>> +} >>> #endif >>> >>> static long dma_buf_ioctl(struct file *file, >>> @@ -465,6 +495,8 @@ static long dma_buf_ioctl(struct file *file, >>> #if IS_ENABLED(CONFIG_SYNC_FILE) >>> case DMA_BUF_IOCTL_EXPORT_SYNC_FILE: >>> return dma_buf_export_sync_file(dmabuf, (void __user *)arg); >>> + case DMA_BUF_IOCTL_IMPORT_SYNC_FILE: >>> + return dma_buf_import_sync_file(dmabuf, (const void __user *)arg); >>> #endif >>> >>> default: >>> diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h >>> index f902cadcbdb56..75fdde4800267 100644 >>> --- a/include/uapi/linux/dma-buf.h >>> +++ b/include/uapi/linux/dma-buf.h >>> @@ -70,5 +70,6 @@ struct dma_buf_sync_file { >>> #define DMA_BUF_SET_NAME_A _IOW(DMA_BUF_BASE, 1, u32) >>> #define DMA_BUF_SET_NAME_B _IOW(DMA_BUF_BASE, 1, u64) >>> #define DMA_BUF_IOCTL_EXPORT_SYNC_FILE _IOWR(DMA_BUF_BASE, 2, struct dma_buf_sync_file) >>> +#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE _IOW(DMA_BUF_BASE, 3, struct dma_buf_sync) >>> >>> #endif >>> -- >>> 2.31.1 >>>