From: Marek Olšák
Date: Tue, 27 Apr 2021 08:11:02 -0400
Subject: Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
To: Christian König
Cc: ML Mesa-dev, dri-devel
List-Id: Direct Rendering Infrastructure - Development

Ok. I'll interpret this as "yes, it will work, let's do it".

Marek

On Tue., Apr. 27, 2021, 08:06 Christian König, <ckoenig.leichtzumerken@gmail.com> wrote:
Correct, we wouldn't have synchronization between devices with and without user queues any more.

That could only be a problem for A+I laptops.

Memory management will just work with preemption fences, which pause the user queues of a process before evicting something. That will be a dma_fence, but also a well-known approach.

Christian.

On 27.04.21 at 13:49, Marek Olšák wrote:
If we don't use future fences for DMA fences at all, e.g. we don't use them for memory management, it can work, right? Memory management can suspend user queues anytime. It doesn't need to use DMA fences. There might be something that I'm missing here.

What would we lose without DMA fences? Just inter-device synchronization? I think that might be acceptable.

The only case when the kernel will wait on a future fence is before a page flip. Everything today already depends on userspace not hanging the gpu, which makes everything a future fence.

Marek

On Tue., Apr. 27, 2021, 04:02 Daniel Vetter, <daniel@ffwll.ch> wrote:
On Mon, Apr 26, 2021 at 04:59:28PM -0400, Marek Olšák wrote:
> Thanks everybody. The initial proposal is dead. Here are some thoughts on
> how to do it differently.
>
> I think we can have direct command submission from userspace via
> memory-mapped queues ("user queues") without changing window systems.
>
> The memory management doesn't have to use GPU page faults like HMM.
> Instead, it can wait for user queues of a specific process to go idle and
> then unmap the queues, so that userspace can't submit anything. Buffer
> evictions, pinning, etc. can be executed when all queues are unmapped
> (suspended). Thus, no BO fences and page faults are needed.
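
The suspend-then-evict idea in the paragraph above can be sketched as a toy model. This is only an illustration under assumptions: `UserQueue`, `Process`, `evict` and `resume` are hypothetical names, not kernel or driver interfaces, and "draining" stands in for waiting until the hardware queue goes idle.

```python
from dataclasses import dataclass, field


@dataclass
class UserQueue:
    """Hypothetical stand-in for a memory-mapped user queue."""
    mapped: bool = True
    pending: list = field(default_factory=list)

    def submit(self, job: str) -> bool:
        # Userspace can only submit while the queue is mapped.
        if not self.mapped:
            return False
        self.pending.append(job)
        return True

    def drain(self) -> None:
        # Model the hardware finishing everything already queued.
        self.pending.clear()


@dataclass
class Process:
    queues: list
    resident: set = field(default_factory=set)


def evict(proc: Process, buffer: str) -> None:
    """Suspend all queues of one process, then move the buffer out."""
    for q in proc.queues:
        q.drain()          # wait for the queue to go idle
        q.mapped = False   # unmap so nothing new can be submitted
    proc.resident.discard(buffer)  # no queue can touch the buffer now


def resume(proc: Process) -> None:
    """Map the queues again so userspace can submit."""
    for q in proc.queues:
        q.mapped = True


proc = Process(queues=[UserQueue(), UserQueue()], resident={"bo0", "bo1"})
proc.queues[0].submit("draw")
evict(proc, "bo1")
blocked = proc.queues[0].submit("draw again")   # rejected while suspended
resume(proc)
accepted = proc.queues[0].submit("draw again")  # works after remapping
```

The point of the sketch is that eviction never waits on a per-buffer fence: it only needs the queues of the affected process to be idle and unmapped.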
>
> Inter-process synchronization can use timeline semaphores. Userspace will
> query the wait and signal value for a shared buffer from the kernel. The
> kernel will keep a history of those queries to know which process is
> responsible for signalling which buffer. There is only the wait-timeout
> issue and how to identify the culprit. One of the solutions is to have the
> GPU send all GPU signal commands and all timed out wait commands via an
> interrupt to the kernel driver to monitor and validate userspace behavior.
> With that, it can be identified whether the culprit is the waiting process
> or the signalling process and which one. Invalid signal/wait parameters can
> also be detected. The kernel can force-signal only the semaphores that time
> out, and punish the processes which caused the timeout or used invalid
> signal/wait parameters.
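
A rough sketch of the bookkeeping described above, assuming a much simplified model: the kernel hands out signal points per shared buffer, remembers which process owes each point, and on a wait timeout blames either the signaller (a promised point never arrived) or the waiter (nobody ever promised that point). `KernelMonitor` and all of its methods are hypothetical, not an existing interface.

```python
from dataclasses import dataclass, field


@dataclass
class TimelineSemaphore:
    value: int = 0                                  # last signalled point
    signallers: dict = field(default_factory=dict)  # point -> pid that owes it


class KernelMonitor:
    """Hypothetical bookkeeping; not an existing kernel interface."""

    def __init__(self):
        self.semaphores = {}
        self.punished = []

    def query_signal_point(self, buf: str, pid: int) -> int:
        # Hand out the next point and remember who owes the signal.
        sem = self.semaphores.setdefault(buf, TimelineSemaphore())
        point = max(sem.signallers, default=sem.value) + 1
        sem.signallers[point] = pid
        return point

    def on_signal(self, buf: str, pid: int, point: int) -> None:
        sem = self.semaphores[buf]
        if sem.signallers.get(point) != pid:
            self.punished.append(pid)      # invalid signal parameters
            return
        sem.value = max(sem.value, point)

    def on_wait_timeout(self, buf: str, waiter: int, point: int) -> int:
        sem = self.semaphores[buf]
        # If nobody promised this point, the wait itself was invalid.
        culprit = sem.signallers.get(point, waiter)
        self.punished.append(culprit)
        sem.value = max(sem.value, point)  # force-signal only this semaphore
        return culprit


k = KernelMonitor()
p = k.query_signal_point("shared_bo", pid=100)  # process 100 owes point p
culprit_a = k.on_wait_timeout("shared_bo", waiter=200, point=p)   # never signalled
culprit_b = k.on_wait_timeout("shared_bo", waiter=200, point=99)  # never promised
```

In the first timeout the signalling process is blamed, in the second the waiting process, which mirrors the two cases the proposal distinguishes.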
>
> The question is whether this synchronization solution is robust enough for
> dma_fence and whatever the kernel and window systems need.

The proper model here is the preempt-ctx dma_fence that amdkfd uses
(without page faults). That means dma_fence for synchronization is DOA, at
least as-is, and we're back to figuring out the winsys problem.

"We'll solve it with timeouts" is very tempting, but doesn't work. It's
akin to saying that we're solving deadlock issues in a locking design by
doing a global s/mutex_lock/mutex_lock_timeout/ in the kernel. Sure it
avoids having to reach the reset button, but that's about it.
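
The locking analogy can be made concrete with a small userspace sketch. It is an illustration only, using Python's `threading` module rather than kernel mutexes: two threads take two locks in opposite order, and each gives up on the second lock after a timeout. The barriers are there only to make the outcome deterministic.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold = threading.Barrier(2)   # both threads own their first lock
both_tried = threading.Barrier(2)  # both threads finished their attempt
results = {}


def worker(name, first, second):
    with first:
        both_hold.wait()
        # The "fix": give up after a timeout instead of blocking forever.
        got_second = second.acquire(timeout=0.1)
        results[name] = got_second
        both_tried.wait()
        if got_second:
            second.release()


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
```

Neither thread hangs, but neither gets the second lock either, so the work both wanted to do is simply lost. The ordering bug is still there; the timeout only converts a hang into a failure.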

And the fundamental problem is that once you throw in userspace command
submission (and syncing, at least within the userspace driver, otherwise
there's kinda no point if you still need the kernel for cross-engine sync),
you get deadlocks if you still use dma_fence for sync under perfectly
legit use-cases. We've discussed that one ad nauseam last summer:

https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences
See silly diagram at the bottom.
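
One way to picture the kind of dependency loop meant here is a wait-for graph with a cycle in it. The sketch below is an assumption-laden illustration: the node names and edges are hypothetical and chosen only to show how a kernel-side wait on a dma_fence can end up depending on itself through a userspace fence. It is not a reproduction of the diagram in the linked documentation.

```python
def find_cycle(waits_on: dict) -> list:
    """Return one dependency cycle as a list of nodes, or [] if none exists."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return []
        if node in visiting:
            # Found a node that is already on the current path: a loop.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in waits_on.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return []

    for start in waits_on:
        cycle = visit(start)
        if cycle:
            return cycle
    return []


# Hypothetical edges: "X waits on Y".
waits_on = {
    "memory reclaim": ["dma_fence"],
    "dma_fence": ["userspace fence"],
    "userspace fence": ["userspace job"],
    "userspace job": ["memory reclaim"],  # the job needs memory to progress
}
cycle = find_cycle(waits_on)
no_cycle = find_cycle({"a": ["b"], "b": []})
```

With these edges the search returns a loop that starts and ends at "memory reclaim", which is the shape of problem a timeout cannot resolve.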

Now I think all isn't lost, because imo the first step to getting to this
brave new world is rebuilding the driver on top of userspace fences, and
with the adjusted cmd submit model. You probably don't want to use amdkfd,
but port that as a context flag or similar to render nodes for gl/vk. Of
course that means you can only use this mode in headless, without
glx/wayland winsys support, but it's a start.
-Daniel

>
> Marek
>
> On Tue, Apr 20, 2021 at 4:34 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> > Hi,
> >
> > On Tue, 20 Apr 2021 at 20:30, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> >> The thing is, you can't do this in drm/scheduler. At least not without
> >> splitting up the dma_fence in the kernel into separate memory fences
> >> and sync fences
> >
> >
> > I'm starting to think this thread needs its own glossary ...
> >
> > I propose we use 'residency fence' for execution fences which enact
> > memory-residency operations, e.g. faulting in a page ultimately depending
> > on GPU work retiring.
> >
> > And 'value fence' for the pure-userspace model suggested by timeline
> > semaphores, i.e. fences being (*addr == val) rather than being able to look
> > at ctx seqno.
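
A 'value fence' in this sense can be sketched as nothing more than a memory location and an expected value. The snippet below is a minimal illustration, assuming a 64-bit little-endian word in a `bytearray` that stands in for shared memory; `ValueFence` is a hypothetical name, and the equality check follows the `(*addr == val)` wording in the message above.

```python
import struct


class ValueFence:
    """Signalled once the 64-bit word at `offset` in `memory` equals `val`."""

    def __init__(self, memory: bytearray, offset: int, val: int):
        self.memory = memory
        self.offset = offset
        self.val = val

    def signalled(self) -> bool:
        # Read the current value straight from memory; no kernel object involved.
        (current,) = struct.unpack_from("<Q", self.memory, self.offset)
        return current == self.val


shared = bytearray(16)                # stand-in for a shared mapping
fence = ValueFence(shared, offset=8, val=3)
before = fence.signalled()            # word is still 0
struct.pack_into("<Q", shared, 8, 3)  # the "signal" is a plain memory write
after = fence.signalled()
```

Nothing here guarantees that the write ever happens, which is the property that separates this kind of fence from a dma_fence.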
> >
> > Cheers,
> > Daniel
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel