From: Marek Olšák
Date: Tue, 1 Jun 2021 09:01:14 -0400
Subject: Re: [Mesa-dev] Linux Graphics Next: Userspace submission update
To: Christian König
Cc: Jason Ekstrand, Michel Dänzer, dri-devel, ML Mesa-dev
References: <327e4008-b29f-f5b7-bb30-532fa52c797f@gmail.com> <7f19e3c7-b6b2-5200-95eb-3fed8d22a6b3@daenzer.net>
List-Id: Direct Rendering Infrastructure - Development

On Tue., Jun. 1, 2021, 08:51 Christian König,
<ckoenig.leichtzumerken@gmail.com> wrote:

> On 01.06.21 at 14:30, Daniel Vetter wrote:
> > On Tue, Jun 1, 2021 at 2:10 PM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> On 01.06.21 at 12:49, Michel Dänzer wrote:
> >>> On 2021-06-01 12:21 p.m., Christian König wrote:
> >>>> On 01.06.21 at 11:02, Michel Dänzer wrote:
> >>>>> On 2021-05-27 11:51 p.m., Marek Olšák wrote:
> >>>>>> 3) Compositors (and other privileged processes, and display
> >>>>>> flipping) can't trust imported/exported fences. They need a
> >>>>>> timeout recovery mechanism from the beginning, and the following
> >>>>>> are some possible solutions to timeouts:
> >>>>>>
> >>>>>> a) use a CPU wait with a small absolute timeout, and display the
> >>>>>> previous content on timeout
> >>>>>> b) use a GPU wait with a small absolute timeout, and conditional
> >>>>>> rendering will choose between the latest content (if signalled)
> >>>>>> and previous content (if timed out)
> >>>>>>
> >>>>>> The result would be that the desktop can run close to 60 fps even
> >>>>>> if an app runs at 1 fps.
> >>>>> FWIW, this is working with
> >>>>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 , even
> >>>>> with implicit sync (on current Intel GPUs; amdgpu/radeonsi would
> >>>>> need to provide the same dma-buf poll semantics as other drivers
> >>>>> and high priority GFX contexts via EGL_IMG_context_priority which
> >>>>> can preempt lower priority ones).
> >>>> Yeah, that is really nice to have.
> >>>>
> >>>> One question is if you wait on the CPU or the GPU for the new
> >>>> surface to become available?
> >>> It's based on polling dma-buf fds, i.e. CPU.
> >>>
> >>>> The former is a bit bad for latency and power management.
> >>> There isn't a choice for Wayland compositors in general, since there
> >>> can be arbitrary other state which needs to be applied atomically
> >>> together with the new buffer. (Though in theory, a compositor might
> >>> get fancy and special-case surface commits which can be handled by
> >>> waiting on the GPU)
> >>>
> >>> Latency is largely a matter of scheduling in the compositor. The
> >>> latency incurred by the compositor shouldn't have to be more than
> >>> single-digit milliseconds. (I've seen total latency from when the
> >>> client starts processing a (static) frame to when it starts being
> >>> scanned out as low as ~6 ms with
> >>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620, lower
> >>> than typical with Xorg)
> >> Well let me describe it like this:
> >>
> >> We have a use case for a guaranteed 144 Hz refresh rate. That
> >> essentially means that the client application needs to be able to spit
> >> out one frame/window content every ~6.9ms. That's tough, but doable.
> >>
> >> When you now add 6ms latency in the compositor that means the client
> >> application has only .9ms left for its frame which is basically
> >> impossible to do.
> >>
> >> See, for user fence handling the display engine will learn to read
> >> sequence numbers from memory and decide on its own whether the old
> >> frame or the new one is scanned out, to get the latency there as low
> >> as possible.
> > This won't work with implicit sync at all.
> >
> > If you want to enable this use-case with driver magic and without the
> > compositor being aware of what's going on, the solution is EGLStreams.
> > Not sure we want to go there, but it's definitely a lot more feasible
> > than trying to stuff eglstreams semantics into dma-buf implicit
> > fencing support in a desperate attempt to not change compositors.
>
> Well not changing compositors is certainly not something I would try
> with this use case.
>
> Not changing compositors is more like ok we have Ubuntu 20.04 and need
> to support that with the newest hardware generation.
>
> > I still think the most reasonable approach here is that we wrap a
> > dma_fence compat layer/mode over new hw for existing
> > userspace/compositors. And then enable userspace memory fences and the
> > fancy new features those allow with a new model that's built for them.
>
> Yeah, that's basically the same direction I'm heading. Question is how
> to fix all those details.
>
> > Also even with dma_fence we could implement your model of staying with
> > the previous buffer (or an older buffer that's already rendered),
> > but it needs explicit involvement of the compositor. At least without
> > adding eglstreams fd to the kernel and wiring up all the relevant
> > extensions.
>
> Question is do we already have some extension which allows different
> textures to be selected on the fly depending on some state?
>

There is no such extension for sync objects, but it can be done with
queries, like occlusion queries. There is also no timeout option and it can
only do "if" and "if not", but not "if .. else".

Marek

> E.g. something like use new frame if it's available and old frame
> otherwise.
>
> Whether you then apply this to the standard dma_fence based hardware or
> the new user fence based one is then pretty much irrelevant.
>
> Regards,
> Christian.
>
> > -Daniel
> >
> >>>> Another question is if that is sufficient as security for the
> >>>> display server or if we need further handling down the road? I mean
> >>>> essentially we are moving the reliability problem into the display
> >>>> server.
> >>> Good question. This should generally protect the display server from
> >>> freezing due to client fences never signalling, but there might still
> >>> be corner cases.
> >>>
> >>>
> >
>
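
For illustration only, and not code from this thread: option (a) quoted
above, a CPU wait with a small timeout that falls back to the previous
content, could look roughly like the C sketch below. It assumes the fence
can be waited on by polling a file descriptor, as with the dma-buf polling
Michel describes. pick_buffer() and demo() are hypothetical names, the pipe
in demo() only stands in for a dma-buf fd, and poll() takes a relative
timeout, so a real compositor would have to derive it from its absolute
deadline.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fence_fd to become
 * readable. Returns new_buf if it did, prev_buf on timeout or error.
 */
int pick_buffer(int fence_fd, int timeout_ms, int new_buf, int prev_buf)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN, .revents = 0 };

    /* A negative timeout would block forever and defeat the fallback. */
    assert(timeout_ms >= 0);

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN))
        return new_buf;
    return prev_buf;
}

/*
 * Demo with a pipe standing in for a dma-buf fd (an assumption made so
 * the sketch runs anywhere). Returns 0 if both picks were as expected.
 */
int demo(void)
{
    int fds[2];
    int before, after;

    if (pipe(fds) != 0)
        return -1;

    /* Nothing written yet: the wait times out, keep buffer 0. */
    before = pick_buffer(fds[0], 5, 1, 0);

    /* "Signal" the stand-in fence by making the fd readable. */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Now the wait succeeds: switch to buffer 1. */
    after = pick_buffer(fds[0], 5, 1, 0);

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 0 : 1;
}
```

Treating errors the same as a timeout keeps the compositor on the last
known-good frame, which matches the goal of never stalling on a client
fence.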

