From: "Michel Dänzer" <michel@daenzer.net>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>,
Dongwon Kim <dongwon.kim@intel.com>,
dri-devel@lists.freedesktop.org,
Tina Zhang <tina.zhang@intel.com>,
Gerd Hoffmann <kraxel@redhat.com>
Subject: Re: [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability
Date: Mon, 2 Aug 2021 11:19:01 +0200 [thread overview]
Message-ID: <ab6300d6-1cc8-8084-81bc-118156ad8c52@daenzer.net> (raw)
In-Reply-To: <YQe1eUxNvJZ9G200@phenom.ffwll.local>
On 2021-08-02 11:06 a.m., Daniel Vetter wrote:
> On Mon, Aug 02, 2021 at 10:49:37AM +0200, Michel Dänzer wrote:
>> On 2021-08-02 9:59 a.m., Daniel Vetter wrote:
>>> On Fri, Jul 30, 2021 at 02:50:10PM +0200, Michel Dänzer wrote:
>>>> On 2021-07-30 12:25 p.m., Daniel Vetter wrote:
>>>>> On Thu, Jul 29, 2021 at 01:16:55AM -0700, Vivek Kasireddy wrote:
>>>>>> By separating the OUT_FENCE signalling from pageflip completion allows
>>>>>> a Guest compositor to start a new repaint cycle with a new buffer
>>>>>> instead of waiting for the old buffer to be free.
>>>>>>
>>>>>> This work is based on the idea/suggestion from Simon and Pekka.
>>>>>>
>>>>>> This capability can be a solution for this issue:
>>>>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514
>>>>>>
>>>>>> Corresponding Weston MR:
>>>>>> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/668
>>>>>
>>>>> Uh I kinda wanted to discuss this a bit more before we jump into typing
>>>>> code, but well I guess not that much work yet.
>>>>>
>>>>> So maybe I'm not understanding the problem, but I think the fundamental
>>>>> underlying issue is that with KMS you can have at most 2 buffers
>>>>> in-flight, due to our queue depth limit of 1 pending flip.
>>>>>
>>>>> Unfortunately that means for virtual hw where it takes a few more
>>>>> steps/vblanks until the framebuffer actually shows up on screen and is
>>>>> scanned out, we suffer deeply. The usual fix for that is to drop the
>>>>> latency and increase throughput, and have more buffers in-flight. Which
>>>>> this patch tries to do.
>>>>
>>>> Per
>>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986797 ,
>>>> IMO the underlying issue is actually that the guest compositor repaint
>>>> cycle is not aligned with the host compositor one. If they were aligned,
>>>> the problem would not occur even without allowing multiple page flips in
>>>> flight, and latency would be lower.
>>>
>>> Yeah my proposal here is under the premise that we do actually need to fix
>>> this with a deeper queue depth.
>>>
>>>>> Now I think where we go wrong here is that we're trying to hack this up by
>>>>> defining different semantics for the out-fence and for the drm-event. Imo
>>>>> that's wrong, they're both meant to show eactly the same thing:
>>>>> - when is the new frame actually visible to the user (as in, eyeballs in a
>>>>> human head, preferrably, not the time when we've handed the buffer off
>>>>> to the virtual hw)
>>>>> - when is the previous buffer no longer being used by the scanout hw
>>>>>
>>>>> We do cheat a bit right now in so far that we assume they're both the
>>>>> same, as in, panel-side latency is currently the compositor's problem to
>>>>> figure out.
>>>>>
>>>>> So for virtual hw I think the timestamp and even completion really need to
>>>>> happen only when the buffer has been pushed through the entire
>>>>> virtualization chain, i.e. ideally we get the timestamp from the kms
>>>>> driver from the host side. Currently that's not done, so this is most
>>>>> likely quite broken already (virtio relies on the no-vblank auto event
>>>>> sending, which definitely doesn't wait for anything, or I'm completely
>>>>> missing something).
>>>>>
>>>>> I think instead of hacking up some ill-defined 1.5 queue depth support,
>>>>> what we should do is support queue depth > 1 properly. So:
>>>>>
>>>>> - Change atomic to support queue depth > 1, this needs to be a per-driver
>>>>> thing due to a bunch of issues in driver code. Essentially drivers must
>>>>> never look at obj->state pointers, and only ever look up state through
>>>>> the passed-in drm_atomic_state * update container.
>>>>>
>>>>> - Aside: virtio should loose all it's empty hooks, there's no point in
>>>>> that.
>>>>>
>>>>> - We fix virtio to send out the completion event at the end of this entire
>>>>> pipeline, i.e. virtio code needs to take care of sending out the
>>>>> crtc_state->event correctly.
>>>>>
>>>>> - We probably also want some kind of (maybe per-crtc) recommended queue
>>>>> depth property so compositors know how many buffers to keep in flight.
>>>>> Not sure about that.
>>>>
>>>> I'd say there would definitely need to be some kind of signal for the
>>>> display server that it should queue multiple flips, since this is
>>>> normally not desirable for latency. In other words, this wouldn't really
>>>> be useful on bare metal (in contrast to the ability to replace a pending
>>>> flip with a newer one).
>>>
>>> Hm I was thinking that the compositor can tune this. If the round-trip
>>> latency (as measured by events) is too long to get full refresh rate, it
>>> can add more buffers to the queue. That's kinda why I think the returned
>>> event really must be accurate wrt actual display time (and old buffer
>>> release time), so that this computation in the compositor because a pretty
>>> simple
>>>
>>> num_buffers = (flip_time - submit_time) / frame_time
>>>
>>> With maybe some rounding up and averaging. You can also hit this when your
>>> 3d engine has an extremely deep pipeline (like some of the tiling
>>> renders have), where rendering just takes forever, but as long as you keep
>>> 2 frames in the renderer in-flight you can achieve full refresh rate (at a
>>> latency cost).
>>
>> As long as a page flip submitted after vblank N can complete during
>> vblank N+1, full frame rate can be sustained[0]. User space can use as
>> many buffers as needed to keep the rendering pipeline busy.
>>
>> [0] This is broken by the mis-aligned compositor repaint cycles: The
>> flip from the guest compositor misses the host compositor's cycle, so it
>> takes more than one display refresh cycle to complete.
>>
>>
>>> So kernel can't really tell you in all cases how many buffers you should
>>> have.
>>
>> That's not exactly what I mean. Right now, KMS user space has to wait
>> for a flip to complete before submitting another one, or it gets EBUSY.
>> So if the kernel wants to allow multiple flips to be submitted, it has
>> to somehow tell user space that this is possible, or it'll never happen.
>> And the kernel should never advertise this for bare metal, since it's
>> not needed there (and undesirable).
>
> Oh the existence of the deep queue needs a getcap ofc.
>
> Also, deep queues do exist in hw (including scheduling to the right
> frame), the benefit is reduced battery usage for e.g. video playback if
> you do the rendering for an entire set of frames and then just let the
> display show them.
>
> It costs latency ofc (or we need a cancellable queue, which once it's in
> hw is tricky).
And if it's not in HW, it can be handled by the user-space display server instead of the kernel (except for cancelling pending flips).
(Note that this is currently not possible either way with Wayland, since it uses mailbox semantics; there are proposals for a Wayland extension which allows queuing multiple frames though. Meanwhile, this would need to be handled in the Wayland clients.)
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
next prev parent reply other threads:[~2021-08-02 9:19 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-29 8:16 [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability Vivek Kasireddy
2021-07-29 8:16 ` [RFC v1 1/4] drm: Add a capability flag to support deferred out_fence signalling Vivek Kasireddy
2021-07-29 8:16 ` [RFC v1 2/4] virtio-gpu uapi: Add VIRTIO_GPU_F_OUT_FENCE feature Vivek Kasireddy
2021-07-29 9:50 ` Gerd Hoffmann
2021-07-29 18:53 ` Kasireddy, Vivek
2021-07-29 8:16 ` [RFC v1 3/4] drm/virtio: Add VIRTIO_GPU_CMD_RESOURCE_OUT_FENCE cmd Vivek Kasireddy
2021-07-29 8:16 ` [RFC v1 4/4] drm/virtio: Probe and implement VIRTIO_GPU_F_OUT_FENCE feature Vivek Kasireddy
2021-07-29 9:52 ` Gerd Hoffmann
2021-07-29 18:55 ` Kasireddy, Vivek
2021-07-30 10:25 ` [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability Daniel Vetter
2021-07-30 12:50 ` Michel Dänzer
2021-08-02 7:59 ` Daniel Vetter
2021-08-02 8:49 ` Michel Dänzer
2021-08-02 9:06 ` Daniel Vetter
2021-08-02 9:19 ` Michel Dänzer [this message]
2021-07-30 13:38 ` Gerd Hoffmann
2021-08-02 9:09 ` Daniel Vetter
2021-08-02 12:50 ` Gerd Hoffmann
2021-08-02 14:35 ` Daniel Vetter
2021-08-03 6:18 ` Kasireddy, Vivek
2021-08-03 7:51 ` Gerd Hoffmann
2021-08-04 7:27 ` Kasireddy, Vivek
2021-08-02 4:48 ` Zhang, Tina
2021-08-02 6:51 ` Kasireddy, Vivek
2021-08-02 8:14 ` Daniel Vetter
2021-08-03 6:11 ` Kasireddy, Vivek
2021-08-03 7:33 ` Michel Dänzer
2021-08-04 7:25 ` Kasireddy, Vivek
2021-08-04 12:11 ` Daniel Vetter
2021-08-05 4:15 ` Kasireddy, Vivek
2021-08-05 12:08 ` Daniel Vetter
2021-08-06 7:27 ` Kasireddy, Vivek
2021-08-09 14:15 ` Daniel Vetter
2021-08-10 8:21 ` Kasireddy, Vivek
2021-08-10 8:30 ` Daniel Vetter
2021-08-10 10:57 ` Michel Dänzer
2021-08-11 7:25 ` Kasireddy, Vivek
2021-08-11 7:20 ` Kasireddy, Vivek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ab6300d6-1cc8-8084-81bc-118156ad8c52@daenzer.net \
--to=michel@daenzer.net \
--cc=daniel@ffwll.ch \
--cc=dongwon.kim@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=kraxel@redhat.com \
--cc=tina.zhang@intel.com \
--cc=vivek.kasireddy@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).