* Re: Support for 2D engines/blitters in V4L2 and DRM
@ 2019-04-24 8:31 ` Michel Dänzer
0 siblings, 0 replies; 36+ messages in thread
From: Michel Dänzer @ 2019-04-24 8:31 UTC (permalink / raw)
To: Paul Kocialkowski, Nicolas Dufresne, Daniel Vetter
Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, linux-media
On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
>> On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
>>>> It would be cool if both could be used concurrently and not just return
>>>> -EBUSY when the device is used with the other subsystem.
>>>
>>> We live in this world already :-) I think there's even patches (or merged
>>> already) to add fences to v4l, for Android.
>>
>> This work is currently suspended. It will require some features on
>> the DRM display side to really make this useful, but there are also
>> a lot of challenges in V4L2. In the GFX space, most of the use cases
>> are about rendering as soon as possible. In multimedia, though, we
>> have two problems: we need to synchronize the frame rendering with
>> the audio, and output buffers may come out of order due to how video
>> CODECs are made.
>
> Definitely, it feels like the DRM display side is currently a good fit
> for render use cases, but not so much for precise display cases where
> we want to try and display a buffer at a given vblank target instead of
> "as soon as possible".
>
> I have a userspace project where I've implemented a page flip queue,
> which only schedules the next flip when relevant and keeps ready
> buffers in the queue until then. This requires explicit vblank
> synchronisation (which DRM offers, but pretty much all other,
> higher-level display APIs don't, so I'm just using a refresh-rate
> timer for them) and flip-done notification.
>
> I haven't looked too much at how to flip with a target vblank with DRM
> directly but maybe the atomic API already has the bits in for that (but
> I haven't heard of such a thing as a buffer queue, so that makes me
> doubt it).
Not directly. What's available is that if userspace waits for vblank n
and then submits a flip, the flip will complete in vblank n+1 (or a
later vblank, depending on when the flip is submitted and when the
fences the flip depends on signal).
There is reluctance to allow more than one flip to be queued in the
kernel, as it would considerably increase its complexity. It
would probably only be considered if there were a compelling use case
which was outright impossible otherwise.
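The flip semantics described above can be modelled with a small helper. This is an illustrative sketch of the policy only, not kernel or libdrm code, and all names are made up:

```python
def completion_vblank(wait_vblank, submit_time, fence_signal_time, vblank_time):
    """Model of current KMS flip semantics: after waiting for vblank n and
    submitting a flip, the flip completes at the first vblank that follows
    both the submission and the signalling of the fences it depends on
    (so vblank n+1 at the earliest).

    vblank_time(k) returns the (monotonic) timestamp of vblank k."""
    k = wait_vblank + 1
    while vblank_time(k) <= max(submit_time, fence_signal_time):
        k += 1
    return k
```

For a 60 Hz display (`vblank_time = lambda k: k * 16.7` in milliseconds), a flip submitted just after vblank 10 with its fence already signalled completes at vblank 11, while a fence that signals late pushes completion to a later vblank.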
> Well, I need to handle stuff like SDL in my userspace project, so I have
> to have all that queuing stuff in software anyway, but it would be good
> if each project didn't have to implement that. Worst case, it could be
> in libdrm too.
Usually, this kind of queuing will be handled in a display server such
as Xorg or a Wayland compositor, not by the application such as a video
player itself, or any library in the latter's address space. I'm not
sure there's much potential for sharing code between display servers for
this.
>> In the first, we'd need a mechanism where we can schedule a render at a
>> specific time or vblank. We can of course already implement this in
>> software, but with fences, the scheduling would need to be done in the
>> driver. Then if the fence is signalled earlier, the driver should hold
>> on until the delay is met. If the fence got signalled late, we also
>> need to think of a workflow. As we can't schedule more than one render
>> in DRM at one time, I don't really see yet how to make that work.
>
> Indeed, that's also one of the main issues I've spotted. Before using
> an implicit fence, we basically have to make sure the frame is due for
> display at the next vblank. Otherwise, we need to refrain from using
> the fence and schedule the flip later, which is kind of counter-
> productive.
Fences are about signalling that the contents of a frame are "done" and
ready to be presented. They're not about specifying which frame is to be
presented when.
> I feel like specifying a target vblank would be a good unit for that,
The mechanism described above works for that.
> since it's our native granularity after all (while a timestamp is not).
Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
changes things in this regard. It makes the vblank length variable, and
if you wait for multiple vblanks between flips, you get the maximum
vblank length corresponding to the minimum refresh rate / timing
granularity. Thus, it would be useful to allow userspace to specify a
timestamp corresponding to the earliest time when the flip is to
complete. The kernel could then try to hit that as closely as possible.
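The proposal can be modelled as follows, under the assumption that with variable refresh the kernel may complete a flip anywhere between the minimum and maximum frame duration after the previous one. The names are illustrative, not real UAPI:

```python
def vrr_flip_completion(prev_flip, earliest_target, min_period, max_period):
    """With variable refresh rate, the next flip can complete anywhere in
    [prev_flip + min_period, prev_flip + max_period] (min_period = fastest
    refresh, max_period = slowest). The kernel would aim for the
    userspace-supplied 'earliest completion' timestamp, clamped to that
    window."""
    earliest = prev_flip + min_period
    latest = prev_flip + max_period
    return min(max(earliest_target, earliest), latest)
```

For instance, on a hypothetical 48-144 Hz panel the window is roughly [6.9, 20.8] ms after the previous flip: a target inside the window is hit exactly, while targets outside it are clamped to the nearest edge.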
--
Earthling Michel Dänzer | https://www.amd.com
Libre software enthusiast | Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 8:31 ` Michel Dänzer
@ 2019-04-24 12:01 ` Nicolas Dufresne
2019-04-24 14:39 ` Michel Dänzer
0 siblings, 1 reply; 36+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 12:01 UTC (permalink / raw)
To: Michel Dänzer, Paul Kocialkowski, Daniel Vetter
Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, linux-media
On Wednesday, 24 April 2019 at 10:31 +0200, Michel Dänzer wrote:
> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
> > > > > It would be cool if both could be used concurrently and not just return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > >
> > > > We live in this world already :-) I think there's even patches (or merged
> > > > already) to add fences to v4l, for Android.
> > >
> > > This work is currently suspended. It will require some features on
> > > the DRM display side to really make this useful, but there are also
> > > a lot of challenges in V4L2. In the GFX space, most of the use cases
> > > are about rendering as soon as possible. In multimedia, though, we
> > > have two problems: we need to synchronize the frame rendering with
> > > the audio, and output buffers may come out of order due to how video
> > > CODECs are made.
> >
> > Definitely, it feels like the DRM display side is currently a good fit
> > for render use cases, but not so much for precise display cases where
> > we want to try and display a buffer at a given vblank target instead of
> > "as soon as possible".
> >
> > I have a userspace project where I've implemented a page flip queue,
> > which only schedules the next flip when relevant and keeps ready
> > buffers in the queue until then. This requires explicit vblank
> > synchronisation (which DRM offers, but pretty much all other,
> > higher-level display APIs don't, so I'm just using a refresh-rate
> > timer for them) and flip-done notification.
> >
> > I haven't looked too much at how to flip with a target vblank with DRM
> > directly but maybe the atomic API already has the bits in for that (but
> > I haven't heard of such a thing as a buffer queue, so that makes me
> > doubt it).
>
> Not directly. What's available is that if userspace waits for vblank n
> and then submits a flip, the flip will complete in vblank n+1 (or a
> later vblank, depending on when the flip is submitted and when the
> fences the flip depends on signal).
>
> There is reluctance to allow more than one flip to be queued in the
> kernel, as it would considerably increase its complexity. It
> would probably only be considered if there were a compelling use case
> which was outright impossible otherwise.
>
>
> > Well, I need to handle stuff like SDL in my userspace project, so I have
> > to have all that queuing stuff in software anyway, but it would be good
> > if each project didn't have to implement that. Worst case, it could be
> > in libdrm too.
>
> Usually, this kind of queuing will be handled in a display server such
> as Xorg or a Wayland compositor, not by the application such as a video
> player itself, or any library in the latter's address space. I'm not
> sure there's much potential for sharing code between display servers for
> this.
>
>
> > > In the first, we'd need a mechanism where we can schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled earlier, the driver should hold
> > > on until the delay is met. If the fence got signalled late, we also
> > > need to think of a workflow. As we can't schedule more than one render
> > > in DRM at one time, I don't really see yet how to make that work.
> >
> > Indeed, that's also one of the main issues I've spotted. Before using
> > an implicit fence, we basically have to make sure the frame is due for
> > display at the next vblank. Otherwise, we need to refrain from using
> > the fence and schedule the flip later, which is kind of counter-
> > productive.
>
> Fences are about signalling that the contents of a frame are "done" and
> ready to be presented. They're not about specifying which frame is to be
> presented when.
>
>
> > I feel like specifying a target vblank would be a good unit for that,
>
> The mechanism described above works for that.
>
> > since it's our native granularity after all (while a timestamp is not).
>
> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> changes things in this regard. It makes the vblank length variable, and
> if you wait for multiple vblanks between flips, you get the maximum
> vblank length corresponding to the minimum refresh rate / timing
> granularity. Thus, it would be useful to allow userspace to specify a
> timestamp corresponding to the earliest time when the flip is to
> complete. The kernel could then try to hit that as closely as possible.
Rendering a video stream is more complex than what you describe here.
Whenever there is an unexpected delay (late delivery of a frame, for
example) you may end up in a situation where one frame is ready after
the targeted vblank. If there is another frame targeting the following
vblank that gets ready on time, the previous frame should be replaced
by the more recent one.

With fences, what happens is that even if you received the next frame
on time, naively replacing it is not possible, because we don't know
when the fence for the next frame will be signalled. If you simply
always replace the current frame, you may end up skipping many more
vblanks than you expect, and that results in jumpy playback.

Render queues with timestamps are used to smooth rendering and handle
rendering collisions so that latency is kept low (like when you have a
100 fps video on a 60 Hz display). This is normally done in userspace,
but with fences, you ask the kernel to render something at an
unpredictable future time, so we lose the ability to make the final
decision.
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 12:01 ` Nicolas Dufresne
@ 2019-04-24 14:39 ` Michel Dänzer
0 siblings, 0 replies; 36+ messages in thread
From: Michel Dänzer @ 2019-04-24 14:39 UTC (permalink / raw)
To: Nicolas Dufresne, Paul Kocialkowski, Daniel Vetter
Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, linux-media
On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> On Wednesday, 24 April 2019 at 10:31 +0200, Michel Dänzer wrote:
>> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
>>> On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
>>>> On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
>>
>>>> In the first, we'd need a mechanism where we can schedule a render at a
>>>> specific time or vblank. We can of course already implement this in
>>>> software, but with fences, the scheduling would need to be done in the
>>>> driver. Then if the fence is signalled earlier, the driver should hold
>>>> on until the delay is met. If the fence got signalled late, we also
>>>> need to think of a workflow. As we can't schedule more than one render
>>>> in DRM at one time, I don't really see yet how to make that work.
>>>
>>> Indeed, that's also one of the main issues I've spotted. Before using
>>> an implicit fence, we basically have to make sure the frame is due for
>>> display at the next vblank. Otherwise, we need to refrain from using
>>> the fence and schedule the flip later, which is kind of counter-
>>> productive.
>>
>> Fences are about signalling that the contents of a frame are "done" and
>> ready to be presented. They're not about specifying which frame is to be
>> presented when.
>>
>>
>>> I feel like specifying a target vblank would be a good unit for that,
>>
>> The mechanism described above works for that.
>>
>>> since it's our native granularity after all (while a timestamp is not).
>>
>> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
>> changes things in this regard. It makes the vblank length variable, and
>> if you wait for multiple vblanks between flips, you get the maximum
>> vblank length corresponding to the minimum refresh rate / timing
>> granularity. Thus, it would be useful to allow userspace to specify a
>> timestamp corresponding to the earliest time when the flip is to
>> complete. The kernel could then try to hit that as closely as possible.
>
> Rendering a video stream is more complex than what you describe here.
> Whenever there is an unexpected delay (late delivery of a frame, for
> example) you may end up in a situation where one frame is ready after
> the targeted vblank. If there is another frame targeting the following
> vblank that gets ready on time, the previous frame should be replaced
> by the more recent one.
>
> With fences, what happens is that even if you received the next frame
> on time, naively replacing it is not possible, because we don't know
> when the fence for the next frame will be signalled. If you simply
> always replace the current frame, you may end up skipping many more
> vblanks than you expect, and that results in jumpy playback.
So you want to be able to replace a queued flip with another one then.
That doesn't necessarily require allowing more than one flip to be
queued ahead of time.
Note that this can also be done in userspace with explicit fencing (by
only selecting a frame and submitting it to the kernel after all
corresponding fences have signalled), at least to some degree, but the
kernel should be able to do it up to a later point in time and more
reliably, with less risk of missing a flip for a frame which becomes
ready just in time.
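The userspace-side variant mentioned here can be sketched like this: hold frames until their fences have signalled, then submit only the newest one to the kernel. This is an illustration of the policy with invented names, not a real API:

```python
def submit_newest_signalled(pending, now, flip):
    """pending: oldest-to-newest list of (frame_id, fence_signal_time).
    Submit to the kernel only the newest frame whose fence has signalled,
    discarding stale older frames (including older frames whose fences
    haven't signalled yet: they would be shown out of order anyway).
    flip is the single-flip submission callback."""
    newest = None
    for i, (_, signal_time) in enumerate(pending):
        if signal_time <= now:
            newest = i
    if newest is None:
        return None  # nothing ready: the previously queued flip stands
    frame_id, _ = pending[newest]
    del pending[:newest + 1]
    flip(frame_id)  # only one flip is ever queued with the kernel
    return frame_id
```

Because fences can signal out of order (as with video CODEC output), a newer frame whose fence has signalled wins over an older frame whose fence is still pending.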
> Render queues with timestamps are used to smooth rendering and handle
> rendering collisions so that latency is kept low (like when you have a
> 100 fps video on a 60 Hz display). This is normally done in userspace,
> but with fences, you ask the kernel to render something at an
> unpredictable future time, so we lose the ability to make the final
> decision.
That's just not what fences are intended to be used for with the current
KMS UAPI.
--
Earthling Michel Dänzer | https://www.amd.com
Libre software enthusiast | Mesa and X developer
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 14:39 ` Michel Dänzer
@ 2019-04-24 14:41 ` Paul Kocialkowski
-1 siblings, 0 replies; 36+ messages in thread
From: Paul Kocialkowski @ 2019-04-24 14:41 UTC (permalink / raw)
To: Michel Dänzer, Nicolas Dufresne, Daniel Vetter
Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, linux-media
Hi,
On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > On Wednesday, 24 April 2019 at 10:31 +0200, Michel Dänzer wrote:
> > > On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > > > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > > > On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
> > > > > In the first, we'd need a mechanism where we can schedule a render at a
> > > > > specific time or vblank. We can of course already implement this in
> > > > > software, but with fences, the scheduling would need to be done in the
> > > > > driver. Then if the fence is signalled earlier, the driver should hold
> > > > > on until the delay is met. If the fence got signalled late, we also
> > > > > need to think of a workflow. As we can't schedule more than one render
> > > > > in DRM at one time, I don't really see yet how to make that work.
> > > >
> > > > Indeed, that's also one of the main issues I've spotted. Before using
> > > > an implicit fence, we basically have to make sure the frame is due for
> > > > display at the next vblank. Otherwise, we need to refrain from using
> > > > the fence and schedule the flip later, which is kind of counter-
> > > > productive.
> > >
> > > Fences are about signalling that the contents of a frame are "done" and
> > > ready to be presented. They're not about specifying which frame is to be
> > > presented when.
> > >
> > >
> > > > I feel like specifying a target vblank would be a good unit for that,
> > >
> > > The mechanism described above works for that.
> > >
> > > > since it's our native granularity after all (while a timestamp is not).
> > >
> > > Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> > > changes things in this regard. It makes the vblank length variable, and
> > > if you wait for multiple vblanks between flips, you get the maximum
> > > vblank length corresponding to the minimum refresh rate / timing
> > > granularity. Thus, it would be useful to allow userspace to specify a
> > > timestamp corresponding to the earliest time when the flip is to
> > > complete. The kernel could then try to hit that as closely as possible.
> >
> > Rendering a video stream is more complex than what you describe here.
> > Whenever there is an unexpected delay (late delivery of a frame, for
> > example) you may end up in a situation where one frame is ready after
> > the targeted vblank. If there is another frame targeting the following
> > vblank that gets ready on time, the previous frame should be replaced
> > by the more recent one.
> >
> > With fences, what happens is that even if you received the next frame
> > on time, naively replacing it is not possible, because we don't know
> > when the fence for the next frame will be signalled. If you simply
> > always replace the current frame, you may end up skipping many more
> > vblanks than you expect, and that results in jumpy playback.
>
> So you want to be able to replace a queued flip with another one then.
> That doesn't necessarily require allowing more than one flip to be
> queued ahead of time.
There might be other ways to do it, but this one has plenty of
advantages.
> Note that this can also be done in userspace with explicit fencing (by
> only selecting a frame and submitting it to the kernel after all
> corresponding fences have signalled), at least to some degree, but the
> kernel should be able to do it up to a later point in time and more
> reliably, with less risk of missing a flip for a frame which becomes
> ready just in time.
Indeed, but it would be great if we could do that with implicit fencing
as well.
> > Render queues with timestamps are used to smooth rendering and handle
> > rendering collisions so that latency is kept low (like when you have a
> > 100 fps video on a 60 Hz display). This is normally done in userspace,
> > but with fences, you ask the kernel to render something at an
> > unpredictable future time, so we lose the ability to make the final
> > decision.
>
> That's just not what fences are intended to be used for with the current
> KMS UAPI.
Yes, and I think this discussion is heading towards changing that in the future.
Cheers,
Paul
--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 14:41 ` Paul Kocialkowski
@ 2019-04-24 15:06 ` Daniel Vetter
2019-04-24 15:44 ` Nicolas Dufresne
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Vetter @ 2019-04-24 15:06 UTC (permalink / raw)
To: Paul Kocialkowski
Cc: Michel Dänzer, Nicolas Dufresne, Alexandre Courbot,
Maxime Ripard, Linux Kernel Mailing List, dri-devel, Tomasz Figa,
Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, open list:DMA BUFFER SHARING FRAMEWORK
On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > On Wednesday, 24 April 2019 at 10:31 +0200, Michel Dänzer wrote:
> > > > On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > > > > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > > > > On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
> > > > > > In the first, we'd need a mechanism where we can schedule a render at a
> > > > > > specific time or vblank. We can of course already implement this in
> > > > > > software, but with fences, the scheduling would need to be done in the
> > > > > > driver. Then if the fence is signalled earlier, the driver should hold
> > > > > > on until the delay is met. If the fence got signalled late, we also
> > > > > > need to think of a workflow. As we can't schedule more than one render
> > > > > > in DRM at one time, I don't really see yet how to make that work.
> > > > >
> > > > > Indeed, that's also one of the main issues I've spotted. Before using
> > > > > an implicit fence, we basically have to make sure the frame is due for
> > > > > display at the next vblank. Otherwise, we need to refrain from using
> > > > > the fence and schedule the flip later, which is kind of counter-
> > > > > productive.
> > > >
> > > > Fences are about signalling that the contents of a frame are "done" and
> > > > ready to be presented. They're not about specifying which frame is to be
> > > > presented when.
> > > >
> > > >
> > > > > I feel like specifying a target vblank would be a good unit for that,
> > > >
> > > > The mechanism described above works for that.
> > > >
> > > > > since it's our native granularity after all (while a timestamp is not).
> > > >
> > > > Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> > > > changes things in this regard. It makes the vblank length variable, and
> > > > if you wait for multiple vblanks between flips, you get the maximum
> > > > vblank length corresponding to the minimum refresh rate / timing
> > > > granularity. Thus, it would be useful to allow userspace to specify a
> > > > timestamp corresponding to the earliest time when the flip is to
> > > > complete. The kernel could then try to hit that as closely as possible.
> > >
> > > Rendering a video stream is more complex than what you describe here.
> > > Whenever there is an unexpected delay (late delivery of a frame, for
> > > example) you may end up in a situation where one frame is ready after the
> > > targeted vblank. If there is another frame that targets the following
> > > vblank and gets ready on time, the previous frame should be replaced
> > > by the more recent one.
> > >
> > > With fences, what happens is that even if you received the next frame
> > > on time, naively replacing it is not possible, because we don't know
> > > when the fence for the next frame will be signalled. If you simply
> > > always replace the current frame, you may end up skipping a lot more
> > > vblanks than you expect, and that results in jumpy playback.
> >
> > So you want to be able to replace a queued flip with another one then.
> > That doesn't necessarily require allowing more than one flip to be
> > queued ahead of time.
>
> There might be other ways to do it, but this one has plenty of
> advantages.
The point of KMS (well, one of the reasons for it) was to separate the
implementation of modesetting for specific hardware from policy
decisions like which frames to drop and how to schedule them. The
kernel gives you tools, userspace implements the actual protocols.
There's definitely a bit of a gap around scheduling flips for a
specific frame, or allowing userspace to cancel/overwrite an already
scheduled flip, but no one has yet come up with a clear proposal for
new uapi + example implementation + userspace implementation + big
enough support from other compositors that this is what they want too.
And yes, writing a really good compositor is really hard; I think a
lot of people underestimate that and just create something useful for
their niche. If userspace can't come up with a shared library of
helpers, I don't think baking the policy in as kernel uapi, with 10+
years of regression-free api guarantees, is going to make it any better.
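The kind of frame-dropping policy userspace is expected to implement can be sketched in a few lines (an illustrative sketch only; the function and its inputs are invented for this example, not any existing compositor API):

```python
def pick_frame(ready_frames, next_vblank_ts):
    """Pick which frame to flip at the upcoming vblank.

    ready_frames: (target_ts, frame) pairs whose fences have already
    signalled. The newest frame that was due at or before the vblank
    wins, so stale frames are silently dropped rather than displayed.
    Returns None when nothing is due yet (keep the current frame).
    """
    due = [(ts, f) for ts, f in ready_frames if ts <= next_vblank_ts]
    return max(due, key=lambda d: d[0])[1] if due else None
```

With frames due at t=10 and t=20 and a vblank at t=25, the t=20 frame is selected and the t=10 frame is dropped; a frame due at t=30 stays queued.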
> > Note that this can also be done in userspace with explicit fencing (by
> > only selecting a frame and submitting it to the kernel after all
> > corresponding fences have signalled), at least to some degree, but the
> > kernel should be able to do it up to a later point in time and more
> > reliably, with less risk of missing a flip for a frame which becomes
> > ready just in time.
>
> Indeed, but it would be great if we could do that with implicit fencing
> as well.
1. extract implicit fences from dma-buf. This part is just an idea,
but easy to implement once we have someone who actually wants this.
All we need is a new ioctl on the dma-buf to export the fences from
the reservation_object as a sync_file (either the exclusive or the
shared ones, selected with a flag).
2. do the exact same frame scheduling as with explicit fencing
3. supply explicit fences in your atomic ioctl calls - these should
overrule any implicit fences (assuming correct kernel drivers, but we
have helpers so you can assume they all work correctly).
By design this is possible, it's just that no one yet bothered enough
to make it happen.
-Daniel
> > > Render queues with timestamp are used to smooth rendering and handle
> > > rendering collision so that the latency is kept low (like when you have
> > > a 100fps video over a 60Hz display). This is normally done in
> > > userspace, but with fences, you ask the kernel to render something in
> > > an unpredictable future, so we lose the ability to make the final
> > > decision.
> >
> > That's just not what fences are intended to be used for with the current
> > KMS UAPI.
>
> Yes, and I think we're discussing towards changing that in the future.
>
> Cheers,
>
> Paul
>
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 15:06 ` Daniel Vetter
@ 2019-04-24 15:44 ` Nicolas Dufresne
2019-04-24 16:54 ` Michel Dänzer
0 siblings, 1 reply; 36+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 15:44 UTC (permalink / raw)
To: Daniel Vetter, Paul Kocialkowski
Cc: Michel Dänzer, Alexandre Courbot, Maxime Ripard,
Linux Kernel Mailing List, dri-devel, Tomasz Figa, Hans Verkuil,
Thomas Petazzoni, Dave Airlie, Mauro Carvalho Chehab,
open list:DMA BUFFER SHARING FRAMEWORK
Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
> > Hi,
> >
> > On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > > Le mercredi 24 avril 2019 à 10:31 +0200, Michel Dänzer a écrit :
> > > > > On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > > > > > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > > > > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > > > In the first, we'd need a mechanism where we can schedule a render at a
> > > > > > > specific time or vblank. We can of course already implement this in
> > > > > > > software, but with fences, the scheduling would need to be done in the
> > > > > > > driver. Then if the fence is signalled earlier, the driver should hold
> > > > > > > on until the delay is met. If the fence got signalled late, we also
> > > > > > > need to think of a workflow. As we can't schedule more then one render
> > > > > > > in DRM at one time, I don't really see yet how to make that work.
> > > > > >
> > > > > > Indeed, that's also one of the main issues I've spotted. Before using
> > > > > > an implicit fence, we basically have to make sure the frame is due for
> > > > > > display at the next vblank. Otherwise, we need to refrain from using
> > > > > > the fence and schedule the flip later, which is kind of counter-
> > > > > > productive.
> > > > >
> > > > > Fences are about signalling that the contents of a frame are "done" and
> > > > > ready to be presented. They're not about specifying which frame is to be
> > > > > presented when.
> > > > >
> > > > >
> > > > > > I feel like specifying a target vblank would be a good unit for that,
> > > > >
> > > > > The mechanism described above works for that.
> > > > >
> > > > > > since it's our native granularity after all (while a timestamp is not).
> > > > >
> > > > > Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> > > > > changes things in this regard. It makes the vblank length variable, and
> > > > > if you wait for multiple vblanks between flips, you get the maximum
> > > > > vblank length corresponding to the minimum refresh rate / timing
> > > > > granularity. Thus, it would be useful to allow userspace to specify a
> > > > > timestamp corresponding to the earliest time when the flip is to
> > > > > complete. The kernel could then try to hit that as closely as possible.
> > > >
> > > > Rendering a video stream is more complex then what you describe here.
> > > > Whenever there is a unexpected delay (late delivery of a frame as an
> > > > example) you may endup in situation where one frame is ready after the
> > > > targeted vblank. If there is another frame that targets the following
> > > > vblank that gets ready on-time, the previous frame should be replaced
> > > > by the most recent one.
> > > >
> > > > With fences, what happens is that even if you received the next frame
> > > > on time, naively replacing it is not possible, because we don't know
> > > > when the fence for the next frame will be signalled. If you simply
> > > > always replace the current frame, you may endup skipping a lot more
> > > > vblank then what you expect, and that results in jumpy playback.
> > >
> > > So you want to be able to replace a queued flip with another one then.
> > > That doesn't necessarily require allowing more than one flip to be
> > > queued ahead of time.
> >
> > There might be other ways to do it, but this one has plenty of
> > advantages.
>
> The point of kms (well one of the reasons) was to separate the
> implementation of modesetting for specific hw from policy decisions
> like which frames to drop and how to schedule them. Kernel gives
> tools, userspace implements the actual protocols.
>
> There's definitely a bit a gap around scheduling flips for a specific
> frame or allowing to cancel/overwrite an already scheduled flip, but
> no one yet has come up with a clear proposal for new uapi + example
> implementation + userspace implementation + big enough support from
> other compositors that this is what they want too.
>
> And yes writing a really good compositor is really hard, and I think a
> lot of people underestimate that and just create something useful for
> their niche. If userspace can't come up with a shared library of
> helpers, I don't think baking it in as kernel uapi with 10+ years
> regression free api guarantees is going to make it any better.
>
> > > Note that this can also be done in userspace with explicit fencing (by
> > > only selecting a frame and submitting it to the kernel after all
> > > corresponding fences have signalled), at least to some degree, but the
> > > kernel should be able to do it up to a later point in time and more
> > > reliably, with less risk of missing a flip for a frame which becomes
> > > ready just in time.
> >
> > Indeed, but it would be great if we could do that with implicit fencing
> > as well.
>
> 1. extract implicit fences from dma-buf. This part is just an idea,
> but easy to implement once we have someone who actually wants this.
> All we need is a new ioctl on the dma-buf to export the fences from
> the reservation_object as a sync_file (either the exclusive or the
> shared ones, selected with a flag).
> 2. do the exact same frame scheduling as with explicit fencing
> 3. supply explicit fences in your atomic ioctl calls - these should
> overrule any implicit fences (assuming correct kernel drivers, but we
> have helpers so you can assume they all work correctly).
>
> By design this is possible, it's just that no one yet bothered enough
> to make it happen.
> -Daniel
I'm not sure I understand the workflow of this one. I'm all in favour
of leaving the hard work to userspace. Note that I have assumed
explicit fences from the start; I don't think implicit fences will
ever exist in v4l2, but I might be wrong. What I understood is that
there was a previous attempt in the past, but it raised more issues
than it actually solved. That being said, how exactly do we handle the
following use cases:
- A frame was lost by the capture driver, but it was scheduled as
being the next buffer to render (normally the previous frame should
remain).
- The scheduled frame is late for the next vblank (it didn't signal on
time); a new one may be better for the next vblank, but we will only
know when its fence is signalled.
Better in this context means that the presentation time of this frame
is closer to the next vblank time. Keep in mind that the idea is to
schedule the frames before they are signalled, in order to make the
use of the fence useful in lowering the latency. Of course, as Michel
said, we could just always wait on the fence and then schedule. But if
you do that, why would you care about implementing the fence in v4l2
to start with? DQBuf does just that already.
Note that this has nothing to do with the valid use case where you
want to apply various transformations (m2m or gpu) to the capture
buffer. You still gain from the fence in that context, even if you
wait on the fence in userspace before display. This alone is likely
enough to justify using fences.
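That pipeline use case is where fences pay off even with a userspace wait at the end: every stage can be queued before the previous one has finished. A toy job chain (purely illustrative and simulated synchronously; real in/out fences would be sync_file fds, not Python objects):

```python
class Job:
    """A queued hardware job: it may start once its in-fence has
    signalled, and its own completion acts as the next stage's
    in-fence. 'Hardware' is simulated by the run_if_ready() call."""
    def __init__(self, name, in_fence=None):
        self.name = name
        self.in_fence = in_fence
        self.done = False  # doubles as this job's out-fence state
    def run_if_ready(self):
        if self.in_fence is None or self.in_fence.done:
            self.done = True
        return self.done

# The whole chain is set up before any work has completed:
capture = Job("capture")
convert = Job("m2m-convert", in_fence=capture)
display = Job("display", in_fence=convert)
```

The display job can be submitted immediately; it simply cannot run until the conversion's fence (here, `convert.done`) has signalled.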
>
> > > > Render queues with timestamp are used to smooth rendering and handle
> > > > rendering collision so that the latency is kept low (like when you have
> > > > a 100fps video over a 60Hz display). This is normally done in
> > > > userspace, but with fences, you ask the kernel to render something in
> > > > an unpredictable future, so we loose the ability to make the final
> > > > decision.
> > >
> > > That's just not what fences are intended to be used for with the current
> > > KMS UAPI.
> >
> > Yes, and I think we're discussing towards changing that in the future.
> >
> > Cheers,
> >
> > Paul
> >
> > --
> > Paul Kocialkowski, Bootlin
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> >
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 15:44 ` Nicolas Dufresne
@ 2019-04-24 16:54 ` Michel Dänzer
0 siblings, 0 replies; 36+ messages in thread
From: Michel Dänzer @ 2019-04-24 16:54 UTC (permalink / raw)
To: Nicolas Dufresne, Daniel Vetter, Paul Kocialkowski
Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
Dave Airlie, Mauro Carvalho Chehab,
open list:DMA BUFFER SHARING FRAMEWORK
On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
> Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
>> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
>> <paul.kocialkowski@bootlin.com> wrote:
>>> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
>>>> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
>>>>>
>>>>> Rendering a video stream is more complex then what you describe here.
>>>>> Whenever there is a unexpected delay (late delivery of a frame as an
>>>>> example) you may endup in situation where one frame is ready after the
>>>>> targeted vblank. If there is another frame that targets the following
>>>>> vblank that gets ready on-time, the previous frame should be replaced
>>>>> by the most recent one.
>>>>>
>>>>> With fences, what happens is that even if you received the next frame
>>>>> on time, naively replacing it is not possible, because we don't know
>>>>> when the fence for the next frame will be signalled. If you simply
>>>>> always replace the current frame, you may endup skipping a lot more
>>>>> vblank then what you expect, and that results in jumpy playback.
>>>>
>>>> So you want to be able to replace a queued flip with another one then.
>>>> That doesn't necessarily require allowing more than one flip to be
>>>> queued ahead of time.
>>>
>>> There might be other ways to do it, but this one has plenty of
>>> advantages.
>>
>> The point of kms (well one of the reasons) was to separate the
>> implementation of modesetting for specific hw from policy decisions
>> like which frames to drop and how to schedule them. Kernel gives
>> tools, userspace implements the actual protocols.
>>
>> There's definitely a bit a gap around scheduling flips for a specific
>> frame or allowing to cancel/overwrite an already scheduled flip, but
>> no one yet has come up with a clear proposal for new uapi + example
>> implementation + userspace implementation + big enough support from
>> other compositors that this is what they want too.
Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
flip?
>>>> Note that this can also be done in userspace with explicit fencing (by
>>>> only selecting a frame and submitting it to the kernel after all
>>>> corresponding fences have signalled), at least to some degree, but the
>>>> kernel should be able to do it up to a later point in time and more
>>>> reliably, with less risk of missing a flip for a frame which becomes
>>>> ready just in time.
>>>
>>> Indeed, but it would be great if we could do that with implicit fencing
>>> as well.
>>
>> 1. extract implicit fences from dma-buf. This part is just an idea,
>> but easy to implement once we have someone who actually wants this.
>> All we need is a new ioctl on the dma-buf to export the fences from
>> the reservation_object as a sync_file (either the exclusive or the
>> shared ones, selected with a flag).
>> 2. do the exact same frame scheduling as with explicit fencing
>> 3. supply explicit fences in your atomic ioctl calls - these should
>> overrule any implicit fences (assuming correct kernel drivers, but we
>> have helpers so you can assume they all work correctly).
>>
>> By design this is possible, it's just that no one yet bothered enough
>> to make it happen.
>> -Daniel
>
> I'm not sure I understand the workflow of this one. I'm all in favour
> leaving the hard work to userspace. Note that I have assumed explicit
> fences from the start, I don't think implicit fence will ever exist in
> v4l2, but I might be wrong. What I understood is that there was a
> previous attempt in the past but it raised more issues then it actually
> solved. So that being said, how do handle exactly the follow use cases:
>
> - A frame was lost by capture driver, but it was schedule as being the
> next buffer to render (normally previous frame should remain).
Userspace just doesn't call into the kernel to flip to the lost frame,
so the previous one remains.
> - The scheduled frame is late for the next vblank (didn't signal on-
> time), a new one may be better for the next vlbank, but we will only
> know when it's fence is signaled.
Userspace only selects a frame and submits it to the kernel after all
its fences have signalled.
> Better in this context means the the presentation time of this frame is
> closer to the next vblank time. Keep in mind that the idea is to
> schedule the frames before they are signal, in order to make the usage
> of the fence useful in lowering the latency.
Fences are about signalling completion, not about low latency.
With a display server, the client can send frames to the display server
ahead of time, only the display server needs to wait for fences to
signal before submitting frames to the kernel.
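A minimal model of that split (names invented for illustration, no real display-server API implied): clients hand frames plus fences to the compositor as soon as they are produced; only the compositor looks at fence state when picking what to flip.

```python
class Compositor:
    """Clients queue (fence_signalled, frame) pairs ahead of time; the
    compositor alone decides, per vblank, what actually gets flipped."""
    def __init__(self):
        self._pending = []  # oldest first
    def queue(self, fence_signalled, frame):
        self._pending.append((fence_signalled, frame))
    def frame_for_vblank(self):
        # The newest signalled frame wins; everything queued before it
        # is obsolete and dropped. Unsignalled frames newer than the
        # winner stay queued for a later vblank.
        ready = [i for i, (ok, _) in enumerate(self._pending) if ok]
        if not ready:
            return None  # keep the currently displayed frame
        newest = ready[-1]
        frame = self._pending[newest][1]
        del self._pending[:newest + 1]
        return frame
```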
> Of course as Michel said, we could just always wait on the fence and
> just schedule. But if you do that, why would you care implementing the
> fence in v4l2 to start with, DQBuf does just that already.
A fence is more likely to work out of the box with non-V4L-related code
than DQBuf?
--
Earthling Michel Dänzer | https://www.amd.com
Libre software enthusiast | Mesa and X developer
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 16:54 ` Michel Dänzer
@ 2019-04-24 17:43 ` Nicolas Dufresne
-1 siblings, 0 replies; 36+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 17:43 UTC (permalink / raw)
To: Michel Dänzer, Daniel Vetter, Paul Kocialkowski
Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
Dave Airlie, Mauro Carvalho Chehab,
open list:DMA BUFFER SHARING FRAMEWORK
Le mercredi 24 avril 2019 à 18:54 +0200, Michel Dänzer a écrit :
> On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
> > Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
> > > On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
> > > <paul.kocialkowski@bootlin.com> wrote:
> > > > On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > > > > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > > > > Rendering a video stream is more complex then what you describe here.
> > > > > > Whenever there is a unexpected delay (late delivery of a frame as an
> > > > > > example) you may endup in situation where one frame is ready after the
> > > > > > targeted vblank. If there is another frame that targets the following
> > > > > > vblank that gets ready on-time, the previous frame should be replaced
> > > > > > by the most recent one.
> > > > > >
> > > > > > With fences, what happens is that even if you received the next frame
> > > > > > on time, naively replacing it is not possible, because we don't know
> > > > > > when the fence for the next frame will be signalled. If you simply
> > > > > > always replace the current frame, you may endup skipping a lot more
> > > > > > vblank then what you expect, and that results in jumpy playback.
> > > > >
> > > > > So you want to be able to replace a queued flip with another one then.
> > > > > That doesn't necessarily require allowing more than one flip to be
> > > > > queued ahead of time.
> > > >
> > > > There might be other ways to do it, but this one has plenty of
> > > > advantages.
> > >
> > > The point of kms (well one of the reasons) was to separate the
> > > implementation of modesetting for specific hw from policy decisions
> > > like which frames to drop and how to schedule them. Kernel gives
> > > tools, userspace implements the actual protocols.
> > >
> > > There's definitely a bit a gap around scheduling flips for a specific
> > > frame or allowing to cancel/overwrite an already scheduled flip, but
> > > no one yet has come up with a clear proposal for new uapi + example
> > > implementation + userspace implementation + big enough support from
> > > other compositors that this is what they want too.
>
> Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
> flip?
>
>
> > > > > Note that this can also be done in userspace with explicit fencing (by
> > > > > only selecting a frame and submitting it to the kernel after all
> > > > > corresponding fences have signalled), at least to some degree, but the
> > > > > kernel should be able to do it up to a later point in time and more
> > > > > reliably, with less risk of missing a flip for a frame which becomes
> > > > > ready just in time.
> > > >
> > > > Indeed, but it would be great if we could do that with implicit fencing
> > > > as well.
> > >
> > > 1. extract implicit fences from dma-buf. This part is just an idea,
> > > but easy to implement once we have someone who actually wants this.
> > > All we need is a new ioctl on the dma-buf to export the fences from
> > > the reservation_object as a sync_file (either the exclusive or the
> > > shared ones, selected with a flag).
> > > 2. do the exact same frame scheduling as with explicit fencing
> > > 3. supply explicit fences in your atomic ioctl calls - these should
> > > overrule any implicit fences (assuming correct kernel drivers, but we
> > > have helpers so you can assume they all work correctly).
> > >
> > > By design this is possible, it's just that no one yet bothered enough
> > > to make it happen.
> > > -Daniel
> >
> > I'm not sure I understand the workflow of this one. I'm all in favour
> > leaving the hard work to userspace. Note that I have assumed explicit
> > fences from the start, I don't think implicit fence will ever exist in
> > v4l2, but I might be wrong. What I understood is that there was a
> > previous attempt in the past but it raised more issues then it actually
> > solved. So that being said, how do handle exactly the follow use cases:
> >
> > - A frame was lost by capture driver, but it was schedule as being the
> > next buffer to render (normally previous frame should remain).
>
> Userspace just doesn't call into the kernel to flip to the lost frame,
> so the previous one remains.
We are stuck in a loop, you and me. Considering v4l2 to drm, where
fences don't exist on the v4l2 side, it makes very little sense to
bring in fences if we are to wait on the fence in userspace anyway.
Unless, of course, you have other operations beforehand that make
proper use of the fences.
>
> > - The scheduled frame is late for the next vblank (didn't signal on-
> > time), a new one may be better for the next vlbank, but we will only
> > know when it's fence is signaled.
>
> Userspace only selects a frame and submits it to the kernel after all
> its fences have signalled.
>
> > Better in this context means the the presentation time of this frame is
> > closer to the next vblank time. Keep in mind that the idea is to
> > schedule the frames before they are signal, in order to make the usage
> > of the fence useful in lowering the latency.
>
> Fences are about signalling completion, not about low latency.
It can be used to remove a round trip through userspace at a very
time-sensitive moment. If you pass a dmabuf with its unsignalled fence
to a kernel driver, the driver can start the job on this dmabuf as
soon as the fence is signalled. If you always wait on a fence in
userspace, you have to wait for the userspace process to be scheduled,
then userspace will set up the drm atomic request or similar action,
which may take some time and may require another process in the kernel
to be scheduled. This effectively adds a variable delay, a gap where
nothing happens between two operations. That time is lost and
contributes to the overall latency.
The benefit of fences we are looking for is being able to set up, before
the fence is signalled, the operations on various compatible drivers.
This way, at the time-critical moment a driver can be fed more jobs
with no userspace round trip involved. It has also been proposed to
use fences to return buffers into the v4l2 queues when they are freed,
which can in some conditions keep, say, a capture driver from skipping
frames due to random scheduling delays.
>
> With a display server, the client can send frames to the display server
> ahead of time, only the display server needs to wait for fences to
> signal before submitting frames to the kernel.
>
>
> > Of course, as Michel said, we could just always wait on the fence and
> > then schedule. But if you do that, why would you bother implementing
> > the fence in v4l2 to start with? DQBuf does just that already.
>
> A fence is more likely to work out of the box with non-V4L-related code
> than DQBuf?
If you use DQBuf, you are guaranteed that the data has been produced. A
fence is not useful on a buffer that already contains the data you
would be waiting for. That's why, in the RFC, the fence is provided at
QBuf, basically when the free buffer is handed to the V4L2 driver. QBuf
can also be passed a fence in the RFC, so if the buffer is not yet
free, the driver will wait on the fence before using it.
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
@ 2019-04-24 17:43 ` Nicolas Dufresne
0 siblings, 0 replies; 36+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 17:43 UTC (permalink / raw)
To: Michel Dänzer, Daniel Vetter, Paul Kocialkowski
Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
Dave Airlie, Mauro Carvalho Chehab,
open list:DMA BUFFER SHARING FRAMEWORK
On Wednesday, 24 April 2019 at 18:54 +0200, Michel Dänzer wrote:
> On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
> > On Wednesday, 24 April 2019 at 17:06 +0200, Daniel Vetter wrote:
> > > On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
> > > <paul.kocialkowski@bootlin.com> wrote:
> > > > On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > > > > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > > > > > Rendering a video stream is more complex than what you describe here.
> > > > > > > Whenever there is an unexpected delay (late delivery of a frame, for
> > > > > > > example) you may end up in a situation where one frame is ready after
> > > > > > > the targeted vblank. If there is another frame that targets the
> > > > > > > following vblank and gets ready on time, the previous frame should be
> > > > > > > replaced by the most recent one.
> > > > > > >
> > > > > > > With fences, what happens is that even if you received the next frame
> > > > > > > on time, naively replacing it is not possible, because we don't know
> > > > > > > when the fence for the next frame will be signalled. If you simply
> > > > > > > always replace the current frame, you may end up skipping a lot more
> > > > > > > vblanks than you expect, and that results in jumpy playback.
> > > > >
> > > > > So you want to be able to replace a queued flip with another one then.
> > > > > That doesn't necessarily require allowing more than one flip to be
> > > > > queued ahead of time.
> > > >
> > > > There might be other ways to do it, but this one has plenty of
> > > > advantages.
> > >
> > > The point of kms (well one of the reasons) was to separate the
> > > implementation of modesetting for specific hw from policy decisions
> > > like which frames to drop and how to schedule them. Kernel gives
> > > tools, userspace implements the actual protocols.
> > >
> > > There's definitely a bit a gap around scheduling flips for a specific
> > > frame or allowing to cancel/overwrite an already scheduled flip, but
> > > no one yet has come up with a clear proposal for new uapi + example
> > > implementation + userspace implementation + big enough support from
> > > other compositors that this is what they want too.
>
> Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
> flip?
>
>
> > > > > Note that this can also be done in userspace with explicit fencing (by
> > > > > only selecting a frame and submitting it to the kernel after all
> > > > > corresponding fences have signalled), at least to some degree, but the
> > > > > kernel should be able to do it up to a later point in time and more
> > > > > reliably, with less risk of missing a flip for a frame which becomes
> > > > > ready just in time.
> > > >
> > > > Indeed, but it would be great if we could do that with implicit fencing
> > > > as well.
> > >
> > > 1. extract implicit fences from dma-buf. This part is just an idea,
> > > but easy to implement once we have someone who actually wants this.
> > > All we need is a new ioctl on the dma-buf to export the fences from
> > > the reservation_object as a sync_file (either the exclusive or the
> > > shared ones, selected with a flag).
> > > 2. do the exact same frame scheduling as with explicit fencing
> > > 3. supply explicit fences in your atomic ioctl calls - these should
> > > overrule any implicit fences (assuming correct kernel drivers, but we
> > > have helpers so you can assume they all work correctly).
> > >
> > > By design this is possible, it's just that no one yet bothered enough
> > > to make it happen.
> > > -Daniel
> >
> > I'm not sure I understand the workflow of this one. I'm all in favour
> > of leaving the hard work to userspace. Note that I have assumed explicit
> > fences from the start; I don't think implicit fences will ever exist in
> > v4l2, but I might be wrong. What I understood is that there was a
> > previous attempt in the past, but it raised more issues than it actually
> > solved. So that being said, how do we handle exactly the following use
> > cases:
> >
> > - A frame was lost by the capture driver, but it was scheduled as the
> > next buffer to render (normally the previous frame should remain).
>
> Userspace just doesn't call into the kernel to flip to the lost frame,
> so the previous one remains.
We are going in circles, you and I. Considering the V4L2-to-DRM path,
where fences don't exist on the V4L2 side, it makes very little sense
to bring up fences if we are only going to wait on the fence in
userspace. Unless of course you have other operations beforehand making
proper use of the fences.
>
> > - The scheduled frame is late for the next vblank (it didn't signal in
> > time); a new one may be better for the next vblank, but we will only
> > know when its fence is signalled.
>
> Userspace only selects a frame and submits it to the kernel after all
> its fences have signalled.
>
> > Better in this context means the presentation time of this frame is
> > closer to the next vblank time. Keep in mind that the idea is to
> > schedule the frames before they are signalled, in order for the
> > fence to be useful in lowering the latency.
>
> Fences are about signalling completion, not about low latency.
Fences can be used to remove a roundtrip through userspace at a very
time-sensitive moment. If you pass a dmabuf with its unsignalled fence
to a kernel driver, the driver can start the job on this dmabuf as soon
as the fence is signalled. If you always wait on a fence in userspace,
you have to wait for the userspace process to be scheduled, then
userspace will set up the DRM atomic request or a similar action, which
may take some time and may require another kernel process to be
scheduled. This effectively adds a variable delay, a gap where nothing
is happening between two operations. This time is lost and contributes
to the overall operation latency.
The benefit we are looking for from fences is being able to set up
operations on various compatible drivers before the fence is signalled.
This way, at the time-critical moment, a driver can be fed more jobs
without any userspace roundtrip involved. It is also proposed to use
fences to return buffers to the V4L2 queues when they are freed, which
can in some conditions prevent, say, a capture driver from skipping
frames due to random scheduling delays.
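To make the roundtrip argument concrete, here is a toy, self-contained sketch (all names are invented; this is not real V4L2 or DRM API): a job is queued to the "driver" while its fence is still unsignalled, so the fence signalling can start the job directly, with no userspace step in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of fence-triggered job start: the job is handed to the
 * "driver" while its fence is still unsignalled, so when the fence
 * signals, the driver can start immediately, with no userspace
 * roundtrip in between. All names here are illustrative. */

struct job {
    bool fence_signalled;
    bool started;
};

struct driver_queue {
    struct job *pending[8];
    size_t count;
};

/* Userspace side: submit the job ahead of time, fence still pending. */
static void queue_job(struct driver_queue *q, struct job *j)
{
    q->pending[q->count++] = j;
    if (j->fence_signalled)      /* already complete: start right away */
        j->started = true;
}

/* Producer side: the fence signals; the driver starts the job directly. */
static void signal_fence(struct driver_queue *q, struct job *j)
{
    j->fence_signalled = true;
    for (size_t i = 0; i < q->count; i++)
        if (q->pending[i] == j)
            j->started = true;   /* no userspace scheduling gap here */
}
```

The point of the sketch is only the ordering: the submission happens early, and the time-critical transition is a direct state change rather than a wait-wake-submit sequence.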
>
> With a display server, the client can send frames to the display server
> ahead of time, only the display server needs to wait for fences to
> signal before submitting frames to the kernel.
>
>
> > Of course, as Michel said, we could just always wait on the fence and
> > then schedule. But if you do that, why would you bother implementing
> > the fence in v4l2 to start with? DQBuf does just that already.
>
> A fence is more likely to work out of the box with non-V4L-related code
> than DQBuf?
If you use DQBuf, you are guaranteed that the data has been produced. A
fence is not useful on a buffer that already contains the data you
would be waiting for. That's why, in the RFC, the fence is provided at
QBuf, basically when the free buffer is handed to the V4L2 driver. QBuf
can also be passed a fence in the RFC, so if the buffer is not yet
free, the driver will wait on the fence before using it.
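A toy model of the QBuf/DQBuf fence semantics described above (the RFC defines the real uAPI; every name below is invented for illustration): the in-fence gates when the driver may start using the queued buffer, and the out-fence stands in for the completion that DQBuf would otherwise block on.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only, not the real V4L2 interface. */
struct v4l2_toy_buffer {
    bool in_fence_signalled;   /* buffer is actually free to reuse     */
    bool driver_using;         /* driver has started filling it        */
    bool out_fence_signalled;  /* data has been produced (DQBuf-able)  */
};

/* QBuf: hand the (possibly still busy) buffer to the driver. */
static void toy_qbuf(struct v4l2_toy_buffer *b)
{
    if (b->in_fence_signalled)
        b->driver_using = true; /* otherwise the driver waits for it  */
}

/* The in-fence signals: the driver can now start the capture. */
static void toy_signal_in_fence(struct v4l2_toy_buffer *b)
{
    b->in_fence_signalled = true;
    b->driver_using = true;
}

/* Capture completes: the out-fence signals instead of DQBuf waking up. */
static void toy_capture_done(struct v4l2_toy_buffer *b)
{
    b->out_fence_signalled = true;
}
```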
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 17:43 ` Nicolas Dufresne
@ 2019-04-25 15:17 ` Michel Dänzer
-1 siblings, 0 replies; 36+ messages in thread
From: Michel Dänzer @ 2019-04-25 15:17 UTC (permalink / raw)
To: Nicolas Dufresne, Daniel Vetter, Paul Kocialkowski
Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
Dave Airlie, Mauro Carvalho Chehab,
open list:DMA BUFFER SHARING FRAMEWORK
On 2019-04-24 7:43 p.m., Nicolas Dufresne wrote:
> On Wednesday, 24 April 2019 at 18:54 +0200, Michel Dänzer wrote:
>> On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
>>> On Wednesday, 24 April 2019 at 17:06 +0200, Daniel Vetter wrote:
>>>> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
>>>> <paul.kocialkowski@bootlin.com> wrote:
>>>>> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
>>>>>> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
>>>>>>> Rendering a video stream is more complex than what you describe here.
>>>>>>> Whenever there is an unexpected delay (late delivery of a frame, for
>>>>>>> example) you may end up in a situation where one frame is ready after
>>>>>>> the targeted vblank. If there is another frame that targets the
>>>>>>> following vblank and gets ready on time, the previous frame should be
>>>>>>> replaced by the most recent one.
>>>>>>>
>>>>>>> With fences, what happens is that even if you received the next frame
>>>>>>> on time, naively replacing it is not possible, because we don't know
>>>>>>> when the fence for the next frame will be signalled. If you simply
>>>>>>> always replace the current frame, you may end up skipping a lot more
>>>>>>> vblanks than you expect, and that results in jumpy playback.
>>>>>>
>>>>>> So you want to be able to replace a queued flip with another one then.
>>>>>> That doesn't necessarily require allowing more than one flip to be
>>>>>> queued ahead of time.
>>>>>
>>>>> There might be other ways to do it, but this one has plenty of
>>>>> advantages.
>>>>
>>>> The point of kms (well one of the reasons) was to separate the
>>>> implementation of modesetting for specific hw from policy decisions
>>>> like which frames to drop and how to schedule them. Kernel gives
>>>> tools, userspace implements the actual protocols.
>>>>
>>>> There's definitely a bit a gap around scheduling flips for a specific
>>>> frame or allowing to cancel/overwrite an already scheduled flip, but
>>>> no one yet has come up with a clear proposal for new uapi + example
>>>> implementation + userspace implementation + big enough support from
>>>> other compositors that this is what they want too.
>>
>> Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
>> flip?
>>
>>
>>>>>> Note that this can also be done in userspace with explicit fencing (by
>>>>>> only selecting a frame and submitting it to the kernel after all
>>>>>> corresponding fences have signalled), at least to some degree, but the
>>>>>> kernel should be able to do it up to a later point in time and more
>>>>>> reliably, with less risk of missing a flip for a frame which becomes
>>>>>> ready just in time.
>>>>>
>>>>> Indeed, but it would be great if we could do that with implicit fencing
>>>>> as well.
>>>>
>>>> 1. extract implicit fences from dma-buf. This part is just an idea,
>>>> but easy to implement once we have someone who actually wants this.
>>>> All we need is a new ioctl on the dma-buf to export the fences from
>>>> the reservation_object as a sync_file (either the exclusive or the
>>>> shared ones, selected with a flag).
>>>> 2. do the exact same frame scheduling as with explicit fencing
>>>> 3. supply explicit fences in your atomic ioctl calls - these should
>>>> overrule any implicit fences (assuming correct kernel drivers, but we
>>>> have helpers so you can assume they all work correctly).
>>>>
>>>> By design this is possible, it's just that no one yet bothered enough
>>>> to make it happen.
>>>> -Daniel
>>>
>>> I'm not sure I understand the workflow of this one. I'm all in favour
>>> of leaving the hard work to userspace. Note that I have assumed explicit
>>> fences from the start; I don't think implicit fences will ever exist in
>>> v4l2, but I might be wrong. What I understood is that there was a
>>> previous attempt in the past, but it raised more issues than it actually
>>> solved. So that being said, how do we handle exactly the following use
>>> cases:
>>>
>>> - A frame was lost by the capture driver, but it was scheduled as the
>>> next buffer to render (normally the previous frame should remain).
>>
>> Userspace just doesn't call into the kernel to flip to the lost frame,
>> so the previous one remains.
>
> We are going in circles, you and I. Considering the V4L2-to-DRM path,
> where fences don't exist on the V4L2 side, it makes very little sense
> to bring up fences if we are only going to wait on the fence in
> userspace.
It makes sense insofar as no V4L specific code would be needed to make
sure that the contents of a buffer produced via V4L aren't consumed
before they're ready to be.
>>> - The scheduled frame is late for the next vblank (it didn't signal in
>>> time); a new one may be better for the next vblank, but we will only
>>> know when its fence is signalled.
>>
>> Userspace only selects a frame and submits it to the kernel after all
>> its fences have signalled.
>>
>>> Better in this context means the presentation time of this frame is
>>> closer to the next vblank time. Keep in mind that the idea is to
>>> schedule the frames before they are signalled, in order for the
>>> fence to be useful in lowering the latency.
>>
>> Fences are about signalling completion, not about low latency.
>
> Fences can be used to remove a roundtrip through userspace at a very
> time-sensitive moment. If you pass a dmabuf with its unsignalled fence
> to a kernel driver, the driver can start the job on this dmabuf as soon
> as the fence is signalled. If you always wait on a fence in userspace,
> you have to wait for the userspace process to be scheduled,
I doubt this magically works without something like that (e.g. a
workqueue, which runs in normal process context) in the kernel either. :)
> then userspace will set up the DRM atomic request or a similar action,
> which may take some time and may require another kernel process to be
> scheduled. This effectively adds a variable delay, a gap where nothing
> is happening between two operations. This time is lost and contributes
> to the overall operation latency.
It only increases latency if it causes a flip to miss its target vblank,
and it's not possible to know whether this happens at an unacceptable
rate without trying. The prudent approach is to at least prototype a
solution with as much complexity as possible in userspace first. If that
turns out to perform too badly, then we can think about how to improve
it by adding complexity in the kernel.
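As a minimal sketch of the userspace "wait, then flip" prototype suggested here: sync_file fds support poll(), so a compositor can wait for a fence to signal before committing a flip. The code below uses a pipe as a stand-in for the fence fd so the sketch runs anywhere; the actual atomic commit is deliberately left out.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for a fence-like fd to become readable.
 * Returns 1 if it signalled, 0 on timeout. With a real sync_file fd,
 * POLLIN indicates the fence has signalled; here any readable fd
 * (e.g. the read end of a pipe) serves as a stand-in. */
static int fence_signalled_within(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms) > 0;
}
```

In the prototype, the compositor would only pick a frame whose `fence_signalled_within(fd, 0)` check succeeds, and then submit the flip for the next vblank.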
> The benefit we are looking for from fences is being able to set up
> operations on various compatible drivers before the fence is signalled.
> This way, at the time-critical moment, a driver can be fed more jobs
> without any userspace roundtrip involved.
That is possible with other operations, just not with page flipping yet.
--
Earthling Michel Dänzer | https://www.amd.com
Libre software enthusiast | Mesa and X developer
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 8:31 ` Michel Dänzer
@ 2019-04-24 12:19 ` Paul Kocialkowski
2019-04-24 17:10 ` Michel Dänzer
-1 siblings, 1 reply; 36+ messages in thread
From: Paul Kocialkowski @ 2019-04-24 12:19 UTC (permalink / raw)
To: Michel Dänzer, Nicolas Dufresne, Daniel Vetter
Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, linux-media
Hi,
On Wed, 2019-04-24 at 10:31 +0200, Michel Dänzer wrote:
> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
> > > > > It would be cool if both could be used concurrently and not just return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > >
> > > > We live in this world already :-) I think there's even patches (or merged
> > > > already) to add fences to v4l, for Android.
> > >
> > > This work is currently suspended. It will require some features on DRM
> > > display to really make this useful, but there are also a lot of
> > > challenges in V4L2. In GFX space, most of the use cases are about
> > > rendering as soon as possible. Though, in multimedia we have two
> > > problems: we need to synchronize the frame rendering with the audio,
> > > and output buffers may come out of order due to how video CODECs are
> > > made.
> >
> > Definitely, it feels like the DRM display side is currently a good fit
> > for render use cases, but not so much for precise display cases where
> > we want to try and display a buffer at a given vblank target instead of
> > "as soon as possible".
> >
> > I have a userspace project where I've implemented a page flip queue,
> > which only schedules the next flip when relevant and keeps ready
> > buffers in the queue until then. This requires explicit vblank
> > synchronisation (which DRM offers, but pretty much all other,
> > higher-level display APIs don't, so I'm just using a refresh-rate
> > timer for them) and flip-done notification.
> >
> > I haven't looked too much at how to flip with a target vblank with DRM
> > directly but maybe the atomic API already has the bits in for that (but
> > I haven't heard of such a thing as a buffer queue, so that makes me
> > doubt it).
>
> Not directly. What's available is that if userspace waits for vblank n
> and then submits a flip, the flip will complete in vblank n+1 (or a
> later vblank, depending on when the flip is submitted and when the
> fences the flip depends on signal).
>
> There is reluctance allowing more than one flip to be queued in the
> kernel, as it would considerably increase complexity in the kernel. It
> would probably only be considered if there was a compelling use-case
> which was outright impossible otherwise.
Well, I think it's just less boilerplate for userspace. This is indeed
quite complex, and I prefer to see that complexity done once and well
in Linux rather than duplicated in userspace with more or less reliable
implementations.
> > Well, I need to handle stuff like SDL in my userspace project, so I have
> > to have all that queuing stuff in software anyway, but it would be good
> > if each project didn't have to implement that. Worst case, it could be
> > in libdrm too.
>
> Usually, this kind of queuing will be handled in a display server such
> as Xorg or a Wayland compositor, not by the application such as a video
> player itself, or any library in the latter's address space. I'm not
> sure there's much potential for sharing code between display servers for
> this.
This assumes that you are using a display server, which is definitely
not always the case (there is e.g. Kodi GBM). Well, I'm not saying it
is essential to have it in the kernel, but it would avoid code
duplication and lower the complexity in userspace.
> > > In the first case, we'd need a mechanism where we can schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled earlier, the driver should hold
> > > on until the delay is met. If the fence got signalled late, we also
> > > need to think of a workflow. As we can't schedule more then one render
> > > in DRM at one time, I don't really see yet how to make that work.
> >
> > Indeed, that's also one of the main issues I've spotted. Before using
> > an implicit fence, we basically have to make sure the frame is due for
> > display at the next vblank. Otherwise, we need to refrain from using
> > the fence and schedule the flip later, which is kind of counter-
> > productive.
>
> Fences are about signalling that the contents of a frame are "done" and
> ready to be presented. They're not about specifying which frame is to be
> presented when.
Yes, that's precisely the issue I see with them. Once you have
scheduled the flip with a buffer, it is too late to switch to a more
recent buffer if one becomes available sooner (see the issue that
Nicolas is describing). If you attach a vblank target to the flip, the
flip can be skipped when its fence is signalled if a more recent
buffer's fence was signalled first.
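The skip-stale-frames policy described here could be sketched like this in userspace (purely illustrative, no kernel involvement; all names invented): among buffers whose fences have signalled, pick the most recent one due at the upcoming vblank, and let older ready frames be skipped rather than shown late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct frame {
    uint64_t target_vblank;   /* vblank this frame was scheduled for */
    bool fence_signalled;
};

/* Returns the index of the frame to flip to at `next_vblank`,
 * or -1 if nothing ready is due yet (the previous frame remains). */
static int pick_frame(const struct frame *f, size_t n, uint64_t next_vblank)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!f[i].fence_signalled || f[i].target_vblank > next_vblank)
            continue;          /* not ready, or not due yet */
        /* Prefer the frame whose target is closest to this vblank,
         * silently skipping any staler ready frames. */
        if (best < 0 || f[i].target_vblank > f[best].target_vblank)
            best = (int)i;
    }
    return best;
}
```

Doing this with a fence-gated flip in the kernel would mean the kernel itself re-evaluates the choice when each fence signals, which is the crux of the disagreement in this thread.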
> > I feel like specifying a target vblank would be a good unit for that,
>
> The mechanism described above works for that.
I still don't see any fence-based mechanism that can work to achieve
that, but maybe I'm missing your point.
> > since it's our native granularity after all (while a timestamp is not).
>
> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> changes things in this regard. It makes the vblank length variable, and
> if you wait for multiple vblanks between flips, you get the maximum
> vblank length corresponding to the minimum refresh rate / timing
> granularity. Thus, it would be useful to allow userspace to specify a
> timestamp corresponding to the earliest time when the flip is to
> complete. The kernel could then try to hit that as closely as possible.
I'm not very familiar with how this works, but I don't really see what
it changes. Does it mean we can flip multiple times per vblank?
If so, how can userspace be aware of that and deal with it properly?
Unless I'm missing something, I think flip scheduling should still work
on vblank granularity in that case.
And I really prefer a vblank count over a timestamp, as one is the
native unit at hand while the other only correlates to it.
Cheers,
Paul
--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Support for 2D engines/blitters in V4L2 and DRM
2019-04-24 12:19 ` Paul Kocialkowski
@ 2019-04-24 17:10 ` Michel Dänzer
0 siblings, 0 replies; 36+ messages in thread
From: Michel Dänzer @ 2019-04-24 17:10 UTC (permalink / raw)
To: Paul Kocialkowski, Nicolas Dufresne, Daniel Vetter
Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
Mauro Carvalho Chehab, linux-media
On 2019-04-24 2:19 p.m., Paul Kocialkowski wrote:
> On Wed, 2019-04-24 at 10:31 +0200, Michel Dänzer wrote:
>> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
>>> On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
>>>> On Thursday, 18 April 2019 at 10:18 +0200, Daniel Vetter wrote:
>>>>>> It would be cool if both could be used concurrently and not just return
>>>>>> -EBUSY when the device is used with the other subsystem.
>>>>>
>>>>> We live in this world already :-) I think there's even patches (or merged
>>>>> already) to add fences to v4l, for Android.
>>>>
>>>> This work is currently suspended. It will require some features on
>>>> the DRM display side to really make this useful, but there are also a
>>>> lot of challenges in V4L2. In GFX space, most use cases are about
>>>> rendering as soon as possible. In multimedia, though, we have two
>>>> problems: we need to synchronize the frame rendering with the audio,
>>>> and output buffers may come out of order due to how video CODECs
>>>> work.
>>>
>>> Definitely, it feels like the DRM display side is currently a good fit
>>> for render use cases, but not so much for precise display cases where
>>> we want to try and display a buffer at a given vblank target instead of
>>> "as soon as possible".
>>>
>>> I have a userspace project where I've implemented a page flip queue,
>>> which only schedules the next flip when relevant and keeps ready
>>> buffers in the queue until then. This requires explicit vblank
>>> synchronisation (which DRM offers, but pretty much all other,
>>> higher-level display APIs don't, so I'm just using a refresh-rate
>>> timer for them) and flip-done notification.
>>>
>>> I haven't looked too much at how to flip with a target vblank with DRM
>>> directly but maybe the atomic API already has the bits in for that (but
>>> I haven't heard of such a thing as a buffer queue, so that makes me
>>> doubt it).
>>
>> Not directly. What's available is that if userspace waits for vblank n
>> and then submits a flip, the flip will complete in vblank n+1 (or a
>> later vblank, depending on when the flip is submitted and when the
>> fences the flip depends on signal).
>>
>> There is reluctance allowing more than one flip to be queued in the
>> kernel, as it would considerably increase complexity in the kernel. It
>> would probably only be considered if there was a compelling use-case
>> which was outright impossible otherwise.
>
> Well, I think it's just less boilerplate for userspace. This is indeed
> quite complex, and I prefer to see that complexity done once and well
> in Linux rather than duplicated in userspace with more or less reliable
> implementations.
That's not the only trade-off to consider, e.g. I suspect handling this
in the kernel is more complex than in userspace.
>>> Well, I need to handle stuff like SDL in my userspace project, so I have
>>> to have all that queuing stuff in software anyway, but it would be good
>>> if each project didn't have to implement that. Worst case, it could be
>>> in libdrm too.
>>
>> Usually, this kind of queuing will be handled in a display server such
>> as Xorg or a Wayland compositor, not by the application such as a video
>> player itself, or any library in the latter's address space. I'm not
>> sure there's much potential for sharing code between display servers for
>> this.
>
> This assumes that you are using a display server, which is definitely
> not always the case (there is e.g. Kodi GBM). Well, I'm not saying it
> is essential to have it in the kernel, but it would avoid code
> duplication and lower the complexity in userspace.
For code duplication, my suggestion would be to use a display server
instead of duplicating its functionality.
>>>> For the first, we'd need a mechanism where we can schedule a render
>>>> at a specific time or vblank. We can of course already implement this
>>>> in software, but with fences, the scheduling would need to be done in
>>>> the driver. Then if the fence is signalled early, the driver should
>>>> hold on until the deadline is met. If the fence gets signalled late,
>>>> we also need to think of a workflow. As we can't schedule more than
>>>> one render in DRM at a time, I don't really see yet how to make that
>>>> work.
>>>
>>> Indeed, that's also one of the main issues I've spotted. Before using
>>> an implicit fence, we basically have to make sure the frame is due for
>>> display at the next vblank. Otherwise, we need to refrain from using
>>> the fence and schedule the flip later, which is kind of counter-
>>> productive.
>>
>> [...]
>
>>> I feel like specifying a target vblank would be a good unit for that,
>>
>> The mechanism described above works for that.
>
> I still don't see any fence-based mechanism that can work to achieve
> that, but maybe I'm missing your point.
It's not fence based, just good old waiting for the previous vblank
before submitting the flip to the kernel.
>>> since it's our native granularity after all (while a timestamp is not).
>>
>> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
>> changes things in this regard. It makes the vblank length variable, and
>> if you wait for multiple vblanks between flips, you get the maximum
>> vblank length corresponding to the minimum refresh rate / timing
>> granularity. Thus, it would be useful to allow userspace to specify a
>> timestamp corresponding to the earliest time when the flip is to
>> complete. The kernel could then try to hit that as closely as possible.
>
> I'm not very familiar with how this works, but I don't really see what
> it changes. Does it mean we can flip multiple times per vblank?
It's not about that.
> And I'd really prefer a vblank count over a timestamp, as the former is
> the native unit at hand while the latter only correlates to it.
From a video playback application POV it's really the other way around,
isn't it? The target time is known (e.g. in order to sync up with
audio), the vblank count has to be calculated from that. And with
variable refresh rate, this calculation can't be done reliably, because
it's not known ahead of time when the next vblank starts (at least not
more accurately than an interval corresponding to the maximum/minimum
refresh rates).
If the target timestamp could be specified explicitly, the kernel could
do the conversion to the vblank count for fixed refresh, and could
adjust the refresh rate to hit the target more accurately with variable
refresh.
--
Earthling Michel Dänzer | https://www.amd.com
Libre software enthusiast | Mesa and X developer