linux-kernel.vger.kernel.org archive mirror
* Support for 2D engines/blitters in V4L2 and DRM
@ 2019-04-17 18:10 Paul Kocialkowski
  2019-04-18  8:18 ` Daniel Vetter
  2019-05-06  8:28 ` Pekka Paalanen
  0 siblings, 2 replies; 24+ messages in thread
From: Paul Kocialkowski @ 2019-04-17 18:10 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: linux-kernel, Alexandre Courbot, Tomasz Figa, Maxime Ripard,
	Hans Verkuil, Mauro Carvalho Chehab, linux-media, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Daniel Vetter, Maarten Lankhorst

Hi Nicolas,

I'm detaching this thread from our V4L2 stateless decoding spec since
it has drifted off and would certainly be interesting to DRM folks as
well!

For context: I was initially talking about writing up support for the
Allwinner 2D engine as a DRM render driver, where I'd like to be able
to batch jobs that affect the same destination buffer to only signal
the out fence once when the batch is done. We have a similar issue in
v4l2 where we'd like the destination buffer for a set of requests (each
covering one H264 slice) to be marked as done once the set was decoded.

Le mercredi 17 avril 2019 à 12:22 -0400, Nicolas Dufresne a écrit :
> > > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > > 2D graphics blitter that has limited output scaling abilities which
> > > > imply handling a large scaling operation as multiple clipped smaller
> > > > scaling operations. The issue is basically that multiple jobs have to
> > > > be submitted to complete a single frame and relying on an indication
> > > > from the destination buffer (such as a fence) doesn't work to indicate
> > > > that all the operations were completed, since we get the indication at
> > > > each step instead of at the end of the batch.
> > > 
> > > That looks similar to the i.MX6 IPU m2m driver. It splits the image into
> > > tiles of 1024x1024 and processes each tile separately. This driver has
> > > been around for a long time, so I guess they have a solution to that.
> > > They don't need requests, because there is nothing to be bundled with
> > > the input image. I know that Renesas folks have started working on a
> > > de-interlacer. Again, this kind of driver may process and reuse input
> > > buffers for motion compensation, but I don't think they need special
> > > userspace API for that.
> > 
> > Thanks for the reference! I hope it's not a blitter that was
> > contributed as a V4L2 driver instead of DRM, as it probably would be
> > more useful in DRM (but that's way beside the point).
> 
> DRM does not offer a generic and discoverable interface for these
> accelerators. Note that most of the time these drivers started as
> DRM drivers and their DRM side was dropped. That was the case for
> Exynos drivers at least.

Heh, sadly I'm aware of how things turn out most of the time. The thing
is that DRM expects drivers to implement their own interface. That's
fine for passing BOs with GPU bitstream and textures, but not so much
for dealing with framebuffer-based operations where the streaming and
buffer interface that v4l2 has is a good fit.

There's also the fact that the 2D pipeline is fixed-function and highly
hardware-specific, so we need driver-specific job descriptions to
really make the most of it. That's where v4l2 is not much of a good fit
for complex 2D pipelines either. Most 2D engines can take multiple
inputs and blit them together in various ways, which is too far from
what v4l2 deals with. So we can have fixed single-buffer pipelines with
at best CSC and scaling, but not much more with v4l2 really.

I don't think it would be too much work to bring an interface to DRM in
order to describe render framebuffers (we only have display
framebuffers so far), with a simple queuing interface for scheduling
driver-specific jobs, which could be grouped together to only signal
the out fences when every buffer of the batch was done being rendered.
This last point would allow handling cases where userspace needs to
perform multiple operations to carry out the single operation that it
needs to do. In the case of my 2D blitter, that would be scaling above
a 1024x1024 destination, which could be required to scale a video
buffer up to a 1920x1080 display. With that, we can e.g. page flip the
2D engine destination buffer and be certain that scaling will be fully
done when the fence is signaled.

There's also the userspace problem: DRM render has mesa to back it in
userspace and provide a generic API for other programs. For 2D
engines, we don't have much to hold on to. Cairo has a DRM render
interface that supports a few DRM render drivers where there is either
a 2D pipeline or where pre-built shaders are used to implement a 2D
pipeline, and that's about it as far as I know.

There's also the possibility of writing up a drm-render DDX to handle
these 2D blitters that can make things a lot faster when running a
desktop environment. As for wayland, well, I don't really know what to
think. I was under the impression that it relies on GL for 2D
operations, but am really not sure how true that actually is.

> The thing is that DRM is great if you do immediate display stuff, while
> V4L2 is nice if you do streaming, where you expect to fill and pop
> buffers from queues.
> 
> In the end, this is just an interface, nothing prevents you from making
> an internal driver (like the Meson Canvas) and simply letting multiple
> sub-system expose it. Especially since some of these IPs will often
> support both signal and memory processing, so they equally fit into a
> media controller ISP, a v4l2 m2m or a DRM driver.

Having base drivers that can hook to both v4l2 m2m and DRM would
definitely be awesome. Maybe we could have some common internal
synchronization logic to make writing these drivers easier.

It would be cool if both could be used concurrently and not just return
-EBUSY when the device is used with the other subsystem.

Anyway, that's my 2 cents about the situation and what we can do to
improve it. I'm definitely interested in tackling these items, but it
may take some time before we get there. Not to mention we need to
rework media/v4l2 for per-slice decoding support ;)

> Another driver you might want to look is Rockchip RGA driver (which is
> a multi function IP, including blitting).

Yep, I'm aware of it as well. There's also Vivante, which exposes 2D
cores, but I'm really not sure whether any function is actually
implemented.

OMAP4 and OMAP5 have a 2D engine that seems to be Vivante as well from
what I could find out, but it seems to only have blobs for bltsville
and no significant docs.

Cheers,

Paul


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-17 18:10 Support for 2D engines/blitters in V4L2 and DRM Paul Kocialkowski
@ 2019-04-18  8:18 ` Daniel Vetter
  2019-04-18  8:54   ` Paul Kocialkowski
  2019-04-19  0:30   ` Nicolas Dufresne
  2019-05-06  8:28 ` Pekka Paalanen
  1 sibling, 2 replies; 24+ messages in thread
From: Daniel Vetter @ 2019-04-18  8:18 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Nicolas Dufresne, linux-kernel, Alexandre Courbot, Tomasz Figa,
	Maxime Ripard, Hans Verkuil, Mauro Carvalho Chehab, linux-media,
	dri-devel, Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Daniel Vetter, Maarten Lankhorst

On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> Hi Nicolas,
> 
> I'm detaching this thread from our V4L2 stateless decoding spec since
> it has drifted off and would certainly be interesting to DRM folks as
> well!
> 
> For context: I was initially talking about writing up support for the
> Allwinner 2D engine as a DRM render driver, where I'd like to be able
> to batch jobs that affect the same destination buffer to only signal
> the out fence once when the batch is done. We have a similar issue in
> v4l2 where we'd like the destination buffer for a set of requests (each
> covering one H264 slice) to be marked as done once the set was decoded.
> 
> Le mercredi 17 avril 2019 à 12:22 -0400, Nicolas Dufresne a écrit :
> > > > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > > > 2D graphics blitter that has limited output scaling abilities which
> > > > > imply handling a large scaling operation as multiple clipped smaller
> > > > > scaling operations. The issue is basically that multiple jobs have to
> > > > > be submitted to complete a single frame and relying on an indication
> > > > > from the destination buffer (such as a fence) doesn't work to indicate
> > > > > that all the operations were completed, since we get the indication at
> > > > > each step instead of at the end of the batch.
> > > > 
> > > > That looks similar to the i.MX6 IPU m2m driver. It splits the image into
> > > > tiles of 1024x1024 and processes each tile separately. This driver has
> > > > been around for a long time, so I guess they have a solution to that.
> > > > They don't need requests, because there is nothing to be bundled with
> > > > the input image. I know that Renesas folks have started working on a
> > > > de-interlacer. Again, this kind of driver may process and reuse input
> > > > buffers for motion compensation, but I don't think they need special
> > > > userspace API for that.
> > > 
> > > Thanks for the reference! I hope it's not a blitter that was
> > > contributed as a V4L2 driver instead of DRM, as it probably would be
> > > more useful in DRM (but that's way beside the point).
> > 
> > DRM does not offer a generic and discoverable interface for these
> > accelerators. Note that most of the time these drivers started as
> > DRM drivers and their DRM side was dropped. That was the case for
> > Exynos drivers at least.
> 
> Heh, sadly I'm aware of how things turn out most of the time. The thing
> is that DRM expects drivers to implement their own interface. That's
> fine for passing BOs with GPU bitstream and textures, but not so much
> for dealing with framebuffer-based operations where the streaming and
> buffer interface that v4l2 has is a good fit.
> 
> There's also the fact that the 2D pipeline is fixed-function and highly
> hardware-specific, so we need driver-specific job descriptions to
> really make the most of it. That's where v4l2 is not much of a good fit
> for complex 2D pipelines either. Most 2D engines can take multiple
> inputs and blit them together in various ways, which is too far from
> what v4l2 deals with. So we can have fixed single-buffer pipelines with
> at best CSC and scaling, but not much more with v4l2 really.
> 
> I don't think it would be too much work to bring an interface to DRM in
> order to describe render framebuffers (we only have display
> framebuffers so far), with a simple queuing interface for scheduling
> driver-specific jobs, which could be grouped together to only signal
> the out fences when every buffer of the batch was done being rendered.
> This last point would allow handling cases where userspace needs to
> perform multiple operations to carry out the single operation that it
> needs to do. In the case of my 2D blitter, that would be scaling above
> a 1024x1024 destination, which could be required to scale a video
> buffer up to a 1920x1080 display. With that, we can e.g. page flip the
> 2D engine destination buffer and be certain that scaling will be fully
> done when the fence is signaled.
> 
> There's also the userspace problem: DRM render has mesa to back it in
> userspace and provide a generic API for other programs. For 2D
> engines, we don't have much to hold on to. Cairo has a DRM render
> interface that supports a few DRM render drivers where there is either
> a 2D pipeline or where pre-built shaders are used to implement a 2D
> pipeline, and that's about it as far as I know.
> 
> There's also the possibility of writing up a drm-render DDX to handle
> these 2D blitters that can make things a lot faster when running a
> desktop environment. As for wayland, well, I don't really know what to
> think. I was under the impression that it relies on GL for 2D
> operations, but am really not sure how true that actually is.

Just fyi in case you folks aren't aware, I typed up a blog a while ago
about why drm doesn't have a 2d submit api:

https://blog.ffwll.ch/2018/08/no-2d-in-drm.html

> > The thing is that DRM is great if you do immediate display stuff, while
> > V4L2 is nice if you do streaming, where you expect to fill and pop
> > buffers from queues.
> > 
> > In the end, this is just an interface, nothing prevents you from making
> > an internal driver (like the Meson Canvas) and simply letting multiple
> > sub-system expose it. Especially since some of these IPs will often
> > support both signal and memory processing, so they equally fit into a
> > media controller ISP, a v4l2 m2m or a DRM driver.
> 
> Having base drivers that can hook to both v4l2 m2m and DRM would
> definitely be awesome. Maybe we could have some common internal
> synchronization logic to make writing these drivers easier.

We have, it's called dma_fence. It ties into dma_bufs using
reservation_objects.

> It would be cool if both could be used concurrently and not just return
> -EBUSY when the device is used with the other subsystem.

We live in this world already :-) I think there are even patches
(possibly already merged) to add fences to v4l, for Android.

> Anyway, that's my 2 cents about the situation and what we can do to
> improve it. I'm definitely interested in tackling these items, but it
> may take some time before we get there. Not to mention we need to
> rework media/v4l2 for per-slice decoding support ;)
> 
> > Another driver you might want to look is Rockchip RGA driver (which is
> > a multi function IP, including blitting).
> 
> Yep, I'm aware of it as well. There's also Vivante, which exposes 2D
> cores, but I'm really not sure whether any function is actually
> implemented.
> 
> OMAP4 and OMAP5 have a 2D engine that seems to be vivante as well from
> what I could find out, but it seems to only have blobs for bltsville
> and no significant docs.

Yeah that's the usual approach for drm 2d drivers: You have a bespoke
driver in userspace. Usually that means an X driver, but there's been talk
to pimp the hwc interface to make that _the_ 2d accel interface. There's
also fbdev ... *shudder*.

All of these options are geared towards ultimately displaying stuff on
screens, not pure m2m 2d accel.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-18  8:18 ` Daniel Vetter
@ 2019-04-18  8:54   ` Paul Kocialkowski
  2019-04-18  9:09     ` Tomasz Figa
  2019-04-19  0:30   ` Nicolas Dufresne
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Kocialkowski @ 2019-04-18  8:54 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Nicolas Dufresne, linux-kernel, Alexandre Courbot, Tomasz Figa,
	Maxime Ripard, Hans Verkuil, Mauro Carvalho Chehab, linux-media,
	dri-devel, Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

Hi Daniel,

On Thu, 2019-04-18 at 10:18 +0200, Daniel Vetter wrote:
> On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> > Hi Nicolas,
> > 
> > I'm detaching this thread from our V4L2 stateless decoding spec since
> > it has drifted off and would certainly be interesting to DRM folks as
> > well!
> > 
> > For context: I was initially talking about writing up support for the
> > Allwinner 2D engine as a DRM render driver, where I'd like to be able
> > to batch jobs that affect the same destination buffer to only signal
> > the out fence once when the batch is done. We have a similar issue in
> > v4l2 where we'd like the destination buffer for a set of requests (each
> > covering one H264 slice) to be marked as done once the set was decoded.
> > 
> > Le mercredi 17 avril 2019 à 12:22 -0400, Nicolas Dufresne a écrit :
> > > > > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > > > > 2D graphics blitter that has limited output scaling abilities which
> > > > > > imply handling a large scaling operation as multiple clipped smaller
> > > > > > scaling operations. The issue is basically that multiple jobs have to
> > > > > > be submitted to complete a single frame and relying on an indication
> > > > > > from the destination buffer (such as a fence) doesn't work to indicate
> > > > > > that all the operations were completed, since we get the indication at
> > > > > > each step instead of at the end of the batch.
> > > > > 
> > > > > That looks similar to the i.MX6 IPU m2m driver. It splits the image into
> > > > > tiles of 1024x1024 and processes each tile separately. This driver has
> > > > > been around for a long time, so I guess they have a solution to that.
> > > > > They don't need requests, because there is nothing to be bundled with
> > > > > the input image. I know that Renesas folks have started working on a
> > > > > de-interlacer. Again, this kind of driver may process and reuse input
> > > > > buffers for motion compensation, but I don't think they need special
> > > > > userspace API for that.
> > > > 
> > > > Thanks for the reference! I hope it's not a blitter that was
> > > > contributed as a V4L2 driver instead of DRM, as it probably would be
> > > > more useful in DRM (but that's way beside the point).
> > > 
> > > DRM does not offer a generic and discoverable interface for these
> > > accelerators. Note that most of the time these drivers started as
> > > DRM drivers and their DRM side was dropped. That was the case for
> > > Exynos drivers at least.
> > 
> > Heh, sadly I'm aware of how things turn out most of the time. The thing
> > is that DRM expects drivers to implement their own interface. That's
> > fine for passing BOs with GPU bitstream and textures, but not so much
> > for dealing with framebuffer-based operations where the streaming and
> > buffer interface that v4l2 has is a good fit.
> > 
> > There's also the fact that the 2D pipeline is fixed-function and highly
> > hardware-specific, so we need driver-specific job descriptions to
> > really make the most of it. That's where v4l2 is not much of a good fit
> > for complex 2D pipelines either. Most 2D engines can take multiple
> > inputs and blit them together in various ways, which is too far from
> > what v4l2 deals with. So we can have fixed single-buffer pipelines with
> > at best CSC and scaling, but not much more with v4l2 really.
> > 
> > I don't think it would be too much work to bring an interface to DRM in
> > order to describe render framebuffers (we only have display
> > framebuffers so far), with a simple queuing interface for scheduling
> > driver-specific jobs, which could be grouped together to only signal
> > the out fences when every buffer of the batch was done being rendered.
> > This last point would allow handling cases where userspace needs to
> > perform multiple operations to carry out the single operation that it
> > needs to do. In the case of my 2D blitter, that would be scaling above
> > a 1024x1024 destination, which could be required to scale a video
> > buffer up to a 1920x1080 display. With that, we can e.g. page flip the
> > 2D engine destination buffer and be certain that scaling will be fully
> > done when the fence is signaled.
> > 
> > There's also the userspace problem: DRM render has mesa to back it in
> > userspace and provide a generic API for other programs. For 2D
> > engines, we don't have much to hold on to. Cairo has a DRM render
> > interface that supports a few DRM render drivers where there is either
> > a 2D pipeline or where pre-built shaders are used to implement a 2D
> > pipeline, and that's about it as far as I know.
> > 
> > There's also the possibility of writing up a drm-render DDX to handle
> > these 2D blitters that can make things a lot faster when running a
> > desktop environment. As for wayland, well, I don't really know what to
> > think. I was under the impression that it relies on GL for 2D
> > operations, but am really not sure how true that actually is.
> 
> Just fyi in case you folks aren't aware, I typed up a blog a while ago
> about why drm doesn't have a 2d submit api:
> 
> https://blog.ffwll.ch/2018/08/no-2d-in-drm.html

I definitely share the observation that each 2D engine has its own kind
of pipeline, which is close to impossible to describe in a generic way
while exposing all the possible features of the pipeline.

I thought about this some more yesterday and I see a few areas that
could however be made generic:
* GEM allocation for framebuffers (with a unified ioctl);
* framebuffer management (that's only in KMS for now, and we need
pretty much the same thing here);
* some queuing mechanism, either for standalone submissions or groups
of them.

So I started thinking about writing up a "DRM GFX" API which would
provide this, instead of implementing it in my 2D blitter driver.
There's a chance I'll submit a proposal of that along with my driver.

I am convinced the job submit ioctl needs to remain driver-specific to
properly describe the pipeline though.

> > > The thing is that DRM is great if you do immediate display stuff, while
> > > V4L2 is nice if you do streaming, where you expect to fill and pop
> > > buffers from queues.
> > > 
> > > In the end, this is just an interface, nothing prevents you from making
> > > an internal driver (like the Meson Canvas) and simply letting multiple
> > > sub-system expose it. Especially since some of these IPs will often
> > > support both signal and memory processing, so they equally fit into a
> > > media controller ISP, a v4l2 m2m or a DRM driver.
> > 
> > Having base drivers that can hook to both v4l2 m2m and DRM would
> > definitely be awesome. Maybe we could have some common internal
> > synchronization logic to make writing these drivers easier.
> 
> We have, it's called dma_fence. It ties into dma_bufs using
> reservation_objects.

That's not what I meant: I'm talking about exposing the 2D engine
capabilities through both DRM and V4L2 M2M, where the V4L2 M2M driver
would be an internal client to DRM. So it's about using the same
hardware with both APIs concurrently.

And while we're at it, we could allow detaching display pipeline
elements that have intermediate writeback and exposing them as 2D
engines through the same API (which would return busy when the block
is used for the video pipeline).

> > It would be cool if both could be used concurrently and not just return
> > -EBUSY when the device is used with the other subsystem.
> 
> We live in this world already :-) I think there are even patches
> (possibly already merged) to add fences to v4l, for Android.
> 
> > Anyway, that's my 2 cents about the situation and what we can do to
> > improve it. I'm definitely interested in tackling these items, but it
> > may take some time before we get there. Not to mention we need to
> > rework media/v4l2 for per-slice decoding support ;)
> > 
> > > Another driver you might want to look is Rockchip RGA driver (which is
> > > a multi function IP, including blitting).
> > 
> > Yep, I'm aware of it as well. There's also Vivante, which exposes 2D
> > cores, but I'm really not sure whether any function is actually
> > implemented.
> > 
> > OMAP4 and OMAP5 have a 2D engine that seems to be vivante as well from
> > what I could find out, but it seems to only have blobs for bltsville
> > and no significant docs.
> 
> Yeah that's the usual approach for drm 2d drivers: You have a bespoke
> driver in userspace. Usually that means an X driver, but there's been talk
> to pimp the hwc interface to make that _the_ 2d accel interface. There's
> also fbdev ... *shudder*.
> 
> All of these options are geared towards ultimately displaying stuff on
> screens, not pure m2m 2d accel.

I think it would be good to have a dedicated library to translate
"standard" 2D ops (Porter-Duff blending and such) into driver-specific
submit ioctls. It could be called "libdrm-gfx" and used by an
associated DDX (as well as any other program that needs accelerated 2D
ops).

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com



* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-18  8:54   ` Paul Kocialkowski
@ 2019-04-18  9:09     ` Tomasz Figa
  2019-04-18  9:13       ` Paul Kocialkowski
  0 siblings, 1 reply; 24+ messages in thread
From: Tomasz Figa @ 2019-04-18  9:09 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Daniel Vetter, Nicolas Dufresne, Linux Kernel Mailing List,
	Alexandre Courbot, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Linux Media Mailing List, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

On Thu, Apr 18, 2019 at 5:55 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi Daniel,
>
> On Thu, 2019-04-18 at 10:18 +0200, Daniel Vetter wrote:
> > On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> > > Hi Nicolas,
> > >
> > > I'm detaching this thread from our V4L2 stateless decoding spec since
> > > it has drifted off and would certainly be interesting to DRM folks as
> > > well!
> > >
> > > For context: I was initially talking about writing up support for the
> > > Allwinner 2D engine as a DRM render driver, where I'd like to be able
> > > to batch jobs that affect the same destination buffer to only signal
> > > the out fence once when the batch is done. We have a similar issue in
> > > v4l2 where we'd like the destination buffer for a set of requests (each
> > > covering one H264 slice) to be marked as done once the set was decoded.
> > >

Out of curiosity, what area did you find a 2D blitter useful for?

Best regards,
Tomasz


* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-18  9:09     ` Tomasz Figa
@ 2019-04-18  9:13       ` Paul Kocialkowski
  2019-04-18  9:21         ` Tomasz Figa
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Kocialkowski @ 2019-04-18  9:13 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Daniel Vetter, Nicolas Dufresne, Linux Kernel Mailing List,
	Alexandre Courbot, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Linux Media Mailing List, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

Hi,

On Thu, 2019-04-18 at 18:09 +0900, Tomasz Figa wrote:
> On Thu, Apr 18, 2019 at 5:55 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
> > Hi Daniel,
> > 
> > On Thu, 2019-04-18 at 10:18 +0200, Daniel Vetter wrote:
> > > On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> > > > Hi Nicolas,
> > > > 
> > > > I'm detaching this thread from our V4L2 stateless decoding spec since
> > > > it has drifted off and would certainly be interesting to DRM folks as
> > > > well!
> > > > 
> > > > For context: I was initially talking about writing up support for the
> > > > Allwinner 2D engine as a DRM render driver, where I'd like to be able
> > > > to batch jobs that affect the same destination buffer to only signal
> > > > the out fence once when the batch is done. We have a similar issue in
> > > > v4l2 where we'd like the destination buffer for a set of requests (each
> > > > covering one H264 slice) to be marked as done once the set was decoded.
> > > > 
> 
> Out of curiosity, what area did you find a 2D blitter useful for?

The initial motivation is to bring up a DDX with that for platforms
that have 2D engines but no free software GPU drivers yet.

I also have a personal project in the works where I'd like to implement
accelerated UI rendering in 2D. The idea is to avoid using GL entirely.

That last point is in part because I have a GPU-less device that I want
to get going with mainline: http://linux-sunxi.org/F60_Action_Camera

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com



* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-18  9:13       ` Paul Kocialkowski
@ 2019-04-18  9:21         ` Tomasz Figa
  0 siblings, 0 replies; 24+ messages in thread
From: Tomasz Figa @ 2019-04-18  9:21 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Daniel Vetter, Nicolas Dufresne, Linux Kernel Mailing List,
	Alexandre Courbot, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Linux Media Mailing List, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

On Thu, Apr 18, 2019 at 6:14 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> On Thu, 2019-04-18 at 18:09 +0900, Tomasz Figa wrote:
> > On Thu, Apr 18, 2019 at 5:55 PM Paul Kocialkowski
> > <paul.kocialkowski@bootlin.com> wrote:
> > > Hi Daniel,
> > >
> > > On Thu, 2019-04-18 at 10:18 +0200, Daniel Vetter wrote:
> > > > On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> > > > > Hi Nicolas,
> > > > >
> > > > > I'm detaching this thread from our V4L2 stateless decoding spec since
> > > > > it has drifted off and would certainly be interesting to DRM folks as
> > > > > well!
> > > > >
> > > > > For context: I was initially talking about writing up support for the
> > > > > Allwinner 2D engine as a DRM render driver, where I'd like to be able
> > > > > to batch jobs that affect the same destination buffer to only signal
> > > > > the out fence once when the batch is done. We have a similar issue in
> > > > > v4l2 where we'd like the destination buffer for a set of requests (each
> > > > > covering one H264 slice) to be marked as done once the set was decoded.
> > > > >
> >
> > Out of curiosity, what area did you find a 2D blitter useful for?
>
> The initial motivation is to bring up a DDX with that for platforms
> that have 2D engines but no free software GPU drivers yet.
>
> I also have a personal project in the works where I'd like to implement
> accelerated UI rendering in 2D. The idea is to avoid using GL entirely.
>
> That last point is in part because I have a GPU-less device that I want
> to get going with mainline: http://linux-sunxi.org/F60_Action_Camera

Okay, thanks.

I feel like the typical DRM model with a render node and a userspace
library would make sense for these specific use cases on these
specific hardware platforms then.

Hopefully the availability of open drivers for 3D engines continues to improve.

Best regards,
Tomasz


* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-18  8:18 ` Daniel Vetter
  2019-04-18  8:54   ` Paul Kocialkowski
@ 2019-04-19  0:30   ` Nicolas Dufresne
  2019-04-19  4:27     ` Tomasz Figa
  2019-04-19  8:38     ` Paul Kocialkowski
  1 sibling, 2 replies; 24+ messages in thread
From: Nicolas Dufresne @ 2019-04-19  0:30 UTC (permalink / raw)
  To: Daniel Vetter, Paul Kocialkowski
  Cc: linux-kernel, Alexandre Courbot, Tomasz Figa, Maxime Ripard,
	Hans Verkuil, Mauro Carvalho Chehab, linux-media, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > It would be cool if both could be used concurrently and not just return
> > -EBUSY when the device is used with the other subsystem.
> 
> We live in this world already :-) I think there's even patches (or merged
> already) to add fences to v4l, for Android.

This work is currently suspended. It will require some features on
the DRM display side to really make this useful, but there are also a
lot of challenges in V4L2. In the GFX space, most use cases are about
rendering as soon as possible. In multimedia, though, we have two
problems: we need to synchronize frame rendering with the audio, and
output buffers may come out of order due to how video CODECs work.

For the first, we'd need a mechanism to schedule a render at a
specific time or vblank. We can of course already implement this in
software, but with fences, the scheduling would need to be done in the
driver. Then if the fence is signalled early, the driver should hold
on until the deadline is met. If the fence is signalled late, we also
need to define a workflow. As we can't schedule more than one render
in DRM at a time, I don't really see yet how to make that work.

For the second, it's complicated on the V4L2 side. Currently we signal
buffers when they are ready, in display order. With fences, we would
receive buffer/fence pairs early, in decoding order. There are cases
where reordering is done by the driver (stateful CODECs). We cannot
schedule these immediately; we would need a new mechanism to know
which one comes next. If we just reuse the current mechanism, it would
void the fence usage, since the fence will always be signalled by the
time it reaches DRM or another V4L2 component.

There are other issues too: in a video capture pipeline, if you are
not rendering ASAP, you need the HW timestamp in order to schedule.
Again, we'd get the fence early, but the actual timestamp is only
known at the very last minute, so we also risk turning the fence into
pure overhead. Note that, as we speak, I have colleagues experimenting
with frame timestamp prediction that slaves to the effective timestamp
(catching up over time). But we still have issues when the capture
driver skips a frame (misses a capture window).


I hope this reflection is useful,
Nicolas


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-19  0:30   ` Nicolas Dufresne
@ 2019-04-19  4:27     ` Tomasz Figa
  2019-04-19 15:31       ` Nicolas Dufresne
  2019-04-19  8:38     ` Paul Kocialkowski
  1 sibling, 1 reply; 24+ messages in thread
From: Tomasz Figa @ 2019-04-19  4:27 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Daniel Vetter, Paul Kocialkowski, Linux Kernel Mailing List,
	Alexandre Courbot, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Linux Media Mailing List, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

On Fri, Apr 19, 2019 at 9:30 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > It would be cool if both could be used concurrently and not just return
> > > -EBUSY when the device is used with the other subsystem.
> >
> > We live in this world already :-) I think there's even patches (or merged
> > already) to add fences to v4l, for Android.
>
> This work is currently suspended. It will require some features on the
> DRM display side to really make it useful, but there are also a lot of
> challenges in V4L2. In the GFX space, most use cases are about
> rendering as soon as possible. In multimedia, however, we have two
> problems: we need to synchronize frame rendering with the audio, and
> output buffers may come out of order due to how video CODECs work.
>
> For the first, we'd need a mechanism to schedule a render at a
> specific time or vblank. We can of course already implement this in
> software, but with fences, the scheduling would need to be done in the
> driver. Then if the fence is signalled early, the driver should hold
> on until the deadline is met. If the fence is signalled late, we also
> need to define a workflow. As we can't schedule more than one render
> in DRM at a time, I don't really see yet how to make that work.
>
> For the second, it's complicated on the V4L2 side. Currently we signal
> buffers when they are ready, in display order. With fences, we would
> receive buffer/fence pairs early, in decoding order. There are cases
> where reordering is done by the driver (stateful CODECs). We cannot
> schedule these immediately; we would need a new mechanism to know
> which one comes next. If we just reuse the current mechanism, it would
> void the fence usage, since the fence will always be signalled by the
> time it reaches DRM or another V4L2 component.
>
> There are other issues too: in a video capture pipeline, if you are
> not rendering ASAP, you need the HW timestamp in order to schedule.
> Again, we'd get the fence early, but the actual timestamp is only
> known at the very last minute, so we also risk turning the fence into
> pure overhead. Note that, as we speak, I have colleagues experimenting
> with frame timestamp prediction that slaves to the effective timestamp
> (catching up over time). But we still have issues when the capture
> driver skips a frame (misses a capture window).

Note that a fence has a timestamp internally and it can be queried for
it from the user space if exposed as a sync file:
https://elixir.bootlin.com/linux/v5.1-rc5/source/drivers/dma-buf/sync_file.c#L386
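As a rough illustration of querying that state from user space, here is a sketch mirroring `struct sync_file_info` from `include/uapi/linux/sync_file.h` (as of v5.1) and computing the `SYNC_IOC_FILE_INFO` ioctl number the way `_IOWR('>', 4, ...)` expands on common architectures; treat the layout and number as assumptions to verify against your kernel headers. The per-fence `timestamp_ns` lives in the `sync_fence_info` records pointed to by the `sync_fence_info` field; for brevity this only reads the top-level status.

```python
import ctypes
import fcntl

# Mirror of struct sync_file_info (include/uapi/linux/sync_file.h, v5.1).
class SyncFileInfo(ctypes.Structure):
    _fields_ = [
        ("name", ctypes.c_char * 32),
        ("status", ctypes.c_int32),       # 0 active, 1 signalled, <0 error
        ("flags", ctypes.c_uint32),
        ("num_fences", ctypes.c_uint32),
        ("pad", ctypes.c_uint32),
        ("sync_fence_info", ctypes.c_uint64),  # ptr to sync_fence_info[]
    ]

def _iowr(magic, nr, size):
    # asm-generic ioctl encoding: dir(2 bits) | size(14) | type(8) | nr(8)
    return (3 << 30) | (size << 16) | (ord(magic) << 8) | nr

SYNC_IOC_FILE_INFO = _iowr(">", 4, ctypes.sizeof(SyncFileInfo))

def sync_file_status(fd):
    """Return the status of a sync file fd; raises OSError on failure
    (e.g. ENOTTY if fd is not a sync file)."""
    buf = bytearray(ctypes.sizeof(SyncFileInfo))
    fcntl.ioctl(fd, SYNC_IOC_FILE_INFO, buf)
    return SyncFileInfo.from_buffer(buf).status
```

Calling `sync_file_status()` on a fence fd exported by a driver would return 1 once signalled; on any other fd the ioctl simply fails.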

Fences in V4L2 would be also useful for stateless decoders and any
mem-to-mem processors that operate in order, like the blitters
mentioned here or actually camera ISPs, which can be often chained
into relatively sophisticated pipelines.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-19  0:30   ` Nicolas Dufresne
  2019-04-19  4:27     ` Tomasz Figa
@ 2019-04-19  8:38     ` Paul Kocialkowski
  2019-04-24  8:31       ` Michel Dänzer
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Kocialkowski @ 2019-04-19  8:38 UTC (permalink / raw)
  To: Nicolas Dufresne, Daniel Vetter
  Cc: linux-kernel, Alexandre Courbot, Tomasz Figa, Maxime Ripard,
	Hans Verkuil, Mauro Carvalho Chehab, linux-media, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

Hi,

On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > It would be cool if both could be used concurrently and not just return
> > > -EBUSY when the device is used with the other subsystem.
> > 
> > We live in this world already :-) I think there's even patches (or merged
> > already) to add fences to v4l, for Android.
> 
> This work is currently suspended. It will require some features on the
> DRM display side to really make it useful, but there are also a lot of
> challenges in V4L2. In the GFX space, most use cases are about
> rendering as soon as possible. In multimedia, however, we have two
> problems: we need to synchronize frame rendering with the audio, and
> output buffers may come out of order due to how video CODECs work.

Definitely, it feels like the DRM display side is currently a good fit
for render use cases, but not so much for precise display cases where
we want to try and display a buffer at a given vblank target instead of
"as soon as possible".

I have a userspace project where I've implemented a page flip queue,
which only schedules the next flip when relevant and keeps ready
buffers in the queue until then. This requires explicit vblank
synchronisation (which DRM offers, but pretty much all other,
higher-level display APIs don't, so I'm just using a refresh-rate
timer for them) and flip-done notification.

I haven't looked too much at how to flip with a target vblank with DRM
directly, but maybe the atomic API already has the bits for that (I
haven't heard of such a thing as a buffer queue, though, which makes
me doubt it). Well, I need to handle stuff like SDL in my userspace
project anyway, so I have to have all that queuing in software
regardless, but it would be good if each project didn't have to
implement it. Worst case, it could live in libdrm.
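Such a flip queue can be sketched in a few lines (a hypothetical structure, not any existing library): each ready buffer carries a target vblank, and at every vblank the newest due buffer wins while the ones it supersedes are dropped.

```python
# Minimal sketch (hypothetical API) of a userspace page-flip queue:
# buffers are queued with a target vblank; at each vblank we flip to
# the newest buffer that is due and drop the buffers it supersedes.

class FlipQueue:
    def __init__(self):
        self.pending = []   # list of (target_vblank, buffer), oldest first

    def queue(self, target_vblank, buffer):
        self.pending.append((target_vblank, buffer))

    def on_vblank(self, vblank):
        """Return the buffer to flip to at this vblank, or None."""
        due = [b for b in self.pending if b[0] <= vblank]
        if not due:
            return None
        # Keep only buffers still targeting a future vblank.
        self.pending = [b for b in self.pending if b[0] > vblank]
        return due[-1][1]   # newest due buffer wins

q = FlipQueue()
q.queue(10, "frame A")
q.queue(11, "frame B")
print(q.on_vblank(10))  # frame A
print(q.on_vblank(11))  # frame B
```

If a vblank is missed and two buffers become due at once, `on_vblank()` returns only the newest, which is the drop behaviour a flip queue exists to provide.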

> For the first, we'd need a mechanism to schedule a render at a
> specific time or vblank. We can of course already implement this in
> software, but with fences, the scheduling would need to be done in the
> driver. Then if the fence is signalled early, the driver should hold
> on until the deadline is met. If the fence is signalled late, we also
> need to define a workflow. As we can't schedule more than one render
> in DRM at a time, I don't really see yet how to make that work.

Indeed, that's also one of the main issues I've spotted. Before using
an implicit fence, we basically have to make sure the frame is due for
display at the next vblank. Otherwise, we need to refrain from using
the fence and schedule the flip later, which is kind of counter-
productive.

So maybe adding this queue in DRM directly would make everyone's life
much easier for non-render applications.

I feel like specifying a target vblank would be a good unit for that,
since it's our native granularity after all (while a timestamp is not).

> For the second, it's complicated on the V4L2 side. Currently we signal
> buffers when they are ready, in display order. With fences, we would
> receive buffer/fence pairs early, in decoding order. There are cases
> where reordering is done by the driver (stateful CODECs). We cannot
> schedule these immediately; we would need a new mechanism to know
> which one comes next. If we just reuse the current mechanism, it would
> void the fence usage, since the fence will always be signalled by the
> time it reaches DRM or another V4L2 component.

Well, our v4l2 buffers do have a timestamp and fences expose it too, so
we'd need DRM to convert that to a target vblank and add it to the
internal queue mentioned above. That seems doable.

I think we only gave a vague meaning to the v4l2 timestamp in the
decoding case: it could be any number, the timestamp at decode
submission, or the target presentation timestamp for the frame. I
think we should aim for the latter, but I'm not sure it's always
possible to know it beforehand. Perhaps you have a clear idea of this?
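The timestamp-to-vblank conversion mentioned above is plain arithmetic, assuming a fixed refresh rate and a known time origin (assumptions that variable-refresh-rate displays break):

```python
def target_vblank(ts_ns, t0_ns, refresh_hz, base_vblank=0):
    """Map a presentation timestamp to the nearest vblank counter value,
    assuming a fixed refresh rate. VRR displays break this assumption,
    since the vblank period is no longer constant."""
    period_ns = 1_000_000_000 / refresh_hz
    return base_vblank + round((ts_ns - t0_ns) / period_ns)

# 50 ms into a 60 Hz stream lands on vblank 3 (3 * 16.67 ms ~= 50 ms).
print(target_vblank(50_000_000, 0, 60))  # 3
```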

> There are other issues too: in a video capture pipeline, if you are
> not rendering ASAP, you need the HW timestamp in order to schedule.
> Again, we'd get the fence early, but the actual timestamp is only
> known at the very last minute, so we also risk turning the fence into
> pure overhead. Note that, as we speak, I have colleagues experimenting
> with frame timestamp prediction that slaves to the effective timestamp
> (catching up over time). But we still have issues when the capture
> driver skips a frame (misses a capture window).
> 
> I hope this reflection is useful,

It is definitely very useful, and there seem to be a few things that
could be improved already without too much effort.

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-19  4:27     ` Tomasz Figa
@ 2019-04-19 15:31       ` Nicolas Dufresne
  2019-04-22  4:02         ` Tomasz Figa
  0 siblings, 1 reply; 24+ messages in thread
From: Nicolas Dufresne @ 2019-04-19 15:31 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Daniel Vetter, Paul Kocialkowski, Linux Kernel Mailing List,
	Alexandre Courbot, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Linux Media Mailing List, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

Le vendredi 19 avril 2019 à 13:27 +0900, Tomasz Figa a écrit :
> On Fri, Apr 19, 2019 at 9:30 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > It would be cool if both could be used concurrently and not just return
> > > > -EBUSY when the device is used with the other subsystem.
> > > 
> > > We live in this world already :-) I think there's even patches (or merged
> > > already) to add fences to v4l, for Android.
> > 
> > This work is currently suspended. It will require some features on the
> > DRM display side to really make it useful, but there are also a lot of
> > challenges in V4L2. In the GFX space, most use cases are about
> > rendering as soon as possible. In multimedia, however, we have two
> > problems: we need to synchronize frame rendering with the audio, and
> > output buffers may come out of order due to how video CODECs work.
> >
> > For the first, we'd need a mechanism to schedule a render at a
> > specific time or vblank. We can of course already implement this in
> > software, but with fences, the scheduling would need to be done in the
> > driver. Then if the fence is signalled early, the driver should hold
> > on until the deadline is met. If the fence is signalled late, we also
> > need to define a workflow. As we can't schedule more than one render
> > in DRM at a time, I don't really see yet how to make that work.
> >
> > For the second, it's complicated on the V4L2 side. Currently we signal
> > buffers when they are ready, in display order. With fences, we would
> > receive buffer/fence pairs early, in decoding order. There are cases
> > where reordering is done by the driver (stateful CODECs). We cannot
> > schedule these immediately; we would need a new mechanism to know
> > which one comes next. If we just reuse the current mechanism, it would
> > void the fence usage, since the fence will always be signalled by the
> > time it reaches DRM or another V4L2 component.
> >
> > There are other issues too: in a video capture pipeline, if you are
> > not rendering ASAP, you need the HW timestamp in order to schedule.
> > Again, we'd get the fence early, but the actual timestamp is only
> > known at the very last minute, so we also risk turning the fence into
> > pure overhead. Note that, as we speak, I have colleagues experimenting
> > with frame timestamp prediction that slaves to the effective timestamp
> > (catching up over time). But we still have issues when the capture
> > driver skips a frame (misses a capture window).
> 
> Note that a fence has a timestamp internally and it can be queried for
> it from the user space if exposed as a sync file:
> https://elixir.bootlin.com/linux/v5.1-rc5/source/drivers/dma-buf/sync_file.c#L386

Don't we need something the other way around? This seems to be the
timestamp of when it was triggered (I'm not familiar with this,
though).

> 
> Fences in V4L2 would be also useful for stateless decoders and any
> mem-to-mem processors that operate in order, like the blitters
> mentioned here or actually camera ISPs, which can be often chained
> into relatively sophisticated pipelines.

I agree fences can be used to optimize specific corner cases, but they
are not as critical in V4L2, since we have async queues. I think the
use case for fences in V4L2 is mostly to lower latency, and not all
use cases require such low latency. There was an argument that fences
simplify the code, but I haven't seen a compelling demonstration that
this would be the case for V4L2 programming. The only case is when
doing V4L2-to-DRM exchanges, and only in contexts where time
synchronization does not matter. In fact, so far it is more work,
since information starts flowing through separate events (buffer/fence
first, later timestamps and possibly critical metadata). This might be
induced by the design, but clearly there is a slight API clash.

> 
> Best regards,
> Tomasz


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-19 15:31       ` Nicolas Dufresne
@ 2019-04-22  4:02         ` Tomasz Figa
  0 siblings, 0 replies; 24+ messages in thread
From: Tomasz Figa @ 2019-04-22  4:02 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Daniel Vetter, Paul Kocialkowski, Linux Kernel Mailing List,
	Alexandre Courbot, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Linux Media Mailing List, dri-devel,
	Thomas Petazzoni, Eric Anholt, Rob Clark, Dave Airlie,
	Maarten Lankhorst

On Sat, Apr 20, 2019 at 12:31 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Le vendredi 19 avril 2019 à 13:27 +0900, Tomasz Figa a écrit :
> > On Fri, Apr 19, 2019 at 9:30 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > It would be cool if both could be used concurrently and not just return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > >
> > > > We live in this world already :-) I think there's even patches (or merged
> > > > already) to add fences to v4l, for Android.
> > >
> > > This work is currently suspended. It will require some features on the
> > > DRM display side to really make it useful, but there are also a lot of
> > > challenges in V4L2. In the GFX space, most use cases are about
> > > rendering as soon as possible. In multimedia, however, we have two
> > > problems: we need to synchronize frame rendering with the audio, and
> > > output buffers may come out of order due to how video CODECs work.
> > >
> > > For the first, we'd need a mechanism to schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled early, the driver should hold
> > > on until the deadline is met. If the fence is signalled late, we also
> > > need to define a workflow. As we can't schedule more than one render
> > > in DRM at a time, I don't really see yet how to make that work.
> > >
> > > For the second, it's complicated on the V4L2 side. Currently we signal
> > > buffers when they are ready, in display order. With fences, we would
> > > receive buffer/fence pairs early, in decoding order. There are cases
> > > where reordering is done by the driver (stateful CODECs). We cannot
> > > schedule these immediately; we would need a new mechanism to know
> > > which one comes next. If we just reuse the current mechanism, it would
> > > void the fence usage, since the fence will always be signalled by the
> > > time it reaches DRM or another V4L2 component.
> > >
> > > There are other issues too: in a video capture pipeline, if you are
> > > not rendering ASAP, you need the HW timestamp in order to schedule.
> > > Again, we'd get the fence early, but the actual timestamp is only
> > > known at the very last minute, so we also risk turning the fence into
> > > pure overhead. Note that, as we speak, I have colleagues experimenting
> > > with frame timestamp prediction that slaves to the effective timestamp
> > > (catching up over time). But we still have issues when the capture
> > > driver skips a frame (misses a capture window).
> >
> > Note that a fence has a timestamp internally and it can be queried for
> > it from the user space if exposed as a sync file:
> > https://elixir.bootlin.com/linux/v5.1-rc5/source/drivers/dma-buf/sync_file.c#L386
>
> Don't we need something the other way around ? This seems to be the
> timestamp of when it was triggered (I'm not familiar with this though).
>

Honestly, I'm not fully sure what this timestamp is expected to be.

For video capture pipeline the fence would signal once the whole frame
is captured, so I think it could be a reasonable value to consider
later in the pipeline?

> >
> > Fences in V4L2 would be also useful for stateless decoders and any
> > mem-to-mem processors that operate in order, like the blitters
> > mentioned here or actually camera ISPs, which can be often chained
> > into relatively sophisticated pipelines.
>
> I agree fences can be used to optimize specific corner cases, but they
> are not as critical in V4L2, since we have async queues.

I wouldn't call those corner cases. A stateful decoder is actually one
of the opposite extremes, because one would normally just decode and
show the frame, so not much complexity is needed to handle it, and
async queues actually work quite well.

I don't think async queues are very helpful for any more complicated
use cases. The userspace still needs to wake up and push the buffers
through the pipeline. If you have some depth across the whole
pipeline, with queues always having some buffers waiting to be
processed, fences indeed wouldn't change too much (+/- the CPU
time/power wasted on context switches). However, with real time use
cases, such as anything involving streaming from cameras, image
processing stages and encoding into a stream to be passed to a
latency-sensitive application, such as WebRTC, the latency imposed by
the lack of fences would be significant. Especially if the image
processing in between consists of several inter-dependent stages.

> I think the use
> case for fences in V4L2 is mostly to lower latency, and not all use
> cases require such low latency.

Indeed, not all, but I think it doesn't make fences less important,
given that there are use cases that require such a low latency.

> There was an argument that fences
> simplify the code, but I haven't seen a compelling demonstration that
> this would be the case for V4L2 programming. The only case is when
> doing V4L2-to-DRM exchanges, and only in contexts where time
> synchronization does not matter.

Another huge use case would be Android. The lack of fences is a
significant show stopper for V4L2 adoption there.

Also, V4L2 to GPU (GLES, Vulkan) exchange should not be forgotten too.

> In fact, so far it is more work,
> since information starts flowing through separate events (buffer/fence
> first, later timestamps and possibly critical metadata). This might be
> induced by the design, but clearly there is a slight API clash.

Well, nothing is perfect from the start. (In fact, probably nothing is
perfect in general. ;))

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-19  8:38     ` Paul Kocialkowski
@ 2019-04-24  8:31       ` Michel Dänzer
  2019-04-24 12:01         ` Nicolas Dufresne
  2019-04-24 12:19         ` Paul Kocialkowski
  0 siblings, 2 replies; 24+ messages in thread
From: Michel Dänzer @ 2019-04-24  8:31 UTC (permalink / raw)
  To: Paul Kocialkowski, Nicolas Dufresne, Daniel Vetter
  Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
	Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, linux-media

On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
>> Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
>>>> It would be cool if both could be used concurrently and not just return
>>>> -EBUSY when the device is used with the other subsystem.
>>>
>>> We live in this world already :-) I think there's even patches (or merged
>>> already) to add fences to v4l, for Android.
>>
>> This work is currently suspended. It will require some features on the
>> DRM display side to really make it useful, but there are also a lot of
>> challenges in V4L2. In the GFX space, most use cases are about
>> rendering as soon as possible. In multimedia, however, we have two
>> problems: we need to synchronize frame rendering with the audio, and
>> output buffers may come out of order due to how video CODECs work.
> 
> Definitely, it feels like the DRM display side is currently a good fit
> for render use cases, but not so much for precise display cases where
> we want to try and display a buffer at a given vblank target instead of
> "as soon as possible".
> 
> I have a userspace project where I've implemented a page flip queue,
> which only schedules the next flip when relevant and keeps ready
> buffers in the queue until then. This requires explicit vblank
> synchronisation (which DRM offers, but pretty much all other,
> higher-level display APIs don't, so I'm just using a refresh-rate
> timer for them) and flip-done notification.
> 
> I haven't looked too much at how to flip with a target vblank with DRM
> directly, but maybe the atomic API already has the bits for that (I
> haven't heard of such a thing as a buffer queue, though, which makes
> me doubt it).

Not directly. What's available is that if userspace waits for vblank n
and then submits a flip, the flip will complete in vblank n+1 (or a
later vblank, depending on when the flip is submitted and when the
fences the flip depends on signal).

There is reluctance allowing more than one flip to be queued in the
kernel, as it would considerably increase complexity in the kernel. It
would probably only be considered if there was a compelling use-case
which was outright impossible otherwise.


> Well, I need to handle stuff like SDL in my userspace project anyway,
> so I have to have all that queuing in software regardless, but it
> would be good if each project didn't have to implement it. Worst
> case, it could live in libdrm.

Usually, this kind of queuing will be handled in a display server such
as Xorg or a Wayland compositor, not by the application such as a video
player itself, or any library in the latter's address space. I'm not
sure there's much potential for sharing code between display servers for
this.


>> For the first, we'd need a mechanism to schedule a render at a
>> specific time or vblank. We can of course already implement this in
>> software, but with fences, the scheduling would need to be done in the
>> driver. Then if the fence is signalled early, the driver should hold
>> on until the deadline is met. If the fence is signalled late, we also
>> need to define a workflow. As we can't schedule more than one render
>> in DRM at a time, I don't really see yet how to make that work.
> 
> Indeed, that's also one of the main issues I've spotted. Before using
> an implicit fence, we basically have to make sure the frame is due for
> display at the next vblank. Otherwise, we need to refrain from using
> the fence and schedule the flip later, which is kind of counter-
> productive.

Fences are about signalling that the contents of a frame are "done" and
ready to be presented. They're not about specifying which frame is to be
presented when.


> I feel like specifying a target vblank would be a good unit for that,

The mechanism described above works for that.

> since it's our native granularity after all (while a timestamp is not).

Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
changes things in this regard. It makes the vblank length variable, and
if you wait for multiple vblanks between flips, you get the maximum
vblank length corresponding to the minimum refresh rate / timing
granularity. Thus, it would be useful to allow userspace to specify a
timestamp corresponding to the earliest time when the flip is to
complete. The kernel could then try to hit that as closely as possible.
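The cost of that lost granularity is easy to quantify (plain arithmetic, not a DRM interface): under VRR, each vblank waited can stretch to the period of the minimum refresh rate.

```python
def worst_case_wait_ms(vblanks_waited, min_refresh_hz):
    """With VRR, each vblank can stretch to the period of the minimum
    refresh rate, so waiting k vblanks can cost up to k such periods."""
    return vblanks_waited * 1000.0 / min_refresh_hz

# On a hypothetical 48-144 Hz panel, waiting 2 vblanks can take up to
# ~41.7 ms, even though at 144 Hz the same wait would be ~13.9 ms.
print(round(worst_case_wait_ms(2, 48), 1))  # 41.7
```

This is why a target timestamp, rather than a vblank count, is the more robust unit once the refresh rate is no longer fixed.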


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24  8:31       ` Michel Dänzer
@ 2019-04-24 12:01         ` Nicolas Dufresne
  2019-04-24 14:39           ` Michel Dänzer
  2019-04-24 12:19         ` Paul Kocialkowski
  1 sibling, 1 reply; 24+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 12:01 UTC (permalink / raw)
  To: Michel Dänzer, Paul Kocialkowski, Daniel Vetter
  Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
	Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, linux-media

Le mercredi 24 avril 2019 à 10:31 +0200, Michel Dänzer a écrit :
> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > It would be cool if both could be used concurrently and not just return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > > 
> > > > We live in this world already :-) I think there's even patches (or merged
> > > > already) to add fences to v4l, for Android.
> > > 
> > > This work is currently suspended. It will require some features on the
> > > DRM display side to really make it useful, but there are also a lot of
> > > challenges in V4L2. In the GFX space, most use cases are about
> > > rendering as soon as possible. In multimedia, however, we have two
> > > problems: we need to synchronize frame rendering with the audio, and
> > > output buffers may come out of order due to how video CODECs work.
> > 
> > Definitely, it feels like the DRM display side is currently a good fit
> > for render use cases, but not so much for precise display cases where
> > we want to try and display a buffer at a given vblank target instead of
> > "as soon as possible".
> > 
> > I have a userspace project where I've implemented a page flip queue,
> > which only schedules the next flip when relevant and keeps ready
> > buffers in the queue until then. This requires explicit vblank
> > synchronisation (which DRM offers, but pretty much all other,
> > higher-level display APIs don't, so I'm just using a refresh-rate
> > timer for them) and flip-done notification.
> > 
> > I haven't looked too much at how to flip with a target vblank with DRM
> > directly, but maybe the atomic API already has the bits for that (I
> > haven't heard of such a thing as a buffer queue, though, which makes
> > me doubt it).
> 
> Not directly. What's available is that if userspace waits for vblank n
> and then submits a flip, the flip will complete in vblank n+1 (or a
> later vblank, depending on when the flip is submitted and when the
> fences the flip depends on signal).
> 
> There is reluctance allowing more than one flip to be queued in the
> kernel, as it would considerably increase complexity in the kernel. It
> would probably only be considered if there was a compelling use-case
> which was outright impossible otherwise.
> 
> 
> > Well, I need to handle stuff like SDL in my userspace project anyway,
> > so I have to have all that queuing in software regardless, but it
> > would be good if each project didn't have to implement it. Worst
> > case, it could live in libdrm.
> 
> Usually, this kind of queuing will be handled in a display server such
> as Xorg or a Wayland compositor, not by the application such as a video
> player itself, or any library in the latter's address space. I'm not
> sure there's much potential for sharing code between display servers for
> this.
> 
> 
> > > For the first, we'd need a mechanism to schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled early, the driver should hold
> > > on until the deadline is met. If the fence is signalled late, we also
> > > need to define a workflow. As we can't schedule more than one render
> > > in DRM at a time, I don't really see yet how to make that work.
> > 
> > Indeed, that's also one of the main issues I've spotted. Before using
> > an implicit fence, we basically have to make sure the frame is due for
> > display at the next vblank. Otherwise, we need to refrain from using
> > the fence and schedule the flip later, which is kind of counter-
> > productive.
> 
> Fences are about signalling that the contents of a frame are "done" and
> ready to be presented. They're not about specifying which frame is to be
> presented when.
> 
> 
> > I feel like specifying a target vblank would be a good unit for that,
> 
> The mechanism described above works for that.
> 
> > since it's our native granularity after all (while a timestamp is not).
> 
> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> changes things in this regard. It makes the vblank length variable, and
> if you wait for multiple vblanks between flips, you get the maximum
> vblank length corresponding to the minimum refresh rate / timing
> granularity. Thus, it would be useful to allow userspace to specify a
> timestamp corresponding to the earliest time when the flip is to
> complete. The kernel could then try to hit that as closely as possible.

Rendering a video stream is more complex than what you describe here.
Whenever there is an unexpected delay (late delivery of a frame, for
example) you may end up in a situation where one frame is only ready
after its targeted vblank. If there is another frame targeting the
following vblank that gets ready on time, the previous frame should be
replaced by the more recent one.

With fences, what happens is that even if you received the next frame
on time, naively replacing the previous one is not possible, because we
don't know when the fence for the next frame will be signalled. If you
simply always replace the current frame, you may end up skipping many
more vblanks than you expect, and that results in jumpy playback.

Render queues with timestamps are used to smooth rendering and handle
rendering collisions so that latency is kept low (as when you play a
100 fps video on a 60 Hz display). This is normally done in userspace,
but with fences you ask the kernel to render something at an
unpredictable point in the future, so we lose the ability to make the
final decision.
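The selection logic described here can be illustrated with a small
sketch (a simplified model only, not tied to any real DRM or V4L2 API;
all names are made up for illustration): at each vblank, present the
ready frame whose presentation timestamp is closest to the vblank time,
and drop older ready frames instead of queueing them.

```python
# Illustrative model of vblank-time frame selection. A frame carries a
# presentation timestamp ("pts") and a "ready" flag, i.e. its fence has
# signalled. The decision of which frame to show is deferred to the
# vblank deadline, which is exactly what queuing a fenced flip early
# prevents.

def pick_frame(ready_frames, vblank_ts):
    """Return the ready frame whose timestamp is closest to vblank_ts,
    or None if nothing is ready yet."""
    if not ready_frames:
        return None
    return min(ready_frames, key=lambda f: abs(f["pts"] - vblank_ts))

def present(queue, vblank_ts):
    """Present the best ready frame and drop superseded older frames."""
    ready = [f for f in queue if f["ready"]]
    best = pick_frame(ready, vblank_ts)
    if best is None:
        return None  # keep the previous frame on screen
    # Drop every ready frame at or before the one we present.
    queue[:] = [f for f in queue
                if not (f["ready"] and f["pts"] <= best["pts"])]
    return best

# Content faster than the display: frames 10 ms apart (~100 fps),
# vblanks ~16.7 ms apart (~60 Hz), so some frames are skipped.
queue = [{"pts": i * 10.0, "ready": True} for i in range(10)]
shown = [present(queue, v * 16.7)["pts"] for v in range(5)]
# shown == [0.0, 20.0, 30.0, 50.0, 70.0]
```

With content arriving faster than the refresh rate, the model skips the
frames whose timestamps land between vblanks, which is the smoothing
behaviour described above.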

> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24  8:31       ` Michel Dänzer
  2019-04-24 12:01         ` Nicolas Dufresne
@ 2019-04-24 12:19         ` Paul Kocialkowski
  2019-04-24 17:10           ` Michel Dänzer
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Kocialkowski @ 2019-04-24 12:19 UTC (permalink / raw)
  To: Michel Dänzer, Nicolas Dufresne, Daniel Vetter
  Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
	Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, linux-media

Hi,

On Wed, 2019-04-24 at 10:31 +0200, Michel Dänzer wrote:
> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > It would be cool if both could be used concurrently and not just return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > > 
> > > > We live in this world already :-) I think there's even patches (or merged
> > > > already) to add fences to v4l, for Android.
> > > 
> > > This work is currently suspended. It will require some feature on DRM
> > > display to really make this useful, but there is also a lot of
> > > challanges in V4L2. In GFX space, most of the use case are about
> > > rendering as soon as possible. Though, in multimedia we have two
> > > problems, we need to synchronize the frame rendering with the audio,
> > > and output buffers may comes out of order due to how video CODECs are
> > > made.
> > 
> > Definitely, it feels like the DRM display side is currently a good fit
> > for render use cases, but not so much for precise display cases where
> > we want to try and display a buffer at a given vblank target instead of
> > "as soon as possible".
> > 
> > I have a userspace project where I've implemented a page flip queue,
> > which only schedules the next flip when relevant and keeps ready
> > buffers in the queue until then. This requires explicit vblank
> > syncronisation (which DRM offsers, but pretty much all other display
> > APIs, that are higher-level don't, so I'm just using a refresh-rate
> > timer for them) and flip done notification.
> > 
> > I haven't looked too much at how to flip with a target vblank with DRM
> > directly but maybe the atomic API already has the bits in for that (but
> > I haven't heard of such a thing as a buffer queue, so that makes me
> > doubt it).
> 
> Not directly. What's available is that if userspace waits for vblank n
> and then submits a flip, the flip will complete in vblank n+1 (or a
> later vblank, depending on when the flip is submitted and when the
> fences the flip depends on signal).
> 
> There is reluctance allowing more than one flip to be queued in the
> kernel, as it would considerably increase complexity in the kernel. It
> would probably only be considered if there was a compelling use-case
> which was outright impossible otherwise.

Well, I think it's just less boilerplate for userspace. This is indeed
quite complex, and I would prefer to see that complexity done once and
well in Linux rather than duplicated in userspace with more or less
reliable implementations.

> > Well, I need to handle stuff like SDL in my userspace project, so I have
> > to have all that queuing stuff in software anyway, but it would be good
> > if each project didn't have to implement that. Worst case, it could be
> > in libdrm too.
> 
> Usually, this kind of queuing will be handled in a display server such
> as Xorg or a Wayland compositor, not by the application such as a video
> player itself, or any library in the latter's address space. I'm not
> sure there's much potential for sharing code between display servers for
> this.

This assumes that you are using a display server, which is definitely
not always the case (Kodi's GBM backend, for example). Well, I'm not
saying it is essential to have it in the kernel, but it would avoid
code duplication and lower the complexity in userspace.

> > > In the first, we'd need a mechanism where we can schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled earlier, the driver should hold
> > > on until the delay is met. If the fence got signalled late, we also
> > > need to think of a workflow. As we can't schedule more then one render
> > > in DRM at one time, I don't really see yet how to make that work.
> > 
> > Indeed, that's also one of the main issues I've spotted. Before using
> > an implicit fence, we basically have to make sure the frame is due for
> > display at the next vblank. Otherwise, we need to refrain from using
> > the fence and schedule the flip later, which is kind of counter-
> > productive.
> 
> Fences are about signalling that the contents of a frame are "done" and
> ready to be presented. They're not about specifying which frame is to be
> presented when.

Yes, that's precisely the issue I see with them. Once you have
scheduled a flip with a buffer, it is too late to flip a more recent
buffer instead if one becomes available sooner (the issue that Nicolas
is describing). If you attach a vblank target to the flip, the flip can
be skipped when its fence signals if a more recent buffer's fence
signalled first.

> > I feel like specifying a target vblank would be a good unit for that,
> 
> The mechanism described above works for that.

I still don't see any fence-based mechanism that can work to achieve
that, but maybe I'm missing your point.

> > since it's our native granularity after all (while a timestamp is not).
> 
> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> changes things in this regard. It makes the vblank length variable, and
> if you wait for multiple vblanks between flips, you get the maximum
> vblank length corresponding to the minimum refresh rate / timing
> granularity. Thus, it would be useful to allow userspace to specify a
> timestamp corresponding to the earliest time when the flip is to
> complete. The kernel could then try to hit that as closely as possible.

I'm not very familiar with how this works, but I don't really see what
it changes. Does it mean we can flip multiple times per vblank?
If so, how can userspace be aware of that and deal with it properly?
Unless I'm missing something, I think flip scheduling should still work
on vblank granularity in that case.

And I really prefer a vblank count over a timestamp, as one is the
native unit at hand while the other one only correlates to it.

Cheers,

Paul
-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 12:01         ` Nicolas Dufresne
@ 2019-04-24 14:39           ` Michel Dänzer
  2019-04-24 14:41             ` Paul Kocialkowski
  0 siblings, 1 reply; 24+ messages in thread
From: Michel Dänzer @ 2019-04-24 14:39 UTC (permalink / raw)
  To: Nicolas Dufresne, Paul Kocialkowski, Daniel Vetter
  Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
	Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, linux-media

On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> Le mercredi 24 avril 2019 à 10:31 +0200, Michel Dänzer a écrit :
>> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
>>> On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
>>>> Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
>>
>>>> In the first, we'd need a mechanism where we can schedule a render at a
>>>> specific time or vblank. We can of course already implement this in
>>>> software, but with fences, the scheduling would need to be done in the
>>>> driver. Then if the fence is signalled earlier, the driver should hold
>>>> on until the delay is met. If the fence got signalled late, we also
>>>> need to think of a workflow. As we can't schedule more then one render
>>>> in DRM at one time, I don't really see yet how to make that work.
>>>
>>> Indeed, that's also one of the main issues I've spotted. Before using
>>> an implicit fence, we basically have to make sure the frame is due for
>>> display at the next vblank. Otherwise, we need to refrain from using
>>> the fence and schedule the flip later, which is kind of counter-
>>> productive.
>>
>> Fences are about signalling that the contents of a frame are "done" and
>> ready to be presented. They're not about specifying which frame is to be
>> presented when.
>>
>>
>>> I feel like specifying a target vblank would be a good unit for that,
>>
>> The mechanism described above works for that.
>>
>>> since it's our native granularity after all (while a timestamp is not).
>>
>> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
>> changes things in this regard. It makes the vblank length variable, and
>> if you wait for multiple vblanks between flips, you get the maximum
>> vblank length corresponding to the minimum refresh rate / timing
>> granularity. Thus, it would be useful to allow userspace to specify a
>> timestamp corresponding to the earliest time when the flip is to
>> complete. The kernel could then try to hit that as closely as possible.
> 
> Rendering a video stream is more complex then what you describe here.
> Whenever there is a unexpected delay (late delivery of a frame as an
> example) you may endup in situation where one frame is ready after the
> targeted vblank. If there is another frame that targets the following
> vblank that gets ready on-time, the previous frame should be replaced
> by the most recent one.
> 
> With fences, what happens is that even if you received the next frame
> on time, naively replacing it is not possible, because we don't know
> when the fence for the next frame will be signalled. If you simply
> always replace the current frame, you may endup skipping a lot more
> vblank then what you expect, and that results in jumpy playback.

So you want to be able to replace a queued flip with another one then.
That doesn't necessarily require allowing more than one flip to be
queued ahead of time.

Note that this can also be done in userspace with explicit fencing (by
only selecting a frame and submitting it to the kernel after all
corresponding fences have signalled), at least to some degree, but the
kernel should be able to do it up to a later point in time and more
reliably, with less risk of missing a flip for a frame which becomes
ready just in time.
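The userspace side of this approach can be sketched roughly as follows.
Fences are modelled here with pipe read-ends that become readable when
"signalled"; a real sync_file fd can be waited on the same way with
poll()/select(). The function name and structure are illustrative only:

```python
import os
import select
import time

def submit_best_frame(frames, deadline, flip):
    """frames: list of (fence_fd, frame_id) pairs, oldest first.
    Sleep until `deadline` (a real compositor would wake on vblank
    events instead), then flip the newest frame whose fence has
    signalled, if any."""
    delay = deadline - time.monotonic()
    if delay > 0:
        time.sleep(delay)
    # Non-blocking poll: which fences have signalled by now?
    readable, _, _ = select.select([fd for fd, _ in frames], [], [], 0)
    ready = [fid for fd, fid in frames if fd in readable]
    if ready:
        flip(ready[-1])  # the newest signalled frame wins
        return ready[-1]
    return None  # nothing ready: the previous frame stays on screen

# Demo: fences modelled as pipes. Frames 0 and 1 signal in time,
# frame 2 does not, so frame 1 gets flipped.
pipes = [os.pipe() for _ in range(3)]
os.write(pipes[0][1], b"s")
os.write(pipes[1][1], b"s")
flipped = []
chosen = submit_best_frame([(r, i) for i, (r, _) in enumerate(pipes)],
                           time.monotonic() + 0.01, flipped.append)
# chosen == 1, flipped == [1]
```

The "some degree" caveat shows up in the deadline: the later the wait
runs, the less margin remains for the actual flip submission, which is
why doing this in the kernel could be more reliable.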


> Render queues with timestamp are used to smooth rendering and handle
> rendering collision so that the latency is kept low (like when you have
> a 100fps video over a 60Hz display). This is normally done in
> userspace, but with fences, you ask the kernel to render something in
> an unpredictable future, so we loose the ability to make the final
> decision.

That's just not what fences are intended to be used for with the current
KMS UAPI.


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 14:39           ` Michel Dänzer
@ 2019-04-24 14:41             ` Paul Kocialkowski
  2019-04-24 15:06               ` Daniel Vetter
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Kocialkowski @ 2019-04-24 14:41 UTC (permalink / raw)
  To: Michel Dänzer, Nicolas Dufresne, Daniel Vetter
  Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
	Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, linux-media

Hi,

On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > Le mercredi 24 avril 2019 à 10:31 +0200, Michel Dänzer a écrit :
> > > On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > > > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > In the first, we'd need a mechanism where we can schedule a render at a
> > > > > specific time or vblank. We can of course already implement this in
> > > > > software, but with fences, the scheduling would need to be done in the
> > > > > driver. Then if the fence is signalled earlier, the driver should hold
> > > > > on until the delay is met. If the fence got signalled late, we also
> > > > > need to think of a workflow. As we can't schedule more then one render
> > > > > in DRM at one time, I don't really see yet how to make that work.
> > > > 
> > > > Indeed, that's also one of the main issues I've spotted. Before using
> > > > an implicit fence, we basically have to make sure the frame is due for
> > > > display at the next vblank. Otherwise, we need to refrain from using
> > > > the fence and schedule the flip later, which is kind of counter-
> > > > productive.
> > > 
> > > Fences are about signalling that the contents of a frame are "done" and
> > > ready to be presented. They're not about specifying which frame is to be
> > > presented when.
> > > 
> > > 
> > > > I feel like specifying a target vblank would be a good unit for that,
> > > 
> > > The mechanism described above works for that.
> > > 
> > > > since it's our native granularity after all (while a timestamp is not).
> > > 
> > > Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> > > changes things in this regard. It makes the vblank length variable, and
> > > if you wait for multiple vblanks between flips, you get the maximum
> > > vblank length corresponding to the minimum refresh rate / timing
> > > granularity. Thus, it would be useful to allow userspace to specify a
> > > timestamp corresponding to the earliest time when the flip is to
> > > complete. The kernel could then try to hit that as closely as possible.
> > 
> > Rendering a video stream is more complex then what you describe here.
> > Whenever there is a unexpected delay (late delivery of a frame as an
> > example) you may endup in situation where one frame is ready after the
> > targeted vblank. If there is another frame that targets the following
> > vblank that gets ready on-time, the previous frame should be replaced
> > by the most recent one.
> > 
> > With fences, what happens is that even if you received the next frame
> > on time, naively replacing it is not possible, because we don't know
> > when the fence for the next frame will be signalled. If you simply
> > always replace the current frame, you may endup skipping a lot more
> > vblank then what you expect, and that results in jumpy playback.
> 
> So you want to be able to replace a queued flip with another one then.
> That doesn't necessarily require allowing more than one flip to be
> queued ahead of time.

There might be other ways to do it, but this one has plenty of
advantages.

> Note that this can also be done in userspace with explicit fencing (by
> only selecting a frame and submitting it to the kernel after all
> corresponding fences have signalled), at least to some degree, but the
> kernel should be able to do it up to a later point in time and more
> reliably, with less risk of missing a flip for a frame which becomes
> ready just in time.

Indeed, but it would be great if we could do that with implicit fencing
as well.

> > Render queues with timestamp are used to smooth rendering and handle
> > rendering collision so that the latency is kept low (like when you have
> > a 100fps video over a 60Hz display). This is normally done in
> > userspace, but with fences, you ask the kernel to render something in
> > an unpredictable future, so we loose the ability to make the final
> > decision.
> 
> That's just not what fences are intended to be used for with the current
> KMS UAPI.

Yes, and I think we're discussing towards changing that in the future.

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 14:41             ` Paul Kocialkowski
@ 2019-04-24 15:06               ` Daniel Vetter
  2019-04-24 15:44                 ` Nicolas Dufresne
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2019-04-24 15:06 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Michel Dänzer, Nicolas Dufresne, Alexandre Courbot,
	Maxime Ripard, Linux Kernel Mailing List, dri-devel, Tomasz Figa,
	Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > Le mercredi 24 avril 2019 à 10:31 +0200, Michel Dänzer a écrit :
> > > > On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > > > > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > > > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > > In the first, we'd need a mechanism where we can schedule a render at a
> > > > > > specific time or vblank. We can of course already implement this in
> > > > > > software, but with fences, the scheduling would need to be done in the
> > > > > > driver. Then if the fence is signalled earlier, the driver should hold
> > > > > > on until the delay is met. If the fence got signalled late, we also
> > > > > > need to think of a workflow. As we can't schedule more then one render
> > > > > > in DRM at one time, I don't really see yet how to make that work.
> > > > >
> > > > > Indeed, that's also one of the main issues I've spotted. Before using
> > > > > an implicit fence, we basically have to make sure the frame is due for
> > > > > display at the next vblank. Otherwise, we need to refrain from using
> > > > > the fence and schedule the flip later, which is kind of counter-
> > > > > productive.
> > > >
> > > > Fences are about signalling that the contents of a frame are "done" and
> > > > ready to be presented. They're not about specifying which frame is to be
> > > > presented when.
> > > >
> > > >
> > > > > I feel like specifying a target vblank would be a good unit for that,
> > > >
> > > > The mechanism described above works for that.
> > > >
> > > > > since it's our native granularity after all (while a timestamp is not).
> > > >
> > > > Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> > > > changes things in this regard. It makes the vblank length variable, and
> > > > if you wait for multiple vblanks between flips, you get the maximum
> > > > vblank length corresponding to the minimum refresh rate / timing
> > > > granularity. Thus, it would be useful to allow userspace to specify a
> > > > timestamp corresponding to the earliest time when the flip is to
> > > > complete. The kernel could then try to hit that as closely as possible.
> > >
> > > Rendering a video stream is more complex then what you describe here.
> > > Whenever there is a unexpected delay (late delivery of a frame as an
> > > example) you may endup in situation where one frame is ready after the
> > > targeted vblank. If there is another frame that targets the following
> > > vblank that gets ready on-time, the previous frame should be replaced
> > > by the most recent one.
> > >
> > > With fences, what happens is that even if you received the next frame
> > > on time, naively replacing it is not possible, because we don't know
> > > when the fence for the next frame will be signalled. If you simply
> > > always replace the current frame, you may endup skipping a lot more
> > > vblank then what you expect, and that results in jumpy playback.
> >
> > So you want to be able to replace a queued flip with another one then.
> > That doesn't necessarily require allowing more than one flip to be
> > queued ahead of time.
>
> There might be other ways to do it, but this one has plenty of
> advantages.

The point of KMS (well, one of the reasons for it) was to separate the
implementation of modesetting for specific hw from policy decisions
like which frames to drop and how to schedule them. The kernel gives
the tools, userspace implements the actual protocols.

There's definitely a bit of a gap around scheduling flips for a
specific frame or allowing an already scheduled flip to be cancelled or
overwritten, but no one has yet come up with a clear proposal for new
uapi + example implementation + userspace implementation + big enough
support from other compositors that this is what they want too.

And yes, writing a really good compositor is really hard, and I think a
lot of people underestimate that and just create something useful for
their niche. If userspace can't come up with a shared library of
helpers, I don't think baking it into kernel uapi with 10+ years of
regression-free API guarantees is going to make it any better.

> > Note that this can also be done in userspace with explicit fencing (by
> > only selecting a frame and submitting it to the kernel after all
> > corresponding fences have signalled), at least to some degree, but the
> > kernel should be able to do it up to a later point in time and more
> > reliably, with less risk of missing a flip for a frame which becomes
> > ready just in time.
>
> Indeed, but it would be great if we could do that with implicit fencing
> as well.

1. extract implicit fences from dma-buf. This part is just an idea,
but easy to implement once we have someone who actually wants this.
All we need is a new ioctl on the dma-buf to export the fences from
the reservation_object as a sync_file (either the exclusive or the
shared ones, selected with a flag).
2. do the exact same frame scheduling as with explicit fencing
3. supply explicit fences in your atomic ioctl calls - these should
overrule any implicit fences (assuming correct kernel drivers, but we
have helpers so you can assume they all work correctly).

By design this is possible, it's just that no one yet bothered enough
to make it happen.
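As a control-flow sketch of these three steps (every helper below is a
stubbed, hypothetical stand-in; the fence-export ioctl of step 1 is,
as said above, just an idea, and none of these function names are real
kernel or libdrm APIs):

```python
# Stubbed sketch of the three steps. The only point is the ordering:
# export the implicit fence, do the usual explicit-fencing frame
# scheduling, then commit with an explicit in-fence that overrules the
# implicit one.

def export_implicit_fence(dmabuf_fd):
    # Step 1 (hypothetical ioctl): export the dma-buf's
    # reservation_object fences as a sync_file fd.
    # Stubbed: return a dummy fd number derived from the input.
    return 100 + dmabuf_fd

def schedule_frame(frames, fence_of):
    # Step 2: the same frame scheduling as with explicit fencing.
    # Trivially pick the newest frame here and pair it with its fence.
    frame = frames[-1]
    return frame, fence_of(frame)

def atomic_commit(frame, in_fence_fd, log):
    # Step 3: pass the explicit fence in the atomic ioctl (in real DRM,
    # via the IN_FENCE_FD plane property); it overrules any implicit
    # fence attached to the framebuffer's dma-buf.
    log.append(("commit", frame, in_fence_fd))

log = []
dmabuf_fds = [3, 4]  # pretend dma-buf fds for two decoded frames
fences = {fd: export_implicit_fence(fd) for fd in dmabuf_fds}  # step 1
frame, fence = schedule_frame(dmabuf_fds, fences.get)          # step 2
atomic_commit(frame, fence, log)                               # step 3
# log is now [("commit", 4, 104)]
```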
-Daniel

> > > Render queues with timestamp are used to smooth rendering and handle
> > > rendering collision so that the latency is kept low (like when you have
> > > a 100fps video over a 60Hz display). This is normally done in
> > > userspace, but with fences, you ask the kernel to render something in
> > > an unpredictable future, so we loose the ability to make the final
> > > decision.
> >
> > That's just not what fences are intended to be used for with the current
> > KMS UAPI.
>
> Yes, and I think we're discussing towards changing that in the future.
>
> Cheers,
>
> Paul
>
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 15:06               ` Daniel Vetter
@ 2019-04-24 15:44                 ` Nicolas Dufresne
  2019-04-24 16:54                   ` Michel Dänzer
  0 siblings, 1 reply; 24+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 15:44 UTC (permalink / raw)
  To: Daniel Vetter, Paul Kocialkowski
  Cc: Michel Dänzer, Alexandre Courbot, Maxime Ripard,
	Linux Kernel Mailing List, dri-devel, Tomasz Figa, Hans Verkuil,
	Thomas Petazzoni, Dave Airlie, Mauro Carvalho Chehab,
	open list:DMA BUFFER SHARING FRAMEWORK

Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
> > Hi,
> > 
> > On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > > Le mercredi 24 avril 2019 à 10:31 +0200, Michel Dänzer a écrit :
> > > > > On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
> > > > > > On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
> > > > > > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > > > In the first, we'd need a mechanism where we can schedule a render at a
> > > > > > > specific time or vblank. We can of course already implement this in
> > > > > > > software, but with fences, the scheduling would need to be done in the
> > > > > > > driver. Then if the fence is signalled earlier, the driver should hold
> > > > > > > on until the delay is met. If the fence got signalled late, we also
> > > > > > > need to think of a workflow. As we can't schedule more then one render
> > > > > > > in DRM at one time, I don't really see yet how to make that work.
> > > > > > 
> > > > > > Indeed, that's also one of the main issues I've spotted. Before using
> > > > > > an implicit fence, we basically have to make sure the frame is due for
> > > > > > display at the next vblank. Otherwise, we need to refrain from using
> > > > > > the fence and schedule the flip later, which is kind of counter-
> > > > > > productive.
> > > > > 
> > > > > Fences are about signalling that the contents of a frame are "done" and
> > > > > ready to be presented. They're not about specifying which frame is to be
> > > > > presented when.
> > > > > 
> > > > > 
> > > > > > I feel like specifying a target vblank would be a good unit for that,
> > > > > 
> > > > > The mechanism described above works for that.
> > > > > 
> > > > > > since it's our native granularity after all (while a timestamp is not).
> > > > > 
> > > > > Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
> > > > > changes things in this regard. It makes the vblank length variable, and
> > > > > if you wait for multiple vblanks between flips, you get the maximum
> > > > > vblank length corresponding to the minimum refresh rate / timing
> > > > > granularity. Thus, it would be useful to allow userspace to specify a
> > > > > timestamp corresponding to the earliest time when the flip is to
> > > > > complete. The kernel could then try to hit that as closely as possible.
> > > > 
> > > > Rendering a video stream is more complex then what you describe here.
> > > > Whenever there is a unexpected delay (late delivery of a frame as an
> > > > example) you may endup in situation where one frame is ready after the
> > > > targeted vblank. If there is another frame that targets the following
> > > > vblank that gets ready on-time, the previous frame should be replaced
> > > > by the most recent one.
> > > > 
> > > > With fences, what happens is that even if you received the next frame
> > > > on time, naively replacing it is not possible, because we don't know
> > > > when the fence for the next frame will be signalled. If you simply
> > > > always replace the current frame, you may endup skipping a lot more
> > > > vblank then what you expect, and that results in jumpy playback.
> > > 
> > > So you want to be able to replace a queued flip with another one then.
> > > That doesn't necessarily require allowing more than one flip to be
> > > queued ahead of time.
> > 
> > There might be other ways to do it, but this one has plenty of
> > advantages.
> 
> The point of kms (well one of the reasons) was to separate the
> implementation of modesetting for specific hw from policy decisions
> like which frames to drop and how to schedule them. Kernel gives
> tools, userspace implements the actual protocols.
> 
> There's definitely a bit a gap around scheduling flips for a specific
> frame or allowing to cancel/overwrite an already scheduled flip, but
> no one yet has come up with a clear proposal for new uapi + example
> implementation + userspace implementation + big enough support from
> other compositors that this is what they want too.
> 
> And yes writing a really good compositor is really hard, and I think a
> lot of people underestimate that and just create something useful for
> their niche. If userspace can't come up with a shared library of
> helpers, I don't think baking it in as kernel uapi with 10+ years
> regression free api guarantees is going to make it any better.
> 
> > > Note that this can also be done in userspace with explicit fencing (by
> > > only selecting a frame and submitting it to the kernel after all
> > > corresponding fences have signalled), at least to some degree, but the
> > > kernel should be able to do it up to a later point in time and more
> > > reliably, with less risk of missing a flip for a frame which becomes
> > > ready just in time.
> > 
> > Indeed, but it would be great if we could do that with implicit fencing
> > as well.
> 
> 1. extract implicit fences from dma-buf. This part is just an idea,
> but easy to implement once we have someone who actually wants this.
> All we need is a new ioctl on the dma-buf to export the fences from
> the reservation_object as a sync_file (either the exclusive or the
> shared ones, selected with a flag).
> 2. do the exact same frame scheduling as with explicit fencing
> 3. supply explicit fences in your atomic ioctl calls - these should
> overrule any implicit fences (assuming correct kernel drivers, but we
> have helpers so you can assume they all work correctly).
> 
> By design this is possible, it's just that no one yet bothered enough
> to make it happen.
> -Daniel

I'm not sure I understand the workflow of this one. I'm all in favour
of leaving the hard work to userspace. Note that I have assumed
explicit fences from the start; I don't think implicit fences will ever
exist in v4l2, but I might be wrong. What I understood is that there
was a previous attempt in the past, but it raised more issues than it
actually solved. That being said, how exactly do you handle the
following use cases:

 - A frame was lost by the capture driver, but it was scheduled as the
next buffer to render (normally the previous frame should remain).
 - The scheduled frame is late for the next vblank (it didn't signal
on time); a newer one may be better for that vblank, but we will only
know once its fence is signalled.

Better in this context means that the presentation time of the frame is
closer to the next vblank time. Keep in mind that the idea is to
schedule the frames before their fences signal, in order to make the
fence actually useful in lowering the latency. Of course, as Michel
said, we could just always wait on the fence and then schedule. But if
you do that, why would you care about implementing fences in v4l2 to
start with? DQBUF does just that already.

Note that this has nothing to do with the valid use case where you
would want to apply various transformations (m2m or gpu) on the capture
buffer. You still gain from the fence in the context, even if you wait
in userspace on the fence before display. This alone is likely enough
to justify using fences.
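[Editorial note: the selection policy being discussed — among frames whose fences have already signalled, show the one whose presentation time is closest to the upcoming vblank, and otherwise keep the previous frame — can be sketched as a toy model. The names `Frame` and `pick_frame_for_vblank` are invented for illustration; this is not a V4L2 or KMS API.]

```python
from dataclasses import dataclass

@dataclass
class Frame:
    presentation_time: float  # desired display time, in seconds
    signalled: bool           # has the buffer's out-fence signalled yet?

def pick_frame_for_vblank(queue, next_vblank):
    """Among frames whose fences already signalled, pick the one whose
    presentation time is closest to the next vblank; None means keep
    showing the previous frame."""
    ready = [f for f in queue if f.signalled]
    if not ready:
        return None
    return min(ready, key=lambda f: abs(f.presentation_time - next_vblank))

# A late frame (fence not yet signalled) is passed over in favour of a
# frame that is ready, instead of blindly flipping to it.
queue = [Frame(0.016, False), Frame(0.033, True)]
best = pick_frame_for_vblank(queue, next_vblank=0.033)
```

The point of the sketch is that this decision needs to know which fences have signalled at selection time, which is exactly what a fence handed blindly to the kernel ahead of time cannot provide.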

> 
> > > > Render queues with timestamp are used to smooth rendering and handle
> > > > rendering collision so that the latency is kept low (like when you have
> > > > a 100fps video over a 60Hz display). This is normally done in
> > > > userspace, but with fences, you ask the kernel to render something in
> > > > an unpredictable future, so we loose the ability to make the final
> > > > decision.
> > > 
> > > That's just not what fences are intended to be used for with the current
> > > KMS UAPI.
> > 
> > Yes, and I think we're discussing towards changing that in the future.
> > 
> > Cheers,
> > 
> > Paul
> > 
> > --
> > Paul Kocialkowski, Bootlin
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> > 
> 
> 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 15:44                 ` Nicolas Dufresne
@ 2019-04-24 16:54                   ` Michel Dänzer
  2019-04-24 17:43                     ` Nicolas Dufresne
  0 siblings, 1 reply; 24+ messages in thread
From: Michel Dänzer @ 2019-04-24 16:54 UTC (permalink / raw)
  To: Nicolas Dufresne, Daniel Vetter, Paul Kocialkowski
  Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
	dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
	Dave Airlie, Mauro Carvalho Chehab,
	open list:DMA BUFFER SHARING FRAMEWORK


[-- Attachment #1.1: Type: text/plain, Size: 5289 bytes --]

On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
> Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
>> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
>> <paul.kocialkowski@bootlin.com> wrote:
>>> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
>>>> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
>>>>>
>>>>> Rendering a video stream is more complex than what you describe here.
>>>>> Whenever there is an unexpected delay (late delivery of a frame, for
>>>>> example) you may end up in a situation where one frame is ready after the
>>>>> targeted vblank. If there is another frame that targets the following
>>>>> vblank that gets ready on time, the previous frame should be replaced
>>>>> by the most recent one.
>>>>>
>>>>> With fences, what happens is that even if you received the next frame
>>>>> on time, naively replacing it is not possible, because we don't know
>>>>> when the fence for the next frame will be signalled. If you simply
>>>>> always replace the current frame, you may end up skipping a lot more
>>>>> vblanks than you expect, and that results in jumpy playback.
>>>>
>>>> So you want to be able to replace a queued flip with another one then.
>>>> That doesn't necessarily require allowing more than one flip to be
>>>> queued ahead of time.
>>>
>>> There might be other ways to do it, but this one has plenty of
>>> advantages.
>>
>> The point of kms (well one of the reasons) was to separate the
>> implementation of modesetting for specific hw from policy decisions
>> like which frames to drop and how to schedule them. Kernel gives
>> tools, userspace implements the actual protocols.
>>
>> There's definitely a bit a gap around scheduling flips for a specific
>> frame or allowing to cancel/overwrite an already scheduled flip, but
>> no one yet has come up with a clear proposal for new uapi + example
>> implementation + userspace implementation + big enough support from
>> other compositors that this is what they want too.

Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
flip?


>>>> Note that this can also be done in userspace with explicit fencing (by
>>>> only selecting a frame and submitting it to the kernel after all
>>>> corresponding fences have signalled), at least to some degree, but the
>>>> kernel should be able to do it up to a later point in time and more
>>>> reliably, with less risk of missing a flip for a frame which becomes
>>>> ready just in time.
>>>
>>> Indeed, but it would be great if we could do that with implicit fencing
>>> as well.
>>
>> 1. extract implicit fences from dma-buf. This part is just an idea,
>> but easy to implement once we have someone who actually wants this.
>> All we need is a new ioctl on the dma-buf to export the fences from
>> the reservation_object as a sync_file (either the exclusive or the
>> shared ones, selected with a flag).
>> 2. do the exact same frame scheduling as with explicit fencing
>> 3. supply explicit fences in your atomic ioctl calls - these should
>> overrule any implicit fences (assuming correct kernel drivers, but we
>> have helpers so you can assume they all work correctly).
>>
>> By design this is possible, it's just that no one yet bothered enough
>> to make it happen.
>> -Daniel
> 
> I'm not sure I understand the workflow of this one. I'm all in favour
> of leaving the hard work to userspace. Note that I have assumed explicit
> fences from the start; I don't think implicit fences will ever exist in
> v4l2, but I might be wrong. What I understood is that there was a
> previous attempt that raised more issues than it actually solved. So
> that being said, how do we handle exactly the following use cases:
> 
>  - A frame was lost by the capture driver, but it was scheduled as the
> next buffer to render (normally the previous frame should remain).

Userspace just doesn't call into the kernel to flip to the lost frame,
so the previous one remains.

>  - The scheduled frame is late for the next vblank (didn't signal on
> time); a new one may be better for the next vblank, but we will only
> know when its fence is signalled.

Userspace only selects a frame and submits it to the kernel after all
its fences have signalled.

> Better in this context means that the presentation time of this frame is
> closer to the next vblank time. Keep in mind that the idea is to
> schedule the frames before they are signalled, in order to make the
> fence useful in lowering the latency.

Fences are about signalling completion, not about low latency.

With a display server, the client can send frames to the display server
ahead of time, only the display server needs to wait for fences to
signal before submitting frames to the kernel.


> Of course, as Michel said, we could just always wait on the fence and
> then schedule. But if you do that, why would you care about implementing
> fences in v4l2 to start with? DQBuf does just that already.

A fence is more likely to work out of the box with non-V4L-related code
than DQBuf?


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 12:19         ` Paul Kocialkowski
@ 2019-04-24 17:10           ` Michel Dänzer
  0 siblings, 0 replies; 24+ messages in thread
From: Michel Dänzer @ 2019-04-24 17:10 UTC (permalink / raw)
  To: Paul Kocialkowski, Nicolas Dufresne, Daniel Vetter
  Cc: Alexandre Courbot, Maxime Ripard, linux-kernel, dri-devel,
	Tomasz Figa, Hans Verkuil, Thomas Petazzoni, Dave Airlie,
	Mauro Carvalho Chehab, linux-media

On 2019-04-24 2:19 p.m., Paul Kocialkowski wrote:
> On Wed, 2019-04-24 at 10:31 +0200, Michel Dänzer wrote:
>> On 2019-04-19 10:38 a.m., Paul Kocialkowski wrote:
>>> On Thu, 2019-04-18 at 20:30 -0400, Nicolas Dufresne wrote:
>>>> Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
>>>>>> It would be cool if both could be used concurrently and not just return
>>>>>> -EBUSY when the device is used with the other subsystem.
>>>>>
>>>>> We live in this world already :-) I think there's even patches (or merged
>>>>> already) to add fences to v4l, for Android.
>>>>
>>>> This work is currently suspended. It will require some features on the
>>>> DRM display side to really make this useful, but there are also a lot
>>>> of challenges in V4L2. In GFX space, most of the use cases are about
>>>> rendering as soon as possible. Though, in multimedia we have two
>>>> problems: we need to synchronize the frame rendering with the audio,
>>>> and output buffers may come out of order due to how video CODECs are
>>>> made.
>>>
>>> Definitely, it feels like the DRM display side is currently a good fit
>>> for render use cases, but not so much for precise display cases where
>>> we want to try and display a buffer at a given vblank target instead of
>>> "as soon as possible".
>>>
>>> I have a userspace project where I've implemented a page flip queue,
>>> which only schedules the next flip when relevant and keeps ready
>>> buffers in the queue until then. This requires explicit vblank
>>> synchronisation (which DRM offers, but pretty much all other display
>>> APIs, which are higher-level, don't, so I'm just using a refresh-rate
>>> timer for them) and flip-done notification.
>>>
>>> I haven't looked too much at how to flip with a target vblank with DRM
>>> directly but maybe the atomic API already has the bits in for that (but
>>> I haven't heard of such a thing as a buffer queue, so that makes me
>>> doubt it).
>>
>> Not directly. What's available is that if userspace waits for vblank n
>> and then submits a flip, the flip will complete in vblank n+1 (or a
>> later vblank, depending on when the flip is submitted and when the
>> fences the flip depends on signal).
>>
>> There is reluctance allowing more than one flip to be queued in the
>> kernel, as it would considerably increase complexity in the kernel. It
>> would probably only be considered if there was a compelling use-case
>> which was outright impossible otherwise.
> 
> Well, I think it's just less boilerplate for userspace. This is indeed
> quite complex, and I prefer to see that complexity done once and well
> in Linux rather than duplicated in userspace with more or less reliable
> implementations.

That's not the only trade-off to consider, e.g. I suspect handling this
in the kernel is more complex than in userspace.


>>> Well, I need to handle stuff like SDL in my userspace project, so I have
>>> to have all that queuing stuff in software anyway, but it would be good
>>> if each project didn't have to implement that. Worst case, it could be
>>> in libdrm too.
>>
>> Usually, this kind of queuing will be handled in a display server such
>> as Xorg or a Wayland compositor, not by the application such as a video
>> player itself, or any library in the latter's address space. I'm not
>> sure there's much potential for sharing code between display servers for
>> this.
> 
> This assumes that you are using a display server, which is definitely
> not always the case (there is e.g. Kodi GBM). Well, I'm not saying it
> is essential to have it in the kernel, but it would avoid code
> duplication and lower the complexity in userspace.

For code duplication, my suggestion would be to use a display server
instead of duplicating its functionality.


>>>> In the first, we'd need a mechanism where we can schedule a render at a
>>>> specific time or vblank. We can of course already implement this in
>>>> software, but with fences, the scheduling would need to be done in the
>>>> driver. Then if the fence is signalled earlier, the driver should hold
>>>> on until the delay is met. If the fence got signalled late, we also
>>>> need to think of a workflow. As we can't schedule more than one render
>>>> in DRM at one time, I don't really see yet how to make that work.
>>>
>>> Indeed, that's also one of the main issues I've spotted. Before using
>>> an implicit fence, we basically have to make sure the frame is due for
>>> display at the next vblank. Otherwise, we need to refrain from using
>>> the fence and schedule the flip later, which is kind of counter-
>>> productive.
>>
>> [...]
> 
>>> I feel like specifying a target vblank would be a good unit for that,
>>
>> The mechanism described above works for that.
> 
> I still don't see any fence-based mechanism that can work to achieve
> that, but maybe I'm missing your point.

It's not fence based, just good old waiting for the previous vblank
before submitting the flip to the kernel.


>>> since it's our native granularity after all (while a timestamp is not).
>>
>> Note that variable refresh rate (Adaptive Sync / FreeSync / G-Sync)
>> changes things in this regard. It makes the vblank length variable, and
>> if you wait for multiple vblanks between flips, you get the maximum
>> vblank length corresponding to the minimum refresh rate / timing
>> granularity. Thus, it would be useful to allow userspace to specify a
>> timestamp corresponding to the earliest time when the flip is to
>> complete. The kernel could then try to hit that as closely as possible.
> 
> I'm not very familiar with how this works, but I don't really see what
> it changes. Does it mean we can flip multiple times per vblank?

It's not about that.


> And I really like a vblank count over a timestamp, as one is the native
> unit at hand and the other one only correlates to it.

From a video playback application POV it's really the other way around,
isn't it? The target time is known (e.g. in order to sync up with
audio), the vblank count has to be calculated from that. And with
variable refresh rate, this calculation can't be done reliably, because
it's not known ahead of time when the next vblank starts (at least not
more accurately than an interval corresponding to the maximum/minimum
refresh rates).

If the target timestamp could be specified explicitly, the kernel could
do the conversion to the vblank count for fixed refresh, and could
adjust the refresh rate to hit the target more accurately with variable
refresh.
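[Editorial note: the conversion Michel describes is simple arithmetic at a fixed refresh rate, while with variable refresh only a range of candidate vblank counts can be derived. A sketch with made-up rate limits, purely for illustration — this is not kernel code:]

```python
import math

def target_vblank(target_ts, last_vblank_ts, refresh_hz):
    """Fixed refresh: how many vblanks after last_vblank_ts the flip
    must complete so the frame is on screen by target_ts."""
    period = 1.0 / refresh_hz
    return max(1, math.ceil((target_ts - last_vblank_ts) / period))

def vrr_vblank_range(target_ts, last_vblank_ts, min_hz, max_hz):
    """Variable refresh: each vblank may last anywhere between 1/max_hz
    and 1/min_hz, so only a range of vblank counts can be derived."""
    dt = target_ts - last_vblank_ts
    earliest = max(1, math.floor(dt * min_hz))  # every vblank at its longest
    latest = max(1, math.ceil(dt * max_hz))     # every vblank at its shortest
    return earliest, latest

# A target 50 ms ahead at a fixed 50 Hz maps to exactly one answer.
n = target_vblank(0.05, 0.0, 50)
# The same target on a 40-144 Hz variable-refresh panel only bounds a
# range of vblank counts, so a count alone is ambiguous.
lo, hi = vrr_vblank_range(0.05, 0.0, 40, 144)
```

This is why a timestamp is the more robust userspace-facing unit: the kernel can always derive a vblank count from it when the refresh rate is fixed, but not the other way around under variable refresh.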


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 16:54                   ` Michel Dänzer
@ 2019-04-24 17:43                     ` Nicolas Dufresne
  2019-04-25 15:17                       ` Michel Dänzer
  0 siblings, 1 reply; 24+ messages in thread
From: Nicolas Dufresne @ 2019-04-24 17:43 UTC (permalink / raw)
  To: Michel Dänzer, Daniel Vetter, Paul Kocialkowski
  Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
	dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
	Dave Airlie, Mauro Carvalho Chehab,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 7371 bytes --]

Le mercredi 24 avril 2019 à 18:54 +0200, Michel Dänzer a écrit :
> On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
> > Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
> > > On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
> > > <paul.kocialkowski@bootlin.com> wrote:
> > > > On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
> > > > > On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
> > > > > > Rendering a video stream is more complex than what you describe here.
> > > > > > Whenever there is an unexpected delay (late delivery of a frame, for
> > > > > > example) you may end up in a situation where one frame is ready after the
> > > > > > targeted vblank. If there is another frame that targets the following
> > > > > > vblank that gets ready on time, the previous frame should be replaced
> > > > > > by the most recent one.
> > > > > > 
> > > > > > With fences, what happens is that even if you received the next frame
> > > > > > on time, naively replacing it is not possible, because we don't know
> > > > > > when the fence for the next frame will be signalled. If you simply
> > > > > > always replace the current frame, you may end up skipping a lot more
> > > > > > vblanks than you expect, and that results in jumpy playback.
> > > > > 
> > > > > So you want to be able to replace a queued flip with another one then.
> > > > > That doesn't necessarily require allowing more than one flip to be
> > > > > queued ahead of time.
> > > > 
> > > > There might be other ways to do it, but this one has plenty of
> > > > advantages.
> > > 
> > > The point of kms (well one of the reasons) was to separate the
> > > implementation of modesetting for specific hw from policy decisions
> > > like which frames to drop and how to schedule them. Kernel gives
> > > tools, userspace implements the actual protocols.
> > > 
> > > There's definitely a bit a gap around scheduling flips for a specific
> > > frame or allowing to cancel/overwrite an already scheduled flip, but
> > > no one yet has come up with a clear proposal for new uapi + example
> > > implementation + userspace implementation + big enough support from
> > > other compositors that this is what they want too.
> 
> Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
> flip?
> 
> 
> > > > > Note that this can also be done in userspace with explicit fencing (by
> > > > > only selecting a frame and submitting it to the kernel after all
> > > > > corresponding fences have signalled), at least to some degree, but the
> > > > > kernel should be able to do it up to a later point in time and more
> > > > > reliably, with less risk of missing a flip for a frame which becomes
> > > > > ready just in time.
> > > > 
> > > > Indeed, but it would be great if we could do that with implicit fencing
> > > > as well.
> > > 
> > > 1. extract implicit fences from dma-buf. This part is just an idea,
> > > but easy to implement once we have someone who actually wants this.
> > > All we need is a new ioctl on the dma-buf to export the fences from
> > > the reservation_object as a sync_file (either the exclusive or the
> > > shared ones, selected with a flag).
> > > 2. do the exact same frame scheduling as with explicit fencing
> > > 3. supply explicit fences in your atomic ioctl calls - these should
> > > overrule any implicit fences (assuming correct kernel drivers, but we
> > > have helpers so you can assume they all work correctly).
> > > 
> > > By design this is possible, it's just that no one yet bothered enough
> > > to make it happen.
> > > -Daniel
> > 
> > I'm not sure I understand the workflow of this one. I'm all in favour
> > of leaving the hard work to userspace. Note that I have assumed explicit
> > fences from the start; I don't think implicit fences will ever exist in
> > v4l2, but I might be wrong. What I understood is that there was a
> > previous attempt that raised more issues than it actually solved. So
> > that being said, how do we handle exactly the following use cases:
> > 
> >  - A frame was lost by the capture driver, but it was scheduled as the
> > next buffer to render (normally the previous frame should remain).
> 
> Userspace just doesn't call into the kernel to flip to the lost frame,
> so the previous one remains.

We are stuck in a loop, you and me. Considering v4l2-to-drm, where
fences don't exist on the v4l2 side, it makes very little sense to bring
up fences if we are to wait on the fence in userspace. Unless of course
you have other operations beforehand making proper use of the fences.

> 
> >  - The scheduled frame is late for the next vblank (didn't signal on
> > time); a new one may be better for the next vblank, but we will only
> > know when its fence is signalled.
> 
> Userspace only selects a frame and submits it to the kernel after all
> its fences have signalled.
> 
> > Better in this context means that the presentation time of this frame is
> > closer to the next vblank time. Keep in mind that the idea is to
> > schedule the frames before they are signalled, in order to make the
> > fence useful in lowering the latency.
> 
> Fences are about signalling completion, not about low latency.

It can be used to remove a roundtrip with userspace at a very time-
sensitive moment. If you pass a dmabuf with its unsignalled fence to a
kernel driver, the driver can start the job on this dmabuf as soon as
the fence is signalled. If you always wait on a fence in userspace, you
have to wait for the userspace process to be scheduled, then userspace
will set up the drm atomic request or similar action, which may take
some time and may require another process in the kernel to be
scheduled. This effectively adds a variable delay, a gap where
nothing is happening between two operations. This time is lost and
contributes to the overall operation latency.
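[Editorial note: the gap described above can be modelled with toy numbers. The delays below are invented for illustration only, not measurements of any real system:]

```python
def job_start_time(fence_signal, prequeued, sched_delay=0.002, setup_delay=0.001):
    """Time at which the next driver can start working on the buffer.

    prequeued=True models handing the dmabuf plus its unsignalled fence
    to the driver ahead of time: the job starts the moment the fence
    fires.  prequeued=False models waiting on the fence in userspace:
    the process must first be woken and scheduled (sched_delay), then
    build and submit the next request (setup_delay)."""
    if prequeued:
        return fence_signal
    return fence_signal + sched_delay + setup_delay

# The userspace round trip adds sched_delay + setup_delay of dead time
# between the fence signalling and the next job starting.
gap = job_start_time(1.0, False) - job_start_time(1.0, True)
```

The absolute numbers are meaningless; the point is that the gap is both nonzero and variable, since it depends on scheduler behaviour at the worst possible moment.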

The benefit of fences we are looking for is being able to set up the
operations on various compatible drivers before the fence is signalled.
This way, at the time-critical moment a driver can be fed more jobs,
with no userspace roundtrip involved. It is also proposed to use it
to return buffers to the v4l2 queue when they are freed, which can
in some conditions keep, say, a capture driver from skipping due
to random scheduling delays.

> 
> With a display server, the client can send frames to the display server
> ahead of time, only the display server needs to wait for fences to
> signal before submitting frames to the kernel.
> 
> 
> > Of course as Michel said, we could just always wait on the fence and
> > just schedule. But if you do that, why would you care implementing the
> > fence in v4l2 to start with, DQBuf does just that already.
> 
> A fence is more likely to work out of the box with non-V4L-related code
> than DQBuf?

If you use DQBuf, you are guaranteed that the data has been produced. A
fence is not useful on a buffer that already contains the data you
would be waiting for. That's why the fence is provided in the RFC at
QBuf, basically when the free buffer is given to the v4l2 driver. QBuf
can also be passed a fence in the RFC, so if the buffer is not yet
free, the driver would wait on the fence before using it.
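[Editorial note: the QBuf semantics summarised above — an out-fence handed back at queue time that signals once capture fills the buffer, plus an optional in-fence the driver waits on before reusing it — can be sketched as a toy model. The class names are invented; this is not the actual V4L2 RFC API:]

```python
class Fence:
    """Minimal stand-in for a dma-buf fence."""
    def __init__(self):
        self.signalled = False
    def signal(self):
        self.signalled = True

class CaptureQueue:
    """Toy model of fence-aware buffer queueing."""
    def __init__(self):
        self.pending = []  # (buffer, in_fence, out_fence) triples

    def qbuf(self, buf, in_fence=None):
        out_fence = Fence()              # signals once buf holds new data
        self.pending.append((buf, in_fence, out_fence))
        return out_fence

    def process(self):
        """Driver side: capture into every buffer whose in-fence (if
        any) has signalled, then signal its out-fence."""
        filled, waiting = [], []
        for buf, in_f, out_f in self.pending:
            if in_f is None or in_f.signalled:
                out_f.signal()
                filled.append(buf)
            else:
                waiting.append((buf, in_f, out_f))
        self.pending = waiting
        return filled

q = CaptureQueue()
busy = Fence()                           # buffer still in use, e.g. by DRM
out = q.qbuf("buf0", in_fence=busy)
first = q.process()                      # in-fence unsignalled: nothing happens
busy.signal()
second = q.process()                     # now the driver fills buf0
```

The buffer can thus be re-queued before it is actually free, and the driver picks it up the instant the in-fence fires, with no userspace roundtrip in between.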

> 
> 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-24 17:43                     ` Nicolas Dufresne
@ 2019-04-25 15:17                       ` Michel Dänzer
  0 siblings, 0 replies; 24+ messages in thread
From: Michel Dänzer @ 2019-04-25 15:17 UTC (permalink / raw)
  To: Nicolas Dufresne, Daniel Vetter, Paul Kocialkowski
  Cc: Alexandre Courbot, Maxime Ripard, Linux Kernel Mailing List,
	dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
	Dave Airlie, Mauro Carvalho Chehab,
	open list:DMA BUFFER SHARING FRAMEWORK


[-- Attachment #1.1: Type: text/plain, Size: 7053 bytes --]

On 2019-04-24 7:43 p.m., Nicolas Dufresne wrote:
> Le mercredi 24 avril 2019 à 18:54 +0200, Michel Dänzer a écrit :
>> On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
>>> Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
>>>> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
>>>> <paul.kocialkowski@bootlin.com> wrote:
>>>>> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
>>>>>> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
>>>>>>> Rendering a video stream is more complex than what you describe here.
>>>>>>> Whenever there is an unexpected delay (late delivery of a frame, for
>>>>>>> example) you may end up in a situation where one frame is ready after the
>>>>>>> targeted vblank. If there is another frame that targets the following
>>>>>>> vblank that gets ready on time, the previous frame should be replaced
>>>>>>> by the most recent one.
>>>>>>>
>>>>>>> With fences, what happens is that even if you received the next frame
>>>>>>> on time, naively replacing it is not possible, because we don't know
>>>>>>> when the fence for the next frame will be signalled. If you simply
>>>>>>> always replace the current frame, you may end up skipping a lot more
>>>>>>> vblanks than you expect, and that results in jumpy playback.
>>>>>>
>>>>>> So you want to be able to replace a queued flip with another one then.
>>>>>> That doesn't necessarily require allowing more than one flip to be
>>>>>> queued ahead of time.
>>>>>
>>>>> There might be other ways to do it, but this one has plenty of
>>>>> advantages.
>>>>
>>>> The point of kms (well one of the reasons) was to separate the
>>>> implementation of modesetting for specific hw from policy decisions
>>>> like which frames to drop and how to schedule them. Kernel gives
>>>> tools, userspace implements the actual protocols.
>>>>
>>>> There's definitely a bit a gap around scheduling flips for a specific
>>>> frame or allowing to cancel/overwrite an already scheduled flip, but
>>>> no one yet has come up with a clear proposal for new uapi + example
>>>> implementation + userspace implementation + big enough support from
>>>> other compositors that this is what they want too.
>>
>> Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
>> flip?
>>
>>
>>>>>> Note that this can also be done in userspace with explicit fencing (by
>>>>>> only selecting a frame and submitting it to the kernel after all
>>>>>> corresponding fences have signalled), at least to some degree, but the
>>>>>> kernel should be able to do it up to a later point in time and more
>>>>>> reliably, with less risk of missing a flip for a frame which becomes
>>>>>> ready just in time.
>>>>>
>>>>> Indeed, but it would be great if we could do that with implicit fencing
>>>>> as well.
>>>>
>>>> 1. extract implicit fences from dma-buf. This part is just an idea,
>>>> but easy to implement once we have someone who actually wants this.
>>>> All we need is a new ioctl on the dma-buf to export the fences from
>>>> the reservation_object as a sync_file (either the exclusive or the
>>>> shared ones, selected with a flag).
>>>> 2. do the exact same frame scheduling as with explicit fencing
>>>> 3. supply explicit fences in your atomic ioctl calls - these should
>>>> overrule any implicit fences (assuming correct kernel drivers, but we
>>>> have helpers so you can assume they all work correctly).
>>>>
>>>> By design this is possible, it's just that no one yet bothered enough
>>>> to make it happen.
>>>> -Daniel
>>>
>>> I'm not sure I understand the workflow of this one. I'm all in favour
>>> of leaving the hard work to userspace. Note that I have assumed explicit
>>> fences from the start; I don't think implicit fences will ever exist in
>>> v4l2, but I might be wrong. What I understood is that there was a
>>> previous attempt that raised more issues than it actually solved. So
>>> that being said, how do we handle exactly the following use cases:
>>>
>>>  - A frame was lost by the capture driver, but it was scheduled as the
>>> next buffer to render (normally the previous frame should remain).
>>
>> Userspace just doesn't call into the kernel to flip to the lost frame,
>> so the previous one remains.
> 
> We are stuck in a loop, you and me. Considering v4l2-to-drm, where
> fences don't exist on the v4l2 side, it makes very little sense to
> bring up fences if we are to wait on the fence in userspace.

It makes sense insofar as no V4L specific code would be needed to make
sure that the contents of a buffer produced via V4L aren't consumed
before they're ready to be.


>>>  - The scheduled frame is late for the next vblank (didn't signal on
>>> time); a new one may be better for the next vblank, but we will only
>>> know when its fence is signalled.
>>
>> Userspace only selects a frame and submits it to the kernel after all
>> its fences have signalled.
>>
>>> Better in this context means that the presentation time of this frame is
>>> closer to the next vblank time. Keep in mind that the idea is to
>>> schedule the frames before they are signalled, in order to make the
>>> fence useful in lowering the latency.
>>
>> Fences are about signalling completion, not about low latency.
> 
> It can be used to remove a roundtrip with userspace at a very time-
> sensitive moment. If you pass a dmabuf with its unsignalled fence to a
> kernel driver, the driver can start the job on this dmabuf as soon as
> the fence is signalled. If you always wait on a fence in userspace, you
> have to wait for the userspace process to be scheduled,

I doubt this magically works without something like that (e.g. a
workqueue, which runs in normal process context) in the kernel either. :)

> then userspace will set up the drm atomic request or similar action, which
> may take some time and may require another process in the kernel to be
> scheduled. This effectively adds a variable delay, a gap where
> nothing is happening between two operations. This time is lost and
> contributes to the overall operation latency.

It only increases latency if it causes a flip to miss its target vblank,
and it's not possible to know whether this happens at an unacceptable rate
without trying. The prudent approach is to at least prototype a solution
with as much complexity as possible in userspace first. If that turns
out to perform too badly, then we can think about how to improve it by
adding complexity in the kernel.


> The benefit of fences we are looking for is being able to set up the
> operations on various compatible drivers before the fence is signalled.
> This way, at the time-critical moment a driver can be fed more jobs,
> with no userspace roundtrip involved.

That is possible with other operations, just not with page flipping yet.


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-04-17 18:10 Support for 2D engines/blitters in V4L2 and DRM Paul Kocialkowski
  2019-04-18  8:18 ` Daniel Vetter
@ 2019-05-06  8:28 ` Pekka Paalanen
  2019-05-09  8:32   ` Paul Kocialkowski
  1 sibling, 1 reply; 24+ messages in thread
From: Pekka Paalanen @ 2019-05-06  8:28 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Nicolas Dufresne, Alexandre Courbot, Maxime Ripard, linux-kernel,
	dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
	Dave Airlie, Mauro Carvalho Chehab, linux-media

[-- Attachment #1: Type: text/plain, Size: 2925 bytes --]

On Wed, 17 Apr 2019 20:10:15 +0200
Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote:

> There's also the possibility of writing up a drm-render DDX to handle
> these 2D blitters that can make things a lot faster when running a
> desktop environment. As for wayland, well, I don't really know what to
> think. I was under the impression that it relies on GL for 2D
> operations, but am really not sure how true that actually is.

Hi Paul,

Wayland does not rely on anything really, it does not even have any
rendering commands, and is completely agnostic to how applications or
display servers might be drawing things. Wayland (protocol) does care
about buffer types and fences though, since those are the things passed
between applications and servers.

In a Wayland architecture, each display server (called a Wayland
compositor, corresponding to Xorg + window manager + compositing
manager) uses whatever they want to use for putting the screen contents
together. OpenGL is a popular choice, yes, but they may also use Vulkan,
Pixman, Cairo, Skia, DRM KMS planes, and whatnot or a mix of any.
Sometimes it may so happen that the display server does not need to
render at all, the display hardware can realize the screen contents
through e.g. KMS planes.

Writing a hardware specific driver (like a DDX for Xorg) for one
display server (or a display server library like wlroots or libweston)
is no longer reasonable. You would have to do it on so many display
server projects. What really makes it infeasible is the
hardware-specific aspect. People would have to write a driver for every
display server project for every hardware model. That's just not
feasible today.

Some display server projects even refuse to take hardware-specific code
upstream, because keeping it working has a high cost and only very few
people can test it.

The only way, as I see it, that you could have Wayland compositors at large
take advantage of 2D hardware units is to come up with the common
userspace API in the sense similar to Vulkan or OpenGL, so that each
display server would only need to support the API, and the API
implementation would handle the hardware-specific parts. OpenWF by
Khronos may have been the most serious effort in that direction, but
good luck finding any users or implementations today. Although maybe
Android's hwcomposer could be the next one.

However, if someone is doing a special Wayland compositor to be used on
specific hardware, they can of course use whatever to put the screen
contents together in a downstream fork. Wayland does not restrict that
in any way, not even by buffer or fence types because you can extend
Wayland to deal with anything you need, as long as you also modify the
apps or toolkits to do it too. The limitations are really more
political and practical if you aim for upstream and wide-spread use of
2D hardware blocks.


Thanks,
pq


* Re: Support for 2D engines/blitters in V4L2 and DRM
  2019-05-06  8:28 ` Pekka Paalanen
@ 2019-05-09  8:32   ` Paul Kocialkowski
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Kocialkowski @ 2019-05-09  8:32 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Nicolas Dufresne, Alexandre Courbot, Maxime Ripard, linux-kernel,
	dri-devel, Tomasz Figa, Hans Verkuil, Thomas Petazzoni,
	Dave Airlie, Mauro Carvalho Chehab, linux-media

Hi Pekka,

Le lundi 06 mai 2019 à 11:28 +0300, Pekka Paalanen a écrit :
> On Wed, 17 Apr 2019 20:10:15 +0200
> Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote:
> 
> > There's also the possibility of writing up a drm-render DDX to handle
> > these 2D blitters that can make things a lot faster when running a
> > desktop environment. As for wayland, well, I don't really know what to
> > think. I was under the impression that it relies on GL for 2D
> > operations, but am really not sure how true that actually is.
> 
> Hi Paul,
> 
> Wayland does not rely on anything really: it does not even have any
> rendering commands, and is completely agnostic to how applications or
> display servers might be drawing things. Wayland (protocol) does care
> about buffer types and fences though, since those are the things passed
> between applications and servers.
> 
> In a Wayland architecture, each display server (called a Wayland
> compositor, corresponding to Xorg + window manager + compositing
> manager) uses whatever they want to use for putting the screen contents
> together. OpenGL is a popular choice, yes, but they may also use Vulkan,
> Pixman, Cairo, Skia, DRM KMS planes, or whatnot, or a mix of these.
> Sometimes it may so happen that the display server does not need to
> render at all, because the display hardware can realize the screen
> contents through e.g. KMS planes.

Right, I looked some more at Wayland and had some discussions over IRC
(come to think of it, I'm pretty sure you were in the discussions too)
to get a clearer understanding of the architecture. The fact that the
Wayland protocol is render-agnostic and does not allocate buffers on
its own feels very sane to me.

> Writing a hardware specific driver (like a DDX for Xorg) for one
> display server (or a display server library like wlroots or libweston)
> is no longer reasonable. You would have to do it on so many display
> server projects. What really makes it infeasible is the
> hardware-specific aspect. People would have to write a driver for every
> display server project for every hardware model. That's just not
> feasible today.

Yes, this is why I am suggesting implementing a DRM helper library for
that, which would handle the device-specific drivers. Basically what
Mesa does for 3D, but with a DRM-specific-but-device-agnostic userspace
interface. That way, the overhead for integration in display servers
would be minimal.

> Some display server projects even refuse to take hardware-specific code
> upstream, because keeping it working has a high cost and only very few
> people can test it.

Right, maintenance aspects are quite important and I think it's
definitely best to centralize per-device support in a common library.

> The only way, as I see it, that you could have Wayland compositors at large
> take advantage of 2D hardware units is to come up with the common
> userspace API in the sense similar to Vulkan or OpenGL, so that each
> display server would only need to support the API, and the API
> implementation would handle the hardware-specific parts. OpenWF by
> Khronos may have been the most serious effort in that direction, but
> good luck finding any users or implementations today. Although maybe
> Android's hwcomposer could be the next one.

I would be very cautious regarding the approach of designing a
"standardized" API across systems. Most of the time, this does not work
well and ends up involving a glue layer of crap that is not always a
good fit for the system. Things more or less worked out with GL (with
significant effort put into it), but there are countless other examples
where it didn't (things like OpenMAX, OpenVG, etc).

In addition, this would mostly only be used in compositors, not in
final applications, so the need to have a common API across systems is
much reduced. There's also the fact that 2D is much less complicated
than 3D.

So I am not very interested in this form of standardization and I think
a DRM-specific userspace API for this is not only sufficient, but
probably also the best fit for the job. Maybe the library implementing
this API and device support could later be extended to support a
standardized API across systems too if one shows up (a bit like Mesa
supports different state trackers). That's definitely not a personal
priority though, and I firmly believe it should not be a blocker to
getting 2D blitter support with DRM.

> However, if someone is doing a special Wayland compositor to be used on
> specific hardware, they can of course use whatever to put the screen
> contents together in a downstream fork. Wayland does not restrict that
> in any way, not even by buffer or fence types because you can extend
> Wayland to deal with anything you need, as long as you also modify the
> apps or toolkits to do it too. The limitations are really more
> political and practical if you aim for upstream and wide-spread use of
> 2D hardware blocks.

Yes, I understand that the issue is not so much on the technical side,
but rather on governance and politics.

Cheers,

Paul

-- 
Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Bootlin



Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17 18:10 Support for 2D engines/blitters in V4L2 and DRM Paul Kocialkowski
2019-04-18  8:18 ` Daniel Vetter
2019-04-18  8:54   ` Paul Kocialkowski
2019-04-18  9:09     ` Tomasz Figa
2019-04-18  9:13       ` Paul Kocialkowski
2019-04-18  9:21         ` Tomasz Figa
2019-04-19  0:30   ` Nicolas Dufresne
2019-04-19  4:27     ` Tomasz Figa
2019-04-19 15:31       ` Nicolas Dufresne
2019-04-22  4:02         ` Tomasz Figa
2019-04-19  8:38     ` Paul Kocialkowski
2019-04-24  8:31       ` Michel Dänzer
2019-04-24 12:01         ` Nicolas Dufresne
2019-04-24 14:39           ` Michel Dänzer
2019-04-24 14:41             ` Paul Kocialkowski
2019-04-24 15:06               ` Daniel Vetter
2019-04-24 15:44                 ` Nicolas Dufresne
2019-04-24 16:54                   ` Michel Dänzer
2019-04-24 17:43                     ` Nicolas Dufresne
2019-04-25 15:17                       ` Michel Dänzer
2019-04-24 12:19         ` Paul Kocialkowski
2019-04-24 17:10           ` Michel Dänzer
2019-05-06  8:28 ` Pekka Paalanen
2019-05-09  8:32   ` Paul Kocialkowski
