* Embedded Linux memory management interest group list
From: Jesse Barker @ 2011-04-18 15:15 UTC
  To: dri-devel, linux-media

Hi all,

One of the big issues we've been faced with at Linaro is around GPU
and multimedia device integration, in particular the memory management
requirements for supporting them on ARM.  This next cycle, we'll be
focusing on driving consensus around a unified memory management
solution for embedded systems that support multiple architectures and
SoCs.  This is listed as part of our working set of requirements for
the next six-month cycle (in spite of the URL, this is not being
treated as a graphics-specific topic - we also have participation from
multimedia and kernel working group folks):

  https://wiki.linaro.org/Cycles/1111/TechnicalTopics/Graphics

I am working on getting the key technical decision makers to provide
input and participate in the requirements collection and design for a
unified solution. We had an initial birds-of-a-feather discussion at
the Embedded Linux Conference in San Francisco this past week to kick
off the effort in preparation for the first embedded-memory-management
mini-sprint in Budapest the week of May 9th at Linaro@UDS.  One of the
outcomes of the BoF was the need for a mailing list to coordinate
ideas, planning, etc.  The subscription management for the list is
located at http://lists.linaro.org/mailman/listinfo/linaro-mm-sig.
The mini-summit in Budapest will have live audio and an IRC channel
for those that want to participate (details to go out on the list).
We expect to have additional summits over the course of the cycle,
with the next one likely at Linux Plumbers in September (though, I
would like to try for one more before then).

cheers,
Jesse


* Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Mauro Carvalho Chehab @ 2011-05-14 10:19 UTC
  To: dri-devel, linux-media; +Cc: Jesse Barker

On 18-04-2011 17:15, Jesse Barker wrote:
> One of the big issues we've been faced with at Linaro is around GPU
> and multimedia device integration, in particular the memory management
> requirements for supporting them on ARM.  This next cycle, we'll be
> focusing on driving consensus around a unified memory management
> solution for embedded systems that support multiple architectures and
> SoCs.  This is listed as part of our working set of requirements for
> the next six-month cycle (in spite of the URL, this is not being
> treated as a graphics-specific topic - we also have participation from
> multimedia and kernel working group folks):
> 
>   https://wiki.linaro.org/Cycles/1111/TechnicalTopics/Graphics

As part of the memory management effort, Linaro organized several discussions
during the Linaro Development Summit (LDS) in Budapest, and invited me and other
members of the V4L and DRI communities to discuss the requirements.
I wish to thank Linaro for its initiative.

Basically, on several SoC designs, the GPU and the CPU are integrated into
the same chipset and can share the same memory for a framebuffer. Also,
they may have IP blocks that allow processing the framebuffer internally,
to do things like enhancing the image and converting it into an MPEG stream.

The desire of the SoC developers is that those operations be done using
zero-copy transfers.

This somewhat resembles the idea of the VIDIOC_OVERLAY/VIDIOC_FBUF API,
which was used in the old days, when CPUs weren't fast enough to process
video without generating a huge load. The overlay mode was created
to allow direct PCI-to-PCI transfers from the video capture board into the
display adapter, using the XVideo extension, removing the CPU overhead caused
by the video stream. It was designed as a kernel API plus a userspace X11
driver that passes a framebuffer reference to the V4L driver, which uses it
to program DMA transfers directly into the framebuffer.
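
For those who don't remember that API, a minimal sketch of the userspace
side is shown below (error handling omitted); this is roughly what the Xorg
v4l driver does, with the framebuffer description coming from the display
side:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Minimal sketch of the classic overlay setup. VIDIOC_S_FBUF takes a raw
 * framebuffer description, including a physical base address, which is why
 * it is restricted to privileged callers. */
static int start_overlay(int fd, void *fb_base, struct v4l2_pix_format *fmt)
{
	struct v4l2_framebuffer fb;
	int on = 1;

	ioctl(fd, VIDIOC_G_FBUF, &fb);		/* query driver capabilities */
	fb.base = fb_base;			/* destination framebuffer address */
	fb.fmt = *fmt;				/* destination pixel format */
	ioctl(fd, VIDIOC_S_FBUF, &fb);		/* hand the target to the V4L driver */

	return ioctl(fd, VIDIOC_OVERLAY, &on);	/* start DMA into the framebuffer */
}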

At the LDS, we had three days of discussions about how buffer sharing should
be handled, and Linaro is producing a blueprint plan to address the needs.
We also had a discussion about V4L and KMS, allowing both communities to better
understand how things are supposed to work on the other side.

From the V4L2 perspective, what is needed is a way to pass a framebuffer
between two V4L2 devices and between a V4L2 device and a GPU. The V4L2
device can be either an input or an output device.
The original idea was to add yet another mmap mode to the VIDIOC streaming
ioctls, and keep using QBUF/DQBUF to handle it. However, as I pointed out
there, this would lead to sync issues on a shared buffer, causing flipping
artifacts. Also, as the API is generic, it can be used on generic computers,
like desktops, notebooks and tablets (even ARM-based designs), where it
may end up actually being implemented as a PCI-to-PCI transfer.

So, based on all I've seen, I'm pretty much convinced that the normal MMAP
way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
is not the best way to share data with framebuffers. We probably need
something that is an enhanced version of the VIDIOC_FBUF/VIDIOC_OVERLAY
ioctls. Unfortunately, we can't just add more fields there, as there's no
reserved space. So, we'll probably add a VIDIOC_FBUF2 series of ioctls.

It seems to me that the proper way to develop such an API is to start working
with the Xorg V4L driver, changing it to work with KMS and with the new API
(probably porting some parts of the Xorg driver to kernelspace).

One of the problems with a shared framebuffer is that an overlaid V4L stream
may, in the worst case, be sent to up to 4 different GPUs and/or displays.

Imagine a scenario like:

	+==================+==================+
	|                  |                  |
	|      D1     +----|---+     D2       |
	|             | V4L|   |              |
	+-------------|----+---|--------------+
	|             |    |   |              |
	|      D3     +----+---+     D4       |
	|                  |                  |
	+==================+==================+


where D1, D2, D3 and D4 are four different displays, and the same V4L framebuffer
is partially shared among them (the above is an example of a V4L input, although
the reverse scenario of having one framebuffer divided among 4 V4L outputs
also seems to be possible).

As the same image may be spread across 4 monitors, the buffer filling should be
synced with all of them, in order to avoid flipping artifacts. Also, the shared
buffer can't be reused until all displays have finished reading it. From what I
understood from the discussions with the DRI people, the display APIs currently
have a similar issue of needing to wait for a buffer to be completely consumed
before allowing it to be reused. According to them, this was solved there by
dynamically allocating buffers. We may need to do something similar in V4L.
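
To make the requirement concrete, here is a hypothetical sketch of the
bookkeeping it implies (none of these names exist in the kernel; a real
design would come out of the API discussions): each consumer holds a
reference on the shared buffer, and the producer may only refill it once
the last reference is dropped.

#include <linux/kref.h>
#include <linux/list.h>

/* Hypothetical sketch only: one reference per consumer still reading. */
struct shared_vbuf {
	struct kref refs;
	struct list_head node;		/* entry in the producer's buffer list */
	void *vaddr;
};

static void producer_requeue(struct shared_vbuf *buf);	/* hypothetical hook */

static void shared_vbuf_release(struct kref *kref)
{
	struct shared_vbuf *buf = container_of(kref, struct shared_vbuf, refs);

	/* The last display has finished reading: the producer may refill it. */
	producer_requeue(buf);
}

/* Producer side: called when a filled buffer is handed to n consumers. */
static void producer_publish(struct shared_vbuf *buf, int nconsumers)
{
	kref_init(&buf->refs);		/* first consumer's reference */
	while (--nconsumers > 0)
		kref_get(&buf->refs);	/* one more reference per extra consumer */
}

/* Consumer side: called by each display (D1..D4) when it is done reading. */
static void consumer_done(struct shared_vbuf *buf)
{
	kref_put(&buf->refs, shared_vbuf_release);
}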

Btw, the need to manage buffers is currently being covered by the proposal
for new ioctl()s to support multi-sized video buffers [1].

[1] http://www.spinics.net/lists/linux-media/msg30869.html

It makes sense to me to discuss that proposal together with the above discussions,
in order to keep the API consistent.

My understanding is that the SoC people driving those changes will
be working on the API proposals for this. They should also provide the
needed patches, open source drivers and userspace application(s) that
allow testing and validating the GPU <==> V4L transfers using the new API.

Thanks,
Mauro


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Hans Verkuil @ 2011-05-14 11:02 UTC
  To: Mauro Carvalho Chehab; +Cc: dri-devel, linux-media, Jesse Barker

On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:
> On 18-04-2011 17:15, Jesse Barker wrote:
> > One of the big issues we've been faced with at Linaro is around GPU
> > and multimedia device integration, in particular the memory management
> > requirements for supporting them on ARM.  This next cycle, we'll be
> > focusing on driving consensus around a unified memory management
> > solution for embedded systems that support multiple architectures and
> > SoCs.  This is listed as part of our working set of requirements for
> > the next six-month cycle (in spite of the URL, this is not being
> > treated as a graphics-specific topic - we also have participation from
> > multimedia and kernel working group folks):
> > 
> >   https://wiki.linaro.org/Cycles/1111/TechnicalTopics/Graphics
> 
> As part of the memory management effort, Linaro organized several discussions
> during the Linaro Development Summit (LDS) in Budapest, and invited me and other
> members of the V4L and DRI communities to discuss the requirements.
> I wish to thank Linaro for its initiative.
> 
> Basically, on several SoC designs, the GPU and the CPU are integrated into
> the same chipset and can share the same memory for a framebuffer. Also,
> they may have IP blocks that allow processing the framebuffer internally,
> to do things like enhancing the image and converting it into an MPEG stream.
> 
> The desire of the SoC developers is that those operations be done using
> zero-copy transfers.
> 
> This somewhat resembles the idea of the VIDIOC_OVERLAY/VIDIOC_FBUF API,
> which was used in the old days, when CPUs weren't fast enough to process
> video without generating a huge load. The overlay mode was created
> to allow direct PCI-to-PCI transfers from the video capture board into the
> display adapter, using the XVideo extension, removing the CPU overhead caused
> by the video stream. It was designed as a kernel API plus a userspace X11
> driver that passes a framebuffer reference to the V4L driver, which uses it
> to program DMA transfers directly into the framebuffer.
> 
> At the LDS, we had three days of discussions about how buffer sharing should
> be handled, and Linaro is producing a blueprint plan to address the needs.
> We also had a discussion about V4L and KMS, allowing both communities to better
> understand how things are supposed to work on the other side.
> 
> From the V4L2 perspective, what is needed is a way to pass a framebuffer
> between two V4L2 devices and between a V4L2 device and a GPU. The V4L2
> device can be either an input or an output device.
> The original idea was to add yet another mmap mode to the VIDIOC streaming
> ioctls, and keep using QBUF/DQBUF to handle it. However, as I pointed out
> there, this would lead to sync issues on a shared buffer, causing flipping
> artifacts. Also, as the API is generic, it can be used on generic computers,
> like desktops, notebooks and tablets (even ARM-based designs), where it
> may end up actually being implemented as a PCI-to-PCI transfer.
> 
> So, based on all I've seen, I'm pretty much convinced that the normal MMAP
> way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
> is not the best way to share data with framebuffers.

I agree with that, but it is a different story between two V4L2 devices. There
you obviously want to use the streaming ioctls and still share buffers.

> We probably need
> something that is an enhanced version of the VIDIOC_FBUF/VIDIOC_OVERLAY
> ioctls. Unfortunately, we can't just add more fields there, as there's no
> reserved space. So, we'll probably add a VIDIOC_FBUF2 series of ioctls.

That will be useful as well to add better support for blending and Z-ordering
between overlays. The old API for that is very limited in that respect.

Regards,

	Hans


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Mauro Carvalho Chehab @ 2011-05-14 11:46 UTC
  To: Hans Verkuil; +Cc: dri-devel, linux-media, Jesse Barker

On 14-05-2011 13:02, Hans Verkuil wrote:
> On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:

>> So, based on all I've seen, I'm pretty much convinced that the normal MMAP
>> way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
>> is not the best way to share data with framebuffers.
> 
> I agree with that, but it is a different story between two V4L2 devices. There
> you obviously want to use the streaming ioctls and still share buffers.

I don't think so. The requirement for syncing the framebuffer between two
V4L2 devices is pretty much the same as with one V4L2 device and one GPU.

In both cases, the requirement is to pass a framebuffer between two entities,
not a video stream.

For example, imagine something like:

	V4L2 camera =====> V4L2 encoder to MPEG2
		     ||
		     LL==> GPU

Both the GPU and the V4L2 encoder should use the same logic to be sure that they
use a buffer that has already been filled by the camera. Also, the V4L2 camera
driver can't reuse such a framebuffer before being sure that both consumers
have stopped using it.

So, it is the same requirement as having four displays receiving the framebuffer.

Of course, a GPU endpoint may require some extra information for blending,
but a V4L node may also require some other type of extra information.

>> We probably need
>> something that is an enhanced version of the VIDIOC_FBUF/VIDIOC_OVERLAY
>> ioctls. Unfortunately, we can't just add more fields there, as there's no
>> reserved space. So, we'll probably add a VIDIOC_FBUF2 series of ioctls.
> 
> That will be useful as well to add better support for blending and Z-ordering
> between overlays. The old API for that is very limited in that respect.

Agreed.

Mauro.


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Hans Verkuil @ 2011-05-15 21:10 UTC
  To: Mauro Carvalho Chehab; +Cc: dri-devel, linux-media, Jesse Barker

On Saturday, May 14, 2011 13:46:03 Mauro Carvalho Chehab wrote:
> On 14-05-2011 13:02, Hans Verkuil wrote:
> > On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:
> 
> >> So, based on all I've seen, I'm pretty much convinced that the normal MMAP
> >> way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
> >> is not the best way to share data with framebuffers.
> > 
> > I agree with that, but it is a different story between two V4L2 devices. There
> > you obviously want to use the streaming ioctls and still share buffers.
> 
> I don't think so. The requirement for syncing the framebuffer between two
> V4L2 devices is pretty much the same as with one V4L2 device and one GPU.
> 
> In both cases, the requirement is to pass a framebuffer between two entities,
> not a video stream.
> 
> For example, imagine something like:
> 
> 	V4L2 camera =====> V4L2 encoder to MPEG2
> 		     ||
> 		     LL==> GPU
> 
> Both the GPU and the V4L2 encoder should use the same logic to be sure that they
> use a buffer that has already been filled by the camera. Also, the V4L2 camera
> driver can't reuse such a framebuffer before being sure that both consumers
> have stopped using it.

No. A camera whose output is sent to a resizer and then to a SW/FW/HW encoder
is a typical example where you want to queue/dequeue buffers. Especially since
the various parts of the pipeline may stall for a bit and you don't want to lose
frames. That's not what the overlay API is for; that's what our streaming API
gives us.

The use case above isn't even possible without copying. At least, I don't see a
way, unless the GPU buffer is non-destructive. In that case you can give the
frame to the GPU, and when the GPU is finished you can give it to the encoder.
I suspect that might become quite complex though.

Note that many video receivers cannot stall. You can't tell them to wait until
the last buffer finished processing. This is different from some/most? sensors.

So if you try to send the input of a video receiver to some device that requires
syncing, which can cause stalls, then that will not work without losing frames,
which is especially undesirable for video encoding.

Of course, it might be that we mean the same, but just use different words :-(

Regards,

	Hans


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Alan Cox @ 2011-05-15 21:27 UTC
  To: Hans Verkuil; +Cc: Mauro Carvalho Chehab, dri-devel, linux-media

> > In both cases, the requirement is to pass a framebuffer between two entities,
> > not a video stream.

It may not even be a framebuffer. In many cases you'll pass a framebuffer
or some memory target (in DRI-think, probably a GEM handle); in fact, in
theory you can do much of this now.

> > use a buffer that has already been filled by the camera. Also, the V4L2 camera
> > driver can't reuse such a framebuffer before being sure that both consumers
> > have stopped using it.

You also potentially need fences, which complicates the interface
somewhat.
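
To illustrate the shape of that, here is a minimal completion-based fence
sketch (hypothetical, not an existing kernel interface): the consumer
signals the fence when its DMA finishes, and the producer waits on it
before touching the memory again.

#include <linux/completion.h>

/* Hypothetical fence sketch: just the shape of the synchronization,
 * not an existing API. */
struct buf_fence {
	struct completion done;
};

static void fence_init(struct buf_fence *f)
{
	init_completion(&f->done);
}

/* Consumer side: called when its DMA engine finishes with the buffer. */
static void fence_signal(struct buf_fence *f)
{
	complete(&f->done);
}

/* Producer side: block until the consumer is done; cache flushing would
 * happen here too, before the buffer is rewritten. */
static int fence_wait(struct buf_fence *f)
{
	return wait_for_completion_interruptible(&f->done);
}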

> The use case above isn't even possible without copying. At least, I don't see a
> way, unless the GPU buffer is non-destructive. In that case you can give the
> frame to the GPU, and when the GPU is finished you can give it to the encoder.
> I suspect that might become quite complex though.

It's actually no different to giving a buffer to the GPU some of the time
and the CPU other bits. In those cases you often need to ensure private
ownership each side and do fencing/cache flushing as needed.

> Note that many video receivers cannot stall. You can't tell them to wait until
> the last buffer finished processing. This is different from some/most? sensors.

A lot of video receivers also keep the bits away from the CPU as part of
the general DRM delusion TV operators work under. That means you've got
an object that has a handle, has operations (alpha, fade, scale, etc) but
you can never touch the bits. In the TV/Video world, not surprisingly,
that is often seen as the 'primary' frame buffer as well. You've got a
set of mappable framebuffers the CPU can touch plus other video sources
that can be mixed and placed but the CPU can only touch the mappable
objects that form part of the picture.

Alan


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Rob Clark @ 2011-05-15 23:44 UTC
  To: Alan Cox; +Cc: Hans Verkuil, Mauro Carvalho Chehab, dri-devel, linux-media

On Sun, May 15, 2011 at 4:27 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> > In both cases, the requirement is to pass a framebuffer between two entities,
>> > not a video stream.
>
> It may not even be a framebuffer. In many cases you'll pass a framebuffer
> or some memory target (in DRI think probably a GEM handle), in fact in
> theory you can do much of this now.
>
>> > use a buffer that has already been filled by the camera. Also, the V4L2 camera
>> > driver can't reuse such a framebuffer before being sure that both consumers
>> > have stopped using it.
>
> You also potentially need fences which complicates the interface
> somewhat.

Presumably this is going through something like DRI2, so the client
application, which is what is interacting w/ the V4L2 interface for the
camera and perhaps the video encoder, would call something that turns into
a ScheduleSwap() call on the xserver side, returning a frame count to wait
for, and then at some point later a ScheduleWaitMSC() to wait for that
frame count, to know the GPU is done with the buffer.  The fences would
be buried somewhere within the DRM (kernel) and xserver driver (userspace)
to keep the client app blocked until the GPU is done.

You probably don't want the V4L2 devices to be too deeply connected to
how the GPU does synchronization; otherwise V4L2 would need to support
each different DRM+xserver driver and how it implements buffer
synchronization with the GPU.
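
As a rough sketch of that client-side sequence (the function names below
are hypothetical wrappers around the DRI2 SwapBuffers/WaitMSC protocol
requests, not a real API):

#include <stdint.h>

/* Hypothetical wrappers; these exact functions do not exist, they only
 * sketch the flow described above. */
uint64_t dri2_schedule_swap(void *dpy, uint32_t drawable);
void dri2_wait_msc(void *dpy, uint32_t drawable, uint64_t target_msc);
void queue_to_encoder(void *buffer);	/* stand-in for the V4L2 encoder path */

static void swap_then_reuse(void *dpy, uint32_t drawable, void *buffer)
{
	/* Ask the X server to schedule the swap; the reply carries the frame
	 * count (MSC) at which the GPU will be done with the buffer. */
	uint64_t target = dri2_schedule_swap(dpy, drawable);

	/* Block until the display has passed that frame count... */
	dri2_wait_msc(dpy, drawable, target);

	/* ...only then is it safe to hand the same memory to the encoder. */
	queue_to_encoder(buffer);
}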

BR,
-R


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Guennadi Liakhovetski @ 2011-05-16 20:45 UTC
  To: Mauro Carvalho Chehab; +Cc: dri-devel, linux-media, Jesse Barker

On Sat, 14 May 2011, Mauro Carvalho Chehab wrote:

> On 18-04-2011 17:15, Jesse Barker wrote:
> > One of the big issues we've been faced with at Linaro is around GPU
> > and multimedia device integration, in particular the memory management
> > requirements for supporting them on ARM.  This next cycle, we'll be
> > focusing on driving consensus around a unified memory management
> > solution for embedded systems that support multiple architectures and
> > SoCs.  This is listed as part of our working set of requirements for
> > the next six-month cycle (in spite of the URL, this is not being
> > treated as a graphics-specific topic - we also have participation from
> > multimedia and kernel working group folks):
> > 
> >   https://wiki.linaro.org/Cycles/1111/TechnicalTopics/Graphics
> 
>> As part of the memory management effort, Linaro organized several discussions
>> during the Linaro Development Summit (LDS) in Budapest, and invited me and other
>> members of the V4L and DRI communities to discuss the requirements.
>> I wish to thank Linaro for its initiative.

[snip]

>> Btw, the need to manage buffers is currently being covered by the proposal
>> for new ioctl()s to support multi-sized video buffers [1].
> 
> [1] http://www.spinics.net/lists/linux-media/msg30869.html
> 
>> It makes sense to me to discuss that proposal together with the above discussions,
>> in order to keep the API consistent.

The author of that RFC would have been thankful if he had been put on
Cc: ;) But anyway, yes, consistency is good, but is my understanding
correct that, functionally, these two extensions - multi-size and
buffer-forwarding/reuse - are independent? We have to think about making the
APIs consistent, e.g., by reusing data structures. But it's also good to
make smaller incremental changes where possible, isn't it? So, yes, we
should think about consistency, but develop and apply those two extensions
separately?

Thanks
Guennadi


---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Mauro Carvalho Chehab @ 2011-05-17 12:49 UTC
  To: Hans Verkuil; +Cc: dri-devel, linux-media, Jesse Barker

On 15-05-2011 18:10, Hans Verkuil wrote:
> On Saturday, May 14, 2011 13:46:03 Mauro Carvalho Chehab wrote:
>> On 14-05-2011 13:02, Hans Verkuil wrote:
>>> On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:
>>
>>>> So, based on all I've seen, I'm pretty much convinced that the normal MMAP
>>>> way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
>>>> is not the best way to share data with framebuffers.
>>>
>>> I agree with that, but it is a different story between two V4L2 devices. There
>>> you obviously want to use the streaming ioctls and still share buffers.
>>
>> I don't think so. The requirement for syncing the framebuffer between two
>> V4L2 devices is pretty much the same as with one V4L2 device and one GPU.
>>
>> In both cases, the requirement is to pass a framebuffer between two entities,
>> not a video stream.
>>
>> For example, imagine something like:
>>
>> 	V4L2 camera =====> V4L2 encoder to MPEG2
>> 		     ||
>> 		     LL==> GPU

For the sake of clarity in my next comments, I'm calling the "V4L2 camera"
buffer-write endpoint the "producer" and the two buffer-read endpoints the
"consumers".

>>
>> Both the GPU and the V4L2 encoder should use the same logic to be sure that they
>> use a buffer that has already been filled by the camera. Also, the V4L2 camera
>> driver can't reuse such a framebuffer before being sure that both consumers
>> have stopped using it.
> 
> No. A camera whose output is sent to a resizer and then to a SW/FW/HW encoder
> is a typical example where you want to queue/dequeue buffers.

Why? With a framebuffer-oriented set of ioctls, some in-kernel calls will
need to take care of the buffer usage, in order to be sure when a buffer can
be rewritten, as userspace has no way to know when a buffer needs to be
queued/dequeued.

In other words, the framebuffer kernel API will probably use a kernel structure like:

struct v4l2_fb_handler {
	bool has_finished;				/* Set when this handler has finished with the buffer */
	bool is_producer;				/* True for the handler that writes data into the buffer */

	struct list_head list;				/* Links this handler into the list of all handlers */

	void (*qbuf)(struct v4l2_fb_handler *handler);	/* qbuf-like callback, called once a buffer has been filled; must not block */

	v4l2_buffer_ID	buf;				/* Buffer ID (or file handle?) - in practice, it will probably be a list of the available buffers */

	void *priv;					/* Handler private data */
};

While streaming is on, kernel logic will run a loop, performing basically the steps below:

	1) Wait for the producer to raise the has_finished flag;

	2) Call qbuf() for all consumers. The qbuf() call shouldn't block; it just
	   invokes per-handler logic to start using that buffer;

	3) When each fb handler finishes using the buffer, it raises its has_finished flag;

	4) Once all buffer handlers are marked as has_finished, clear the has_finished
	   flags and requeue the buffer.

Step (2) is equivalent to VIDIOC_QBUF, and step (4) is equivalent to VIDIOC_DQBUF.

PS: The above is just a simplified view of such a handler. We'll probably need more steps.
For example, between (1) and (2) we may need some logic to check whether an available
empty buffer exists and, if not, create a new one and use it.
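
In code, the core of that loop could look like the sketch below (everything
here is part of the proposal, nothing exists in the kernel yet; locking and
the per-buffer bookkeeping are omitted):

#include <linux/list.h>
#include <linux/wait.h>

static LIST_HEAD(fb_handlers);			/* all registered handlers */
static DECLARE_WAIT_QUEUE_HEAD(fb_waitq);

/* Sketch of the proposed dispatch loop body; runs once per filled buffer. */
static void v4l2_fb_dispatch(struct v4l2_fb_handler *producer)
{
	struct v4l2_fb_handler *h;

	/* (1) wait for the producer to raise its has_finished flag */
	wait_event(fb_waitq, producer->has_finished);

	/* (2) hand the filled buffer to every consumer; qbuf() must not
	 * block, it only starts the per-handler processing */
	list_for_each_entry(h, &fb_handlers, list)
		if (!h->is_producer)
			h->qbuf(h);

	/* (3) each handler raises has_finished when it is done; wait for all */
	list_for_each_entry(h, &fb_handlers, list)
		wait_event(fb_waitq, h->has_finished);

	/* (4) clear the flags so the buffer can be requeued to the producer */
	list_for_each_entry(h, &fb_handlers, list)
		h->has_finished = false;
}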

What happens with REQBUF/QBUF/DQBUF is that:
	- with those calls, there's just one buffer consumer and just one buffer producer;
	- either the producer or the consumer is in userspace, and the other end is
	  in kernelspace;
	- buffers are allocated before the process starts, via an explicit call;
	- buffers need to be mmapped in order to be visible to userspace.

None of the above applies to a framebuffer-oriented API:
	- more than one buffer consumer is allowed;
	- consumers and producers are in kernelspace (an API for handling such buffers
	  in userspace might also be needed, although that doesn't sound like a good
	  idea to me);
	- buffers can be dynamically allocated/deallocated;
	- buffers don't need to be mmapped to userspace.

> Especially since
> the various parts of the pipeline may stall for a bit and you don't want to lose
> frames. That's not what the overlay API is for; that's what our streaming API
> gives us.
> 
> The use case above isn't even possible without copying. At least, I don't see a
> way, unless the GPU buffer is non-destructive. In that case you can give the
> frame to the GPU, and when the GPU is finished you can give it to the encoder.
> I suspect that might become quite complex though.

Well, if some fb consumers also rewrite the buffers, serializing them is
needed, as you can't allow another process to access memory that the CPU is
modifying at the same time, or you'll get unpredictable images. The easiest
way is to make the qbuf() callback block until the end of the buffer rewrite,
but I don't think that is a good idea.

In such situations, it is probably faster and cleaner to just copy the data
into a second buffer, keeping the original one preserved.

> Note that many video receivers cannot stall. You can't tell them to wait until
> the last buffer finished processing. This is different from some/most? sensors.
> 
> So if you try to send the input of a video receiver to some device that requires
> syncing, which can cause stalls, then that will not work without losing frames,
> which is especially undesirable for video encoding.

If you're sharing a buffer, the kernel should ensure that the shared buffer won't
be rewritten before every shared-buffer consumer has finished handling it.

So, assuming that the producer is generating frames at a rate of, let's say, 30
fps, the slowest consumer must take less than 1/30 s per frame; otherwise, it
will lose frames.

Yet if, under certain circumstances (like, for example, an input switch from one
source to another, requiring an MPEG2 encoder to re-encode the new scene), one
of the consumers needs more than 1/30 s but runs below 1/30 s most of the time,
then with dynamic buffer allocation it is still possible to use shared buffers
without losing frames, provided the machine has enough memory to handle the
worst case.
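
As a back-of-the-envelope sketch of that memory bound (all numbers here are
hypothetical):

#include <stdio.h>

/* How many buffers are needed to ride out a consumer that temporarily
 * falls behind; all numbers are hypothetical. */
int main(void)
{
	double fps = 30.0;		/* producer frame rate */
	double worst_stall = 0.2;	/* worst-case consumer stall, in seconds */

	/* While the consumer is stalled, the producer keeps filling fresh
	 * buffers: roughly stall * fps of them, plus the one in flight. */
	int needed = (int)(worst_stall * fps) + 1;	/* 0.2 * 30 + 1 = 7 */

	printf("worst case needs %d buffers\n", needed);
	return 0;
}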

There's one problem with dynamic buffers, however: audio and video sync becomes
a more complex task. So we'll end up needing to add audio timestamps in
kernelspace, in the ALSA driver.

Cheers,
Mauro.


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Mauro Carvalho Chehab @ 2011-05-17 12:57 UTC
  To: Hans Verkuil; +Cc: dri-devel, linux-media, Jesse Barker

On 17-05-2011 09:49, Mauro Carvalho Chehab wrote:

[snip]

> None of the above applies to a framebuffer-oriented API:
> 	- more than one buffer consumer is allowed;
> 	- consumers and producers are in kernelspace (an API for handling such buffers
> 	  in userspace might also be needed, although that doesn't sound like a good
> 	  idea to me);

A side note: in the specific case of the X server and display drivers, such a
kernelspace-userspace API for buffers already exists. I don't know DRI/GEM/KMS
well enough to tell exactly how it works, or whether it would require changes
to work like the above, but it seems that the right approach is to try to use
or extend the existing APIs, instead of creating something new.

The main point is: the DQBUF/QBUF API assumes that userspace has full control
over the buffer usage, and that buffers are handled in userspace (so they
should be mmapped there). This is not the general case when another IP block
on the chip is reusing the buffer, or when another DMA engine is doing direct
transfers on it.


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Mauro Carvalho Chehab @ 2011-05-17 16:46 UTC
  To: Guennadi Liakhovetski; +Cc: dri-devel, linux-media, Jesse Barker

On 16-05-2011 17:45, Guennadi Liakhovetski wrote:
> On Sat, 14 May 2011, Mauro Carvalho Chehab wrote:
> 
>> On 18-04-2011 17:15, Jesse Barker wrote:
>>> One of the big issues we've been faced with at Linaro is around GPU
>>> and multimedia device integration, in particular the memory management
>>> requirements for supporting them on ARM.  This next cycle, we'll be
>>> focusing on driving consensus around a unified memory management
>>> solution for embedded systems that support multiple architectures and
>>> SoCs.  This is listed as part of our working set of requirements for
>>> the next six-month cycle (in spite of the URL, this is not being
>>> treated as a graphics-specific topic - we also have participation from
>>> multimedia and kernel working group folks):
>>>
>>>   https://wiki.linaro.org/Cycles/1111/TechnicalTopics/Graphics
>>
>> As part of the memory management effort, Linaro organized several discussions
>> during the Linaro Development Summit (LDS) in Budapest, and invited me and other
>> members of the V4L and DRI communities to discuss the requirements.
>> I wish to thank Linaro for its initiative.
> 
> [snip]
> 
>> Btw, the need to manage buffers is currently being covered by the proposal
>> for new ioctl()s to support multi-sized video buffers [1].
>>
>> [1] http://www.spinics.net/lists/linux-media/msg30869.html
>>
>> It makes sense to me to discuss that proposal together with the above discussions,
>> in order to keep the API consistent.
> 
> The author of that RFC would have been thankful if he had been put on
> Cc: ;)

If I had added everybody interested in this summary, most SMTP servers would
probably refuse to deliver the message, thinking it is spam ;) My intention was
to submit feedback about it when analysing your RFC patches, in case you hadn't
seen the summary before.

> But anyway, yes, consistency is good, but is my understanding
> correct that, functionally, these two extensions - multi-size and
> buffer-forwarding/reuse - are independent?

Yes.

> We have to think about making the
> APIs consistent, e.g., by reusing data structures. But it's also good to
> make smaller incremental changes where possible, isn't it? So, yes, we
> should think about consistency, but develop and apply those two extensions
> separately?

True, but one discussion can benefit the other. IMO, we should not rush new
userspace API merges, to avoid merging code that wasn't reasonably discussed;
otherwise, the API will become too messy.

Thanks,
Mauro.


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Sakari Ailus @ 2011-05-18 19:46 UTC
  To: Hans Verkuil; +Cc: Mauro Carvalho Chehab, dri-devel, linux-media, Jesse Barker

Hans Verkuil wrote:
> Note that many video receivers cannot stall. You can't tell them to wait until
> the last buffer finished processing. This is different from some/most? sensors.

Not even image sensors. They just output the frame data; if the receiver
runs out of buffers the data is just lost. And if any part of the frame
is lost, there's no use for other parts of it either. But that's
something the receiver must handle, i.e. discard the data and increment the
frame number (field_count in v4l2_buffer).

The interfaces used by image sensors, be they parallel or serial, do not
provide means to inform the sensor that the receiver has run out of
buffer space. These interfaces are just unidirectional.

Regards,

-- 
Sakari Ailus
sakari.ailus@maxwell.research.nokia.com


* Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list
From: Mauro Carvalho Chehab @ 2011-05-19 10:56 UTC
  To: Sakari Ailus; +Cc: Hans Verkuil, dri-devel, linux-media, Jesse Barker

On 18-05-2011 16:46, Sakari Ailus wrote:
> Hans Verkuil wrote:
>> Note that many video receivers cannot stall. You can't tell them to wait until
>> the last buffer finished processing. This is different from some/most? sensors.
> 
> Not even image sensors. They just output the frame data; if the receiver
> runs out of buffers the data is just lost. And if any part of the frame
> is lost, there's no use for other parts of it either. But that's
> something the receiver must handle, i.e. discard the data and increment the
> frame number (field_count in v4l2_buffer).
> 
> The interfaces used by image sensors, be they parallel or serial, do not
> provide means to inform the sensor that the receiver has run out of
> buffer space. These interfaces are just unidirectional.

Well, it depends on how the hardware works, really. On most (all?) designs, the
IP block responsible for receiving data from a sensor (or transmitting data, on
an output device) is capable of generating an IRQ to notify the OS that a
framebuffer has been filled. So the V4L driver can mark that buffer as finished
and remove it from the list of queued buffers. Although the current APIs don't
allow creating a new buffer when the list is empty, it may actually make sense
to allow the kernel to dynamically create a new buffer, guaranteeing that the
sensor (or receiver) never runs out of buffers under normal usage.

Of course, a maximum number of buffers should be specified, to avoid an
unacceptable delay; once that limit is hit, frames will end up being discarded.
It makes sense to provide a way to report to userspace when this happens.
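
A hypothetical sketch of what that could look like in a capture driver's IRQ
handler (none of these names exist in any driver; they only illustrate the
flow):

#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct cap_buffer {
	struct list_head list;
	/* ... DMA address, size, etc ... */
};

struct cap_dev {
	spinlock_t qlock;
	struct list_head queued, done;
	int nbuffers, max_buffers;	/* the maximum bounds memory use and delay */
};

static void cap_queue_new_buffer(struct cap_dev *dev);	/* GFP_ATOMIC alloc */

static irqreturn_t cap_frame_done_irq(int irq, void *dev_id)
{
	struct cap_dev *dev = dev_id;
	struct cap_buffer *buf;

	spin_lock(&dev->qlock);
	/* Mark the just-filled buffer as finished and hand it over */
	buf = list_first_entry(&dev->queued, struct cap_buffer, list);
	list_move_tail(&buf->list, &dev->done);

	/* If the hardware would otherwise run dry, allocate a fresh buffer,
	 * up to the configured maximum; past that, frames get dropped. */
	if (list_empty(&dev->queued) && dev->nbuffers < dev->max_buffers)
		cap_queue_new_buffer(dev);
	spin_unlock(&dev->qlock);

	return IRQ_HANDLED;
}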

Mauro.

