* per-frame camera metadata (again)
@ 2015-12-16  9:37 Guennadi Liakhovetski
  2015-12-16 10:02 ` Hans Verkuil
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2015-12-16  9:37 UTC (permalink / raw)
  To: Linux Media Mailing List
  Cc: Hans Verkuil, Laurent Pinchart, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi all,

A project I am currently working on requires acquiring per-frame 
metadata from the camera and passing it to user-space. This is not the 
first time this has come up and I know such discussions have been held 
before. A typical user is Android (also my case), where you have to 
provide the parameter values that were used to capture a specific frame 
back to the user. I know Hans is working on one side of this process - 
sending per-request controls - but I'm not aware whether he or anyone 
else is actively working on the other side already or is planning to do 
so in the near future. I also know that several proprietary solutions 
have been developed and are in use in various projects.

I think the general agreement has been that such data has to be passed via 
a buffer queue, but there are a few possibilities there too. Below are 
some:

1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a) the 
metadata is already associated with the specific frame it corresponds to. 
Cons: (a) a correct implementation would specify the image plane fourcc 
separately from the metadata plane format description, but we currently 
don't support per-plane format specification.

2. Separate buffer queues. Pros: (a) no need to extend multiplanar buffer 
implementation. Cons: (a) more difficult synchronisation with image 
frames, (b) still need to work out a way to specify the metadata version.

Any further options? Of the above, my choice would be (1), but with a 
dedicated metadata plane in struct vb2_buffer.
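
For illustration, a rough user-space sketch of what (1) could look like, 
with plane 1 carrying the metadata; how that plane's format would be 
described is exactly the open per-plane format question above:

struct v4l2_plane planes[2];
struct v4l2_buffer buf = {
	.type     = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
	.memory   = V4L2_MEMORY_MMAP,
	.m.planes = planes,
	.length   = 2,	/* plane 0: image, plane 1: metadata */
};

if (ioctl(fd, VIDIOC_DQBUF, &buf) == 0) {
	/* planes[1].bytesused bytes of metadata for this frame */
}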

With either of the above options we also need a way to tell the user what 
is in the metadata buffer, i.e. its format. We could create new FOURCC 
codes for metadata, perhaps as V4L2_META_FMT_..., or user space could 
identify the metadata format based on the camera model and an opaque type 
(metadata version code) value. Since metadata formats seem to be extremely 
camera-specific, I'd go with the latter option.
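
Just to illustrate the first variant (neither code exists today, the 
names are made up), a new metadata format would simply be another fourcc:

#define V4L2_META_FMT_XYZ_V1 v4l2_fourcc('X', 'Y', 'Z', '1') /* sensor XYZ, layout v1 */
#define V4L2_META_FMT_XYZ_V2 v4l2_fourcc('X', 'Y', 'Z', '2') /* sensor XYZ, layout v2 */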

Comments extremely welcome.

Thanks
Guennadi


* Re: per-frame camera metadata (again)
  2015-12-16  9:37 per-frame camera metadata (again) Guennadi Liakhovetski
@ 2015-12-16 10:02 ` Hans Verkuil
  2015-12-16 11:25   ` Guennadi Liakhovetski
  2015-12-19  0:06   ` Sakari Ailus
  0 siblings, 2 replies; 23+ messages in thread
From: Hans Verkuil @ 2015-12-16 10:02 UTC (permalink / raw)
  To: Guennadi Liakhovetski, Linux Media Mailing List
  Cc: Laurent Pinchart, Mauro Carvalho Chehab, Sakari Ailus, Aviv Greenberg

On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> Hi all,
> 
> A project, I am currently working on, requires acquiringing per-frame 
> metadata from the camera and passing it to user-space. This is not the 
> first time this comes up and I know such discussions have been held 
> before. A typical user is Android (also my case), where you have to 
> provide parameter values, that have been used to capture a specific frame, 
> to the user. I know Hans is working to handle one side of this process - 
> sending per-request controls,

Actually, the request framework can do both sides of the equation: giving
back metadata in read-only controls that are per-frame. While ideally the
driver would extract the information from the binary blob and put it in
nice controls, it is also possible to make a control that just contains the
binary blob itself. Whether that's a good approach depends on many factors,
and that's another topic.

> but I'm not aware whether he or anyone else
> is actively working on this already or is planning to do so in the near 
> future? I also know, that several proprietary solutions have been 
> developed and are in use in various projects.
> 
> I think a general agreement has been, that such data has to be passed via 
> a buffer queue. But there are a few possibilities there too. Below are 
> some:
> 
> 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a) 
> metadata is already associated to specific frames, which they correspond 
> to. Cons: (a) a correct implementation would specify image plane fourcc 
> separately from any metadata plane format description, but we currently 
> don't support per-plane format specification.

This only makes sense if the data actually comes in via DMA and if it is
large enough to make it worth the effort of implementing this. As you say,
it will require figuring out how to do per-plane fourccs.

It also only makes sense if the metadata comes in at the same time as the
frame.

> 2. Separate buffer queues. Pros: (a) no need to extend multiplanar buffer 
> implementation. Cons: (a) more difficult synchronisation with image 
> frames, (b) still need to work out a way to specify the metadata version.
> 
> Any further options? Of the above my choice would go with (1) but with a 
> dedicated metadata plane in struct vb2_buffer.

3. Use the request framework and return the metadata as control(s). Since controls
can be associated with events when they change, you can subscribe to such events.
Note: I haven't implemented such events for request controls yet since I am not
certain how they would be used, but this would be a good test case.

Pros: (a) no need to extend the multiplanar buffer implementation, (b) syncing up
with the image frames should be easy (both use the same request ID), (c) a lot
of freedom in how to export the metadata. Cons: (a) the request framework is still
work in progress (currently worked on by Laurent), (b) probably too slow for
really large amounts of metadata; you'll need proper DMA handling for that, in
which case I would go for (2).
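
A minimal driver-side sketch of what (3) could look like if the blob is
simply exposed as one read-only, volatile byte-array control (the control
name, ID and surrounding driver structures are made up; tying the control
to a request depends on the still in-progress request framework):

static const struct v4l2_ctrl_config sensor_metadata_ctrl = {
	.ops	= &sensor_ctrl_ops,		/* assumed: provides g_volatile_ctrl */
	.id	= V4L2_CID_USER_BASE + 0x1000,	/* placeholder private ID */
	.name	= "Per-frame Metadata Blob",
	.type	= V4L2_CTRL_TYPE_U8,
	.flags	= V4L2_CTRL_FLAG_READ_ONLY | V4L2_CTRL_FLAG_VOLATILE,
	.min	= 0,
	.max	= 0xff,
	.step	= 1,
	.dims	= { 256 },			/* blob size in bytes */
};

/* e.g. in the sensor driver's probe(): */
v4l2_ctrl_new_custom(&sensor->ctrl_handler, &sensor_metadata_ctrl, NULL);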

> 
> In either of the above options we also need a way to tell the user what is 
> in the metadata buffer, its format. We could create new FOURCC codes for 
> them, perhaps as V4L2_META_FMT_... or the user space could identify the 
> metadata format based on the camera model and an opaque type (metadata 
> version code) value. Since metadata formats seem to be extremely camera- 
> specific, I'd go with the latter option.
> 
> Comments extremely welcome.

What I like about the request framework is that the driver can pick apart
the metadata and turn it into well-defined controls, so the knowledge of how
to do that is in the place where it belongs. In cases where the metadata
is simply too large for that to be feasible, I don't have much of an
opinion. Camera + version could be enough, although the same can just as
easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc.). A fourcc
is more consistent with the current API.

Regards,

	Hans


* Re: per-frame camera metadata (again)
  2015-12-16 10:02 ` Hans Verkuil
@ 2015-12-16 11:25   ` Guennadi Liakhovetski
  2015-12-21  3:41     ` Laurent Pinchart
  2015-12-19  0:06   ` Sakari Ailus
  1 sibling, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2015-12-16 11:25 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Laurent Pinchart,
	Mauro Carvalho Chehab, Sakari Ailus, Aviv Greenberg

Hi Hans,

Thanks for your comments.

On Wed, 16 Dec 2015, Hans Verkuil wrote:

> On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> > Hi all,
> > 
> > A project, I am currently working on, requires acquiringing per-frame 
> > metadata from the camera and passing it to user-space. This is not the 
> > first time this comes up and I know such discussions have been held 
> > before. A typical user is Android (also my case), where you have to 
> > provide parameter values, that have been used to capture a specific frame, 
> > to the user. I know Hans is working to handle one side of this process - 
> > sending per-request controls,
> 
> Actually, the request framework can do both sides of the equation: giving
> back meta data in read-only controls that are per-frame. While ideally the
> driver would extract the information from the binary blob and put it in
> nice controls, it is also possible to make a control that just contains the
> binary blob itself. Whether that's a good approach depends on many factors
> and that's another topic.

Yes, sorry, I didn't mention this possibility. On the one hand I agree that 
this would look nice and consistent - you send a bunch of controls down 
and you get them back in exactly the same way, nicely taken apart. OTOH 
there are some issues with that:

1. Metadata values can indeed come from the camera in a buffer that's 
DMAed to memory by the bridge - we have such examples. In our use-cases 
those buffers are separate from the main data, so the driver could 
allocate them itself, but can there be cases in which those buffers have 
to be supplied by the user?

2. Size - I'm not sure how large those control buffers can become; in the 
use-cases that I'm aware of we transfer up to 20 single-value parameters 
per frame.

3. With control values delivered via DMA, it's the bridge driver that 
gets the data, but it's the sensor subdevice driver that knows what that 
buffer contains. So, to deliver those parameters to the user, the sensor 
driver's control processing routines would have to get access to that 
metadata buffer. This isn't supported so far, even with the proposed 
request API?
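
Purely as a sketch of the missing piece in point 3 - this is not an
existing V4L2 interface, all names below are invented - the bridge driver
could hand the DMAed blob to the sensor subdev for parsing:

/* hypothetical subdev ops group for metadata parsing */
struct v4l2_subdev_metadata_ops {
	int (*parse_metadata)(struct v4l2_subdev *sd,
			      const void *buf, size_t len);
};

/* in the bridge driver's metadata-buffer-done handler (assumed names): */
static void bridge_metadata_done(struct bridge_dev *bridge, struct vb2_buffer *vb)
{
	void *vaddr = vb2_plane_vaddr(vb, 0);

	/* let the sensor subdev interpret the blob it produced and update
	 * its read-only controls; this would need a new ops group in
	 * struct v4l2_subdev_ops */
	v4l2_subdev_call(bridge->sensor, metadata, parse_metadata,
			 vaddr, vb2_get_plane_payload(vb, 0));
}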

> > but I'm not aware whether he or anyone else
> > is actively working on this already or is planning to do so in the near 
> > future? I also know, that several proprietary solutions have been 
> > developed and are in use in various projects.
> > 
> > I think a general agreement has been, that such data has to be passed via 
> > a buffer queue. But there are a few possibilities there too. Below are 
> > some:
> > 
> > 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a) 
> > metadata is already associated to specific frames, which they correspond 
> > to. Cons: (a) a correct implementation would specify image plane fourcc 
> > separately from any metadata plane format description, but we currently 
> > don't support per-plane format specification.
> 
> This only makes sense if the data actually comes in via DMA and if it is
> large enough to make it worth the effort of implementing this. As you say,
> it will require figuring out how to do per-frame fourcc.
> 
> It also only makes sense if the metadata comes in at the same time as the
> frame.
> 
> > 2. Separate buffer queues. Pros: (a) no need to extend multiplanar buffer 
> > implementation. Cons: (a) more difficult synchronisation with image 
> > frames, (b) still need to work out a way to specify the metadata version.
> > 
> > Any further options? Of the above my choice would go with (1) but with a 
> > dedicated metadata plane in struct vb2_buffer.
> 
> 3. Use the request framework and return the metadata as control(s). Since controls
> can be associated with events when they change you can subscribe to such events.
> Note: currently I haven't implemented such events for request controls since I am
> not certainly how it would be used, but this would be a good test case.
> 
> Pros: (a) no need to extend multiplanar buffer implementation, (b) syncing up
> with the image frames should be easy (both use the same request ID), (c) a lot
> of freedom on how to export the metadata. Cons: (a) request framework is still
> work in progress (currently worked on by Laurent), (b) probably too slow for
> really large amounts of metadata, you'll need proper DMA handling for that in
> which case I would go for 2.

For (2) (separate buffer queue) would we have to extend VIDIOC_DQBUF to 
select a specific buffer queue?

> > In either of the above options we also need a way to tell the user what is 
> > in the metadata buffer, its format. We could create new FOURCC codes for 
> > them, perhaps as V4L2_META_FMT_... or the user space could identify the 
> > metadata format based on the camera model and an opaque type (metadata 
> > version code) value. Since metadata formats seem to be extremely camera- 
> > specific, I'd go with the latter option.
> > 
> > Comments extremely welcome.
> 
> What I like about the request framework is that the driver can pick apart
> the metadata and turn it into well-defined controls. So the knowledge how
> to do that is in the place where it belongs. In cases where the meta data
> is simple too large for that to be feasible, then I don't have much of an
> opinion. Camera + version could be enough. Although the same can just as
> easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A fourcc
> is more consistent with the current API.

Right, our use-cases so far don't send a lot of data as per-frame 
metadata; I have no idea what others do.

Thanks
Guennadi

> Regards,
> 
> 	Hans


* Re: per-frame camera metadata (again)
  2015-12-16 10:02 ` Hans Verkuil
  2015-12-16 11:25   ` Guennadi Liakhovetski
@ 2015-12-19  0:06   ` Sakari Ailus
  2015-12-23  9:47     ` Guennadi Liakhovetski
  1 sibling, 1 reply; 23+ messages in thread
From: Sakari Ailus @ 2015-12-19  0:06 UTC (permalink / raw)
  To: Hans Verkuil, Guennadi Liakhovetski, Linux Media Mailing List
  Cc: Laurent Pinchart, Mauro Carvalho Chehab, Aviv Greenberg

Hi Guennadi and Hans,

Hans Verkuil wrote:
> On 12/16/15 10:37, Guennadi Liakhovetski wrote:
>> Hi all,
>>
>> A project, I am currently working on, requires acquiringing per-frame
>> metadata from the camera and passing it to user-space. This is not the
>> first time this comes up and I know such discussions have been held
>> before. A typical user is Android (also my case), where you have to
>> provide parameter values, that have been used to capture a specific frame,
>> to the user. I know Hans is working to handle one side of this process -
>> sending per-request controls,
>
> Actually, the request framework can do both sides of the equation: giving
> back meta data in read-only controls that are per-frame. While ideally the
> driver would extract the information from the binary blob and put it in
> nice controls, it is also possible to make a control that just contains the
> binary blob itself. Whether that's a good approach depends on many factors
> and that's another topic.

I think that could be possible in some cases. If you don't have a lot of
metadata, then, sure.

>
>> but I'm not aware whether he or anyone else
>> is actively working on this already or is planning to do so in the near
>> future? I also know, that several proprietary solutions have been
>> developed and are in use in various projects.
>>
>> I think a general agreement has been, that such data has to be passed via
>> a buffer queue. But there are a few possibilities there too. Below are
>> some:
>>
>> 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
>> metadata is already associated to specific frames, which they correspond
>> to. Cons: (a) a correct implementation would specify image plane fourcc
>> separately from any metadata plane format description, but we currently
>> don't support per-plane format specification.
>
> This only makes sense if the data actually comes in via DMA and if it is
> large enough to make it worth the effort of implementing this. As you say,
> it will require figuring out how to do per-frame fourcc.
>
> It also only makes sense if the metadata comes in at the same time as the
> frame.

I agree. Much of the time the metadata indeed arrives earlier than the
rest of the frame. Neither the frame layout nor the use cases should be
assumed in the bridge (ISP) driver which implements the interface,
essentially forcing this on the user. This is a major drawback of the
approach.

That said, if you combine this with the need to pass buffer data to the 
user before the entire buffer is ready, i.e. when a plane is ready, you 
could get around this quite neatly.

However, if the DMA engine writing the metadata is different from the one 
writing the image data to memory, then you have a plain metadata buffer 
--- as it's a different video node. But there's really nothing special 
about that then.

Conceptually we should support multi-part frames rather than just 
metadata; metadata is only one use case where a single DMA engine outputs 
multiple kinds of data. This could be statistics as well, or multiple 
images, e.g. YUV and RAW format images of the same frame.

With CSI-2, as the virtual channels are independent, one could start and 
stop them at different times, and the frame rates in those channels could 
be unrelated as well. This suggests that different virtual channels 
should be conceptually separate streams in V4L2 too, and thus the data 
from different streams should not end up in the same buffer.

Metadata usually (or practically ever?) does not arrive on a separate 
virtual channel, though, so this isn't something that necessarily has to 
be taken into account right now, but it's good to be aware of it.

>
>> 2. Separate buffer queues. Pros: (a) no need to extend multiplanar buffer
>> implementation. Cons: (a) more difficult synchronisation with image
>> frames, (b) still need to work out a way to specify the metadata version.

Do you think you'd have different versions of metadata from a sensor, for
instance? Based on what I've seen these tend to be sensor-specific, or
SMIA, which defines a metadata type for each bit depth for compliant sensors.

Each metadata format should have a 4cc code: SMIA bit-depth-specific, or
sensor-specific where the metadata is sensor-specific.

Other kinds of metadata than what you get from sensors are not covered by 
the thoughts above.

<URL:http://www.retiisi.org.uk/v4l2/foil/v4l2-multi-format.pdf>

I think I'd still favour separate buffer queues.
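
As a rough user-space sketch of what that could look like, assuming the 
metadata is exposed on its own video node with a metadata buffer type and 
format description (both hypothetical here, modelled on the existing 
ioctls):

int fd = open("/dev/video1", O_RDWR);	/* metadata node; image on /dev/video0 */

struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_META_CAPTURE /* hypothetical */ };
ioctl(fd, VIDIOC_G_FMT, &fmt);		/* fmt.fmt.meta.dataformat = metadata 4cc */

struct v4l2_requestbuffers req = {
	.count = 4, .type = fmt.type, .memory = V4L2_MEMORY_MMAP,
};
ioctl(fd, VIDIOC_REQBUFS, &req);
/* QBUF/STREAMON/DQBUF as usual; match metadata buffers to image frames
 * by sequence number or timestamp. */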

>>
>> Any further options? Of the above my choice would go with (1) but with a
>> dedicated metadata plane in struct vb2_buffer.
>
> 3. Use the request framework and return the metadata as control(s). Since controls
> can be associated with events when they change you can subscribe to such events.
> Note: currently I haven't implemented such events for request controls since I am
> not certainly how it would be used, but this would be a good test case.
>
> Pros: (a) no need to extend multiplanar buffer implementation, (b) syncing up
> with the image frames should be easy (both use the same request ID), (c) a lot
> of freedom on how to export the metadata. Cons: (a) request framework is still
> work in progress (currently worked on by Laurent), (b) probably too slow for
> really large amounts of metadata, you'll need proper DMA handling for that in
> which case I would go for 2.

Agreed. You could also consider it a drawback that the number of new 
controls required for this could be large, but for other reasons the 
second option mentioned would already be the better implementation 
anyway.

>
>>
>> In either of the above options we also need a way to tell the user what is
>> in the metadata buffer, its format. We could create new FOURCC codes for
>> them, perhaps as V4L2_META_FMT_... or the user space could identify the
>> metadata format based on the camera model and an opaque type (metadata
>> version code) value. Since metadata formats seem to be extremely camera-
>> specific, I'd go with the latter option.

I think I'd use separate 4cc codes for the metadata formats when they 
really are different. There are plenty of possible 4cc codes we can use. :-)

Documenting the formats might be painful though.

>>
>> Comments extremely welcome.
>
> What I like about the request framework is that the driver can pick apart
> the metadata and turn it into well-defined controls. So the knowledge how
> to do that is in the place where it belongs. In cases where the meta data
> is simple too large for that to be feasible, then I don't have much of an
> opinion. Camera + version could be enough. Although the same can just as
> easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A fourcc
> is more consistent with the current API.

-- 
Kind regards,

Sakari Ailus
sakari.ailus@linux.intel.com


* Re: per-frame camera metadata (again)
  2015-12-16 11:25   ` Guennadi Liakhovetski
@ 2015-12-21  3:41     ` Laurent Pinchart
  2015-12-22 11:16       ` Guennadi Liakhovetski
  0 siblings, 1 reply; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-21  3:41 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Guennadi,

On Wednesday 16 December 2015 12:25:24 Guennadi Liakhovetski wrote:
> On Wed, 16 Dec 2015, Hans Verkuil wrote:
> > On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> > > Hi all,
> > > 
> > > A project, I am currently working on, requires acquiringing per-frame
> > > metadata from the camera and passing it to user-space. This is not the
> > > first time this comes up and I know such discussions have been held
> > > before. A typical user is Android (also my case), where you have to
> > > provide parameter values, that have been used to capture a specific
> > > frame, to the user. I know Hans is working to handle one side of this
> > > process - sending per-request controls,
> > 
> > Actually, the request framework can do both sides of the equation: giving
> > back meta data in read-only controls that are per-frame. While ideally the
> > driver would extract the information from the binary blob and put it in
> > nice controls, it is also possible to make a control that just contains
> > the binary blob itself. Whether that's a good approach depends on many
> > factors and that's another topic.
> 
> Yes, sorry, didn't mention this possibility. On the one hand I agree, that
> this would look nice and consistent - you send a bunch of controls down
> and you get them back in exactly the same way, nicely taken apart. OTOH
> there are some issues with that:
> 
> 1. Metadata values can indeed come from the camera in a buffer, that's
> DMAed to a buffer by the bridge - we have such examples. In our use-cases
> those buffers are separate from main data, so, that the driver could
> allocate them itself, but can there be cases, in which those buffers have
> to be supplied by the user?

The only case I can think of where the user would benefit from supplying the 
buffer is sharing metadata with other processes and/or devices *if* the 
amount of metadata is so large that a memcpy would negatively affect 
performance. And I can't think of such a case at the moment :-)

> 2. Size - not sure how large those control buffers can become, in
> use-cases, that I'm aware of we transfer up to 20 single-value parameters
> per frame.

I have to deal with a system that can transfer up to ~200 parameters per frame 
(at least in theory).

> 3. With control values delivered per DMA, it's the bridge driver, that
> gets the data, but it's the sensor subdevice driver, that knows what that
> buffer contains. So, to deliver those parameters to the user, the sensor
> driver control processing routines will have to get access to that
> metadata buffer. This isn't supported so far even with the proposed
> request API?

Correct. My current implementation (see git://linuxtv.org/pinchartl/media.git 
drm/du/vsp1-kms/request) doesn't deal with controls yet as the first use case 
I focused on for the request API primarily requires setting formats (and 
links, which are my next target).

My other use case (Android camera HAL v3 for Project Ara) mainly deals with 
controls and metadata, but I'll then likely pass the metadata blob to 
userspace as-is, as its format isn't always known to the driver. I'm also 
concerned about efficiency but haven't had time to perform measurements yet.

> > > but I'm not aware whether he or anyone else is actively working on this
> > > already or is planning to do so in the near future? I also know, that
> > > several proprietary solutions have been developed and are in use in
> > > various projects.
> > > 
> > > I think a general agreement has been, that such data has to be passed
> > > via a buffer queue. But there are a few possibilities there too. Below
> > > are some:
> > > 
> > > 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
> > > metadata is already associated to specific frames, which they correspond
> > > to. Cons: (a) a correct implementation would specify image plane fourcc
> > > separately from any metadata plane format description, but we currently
> > > don't support per-plane format specification.
> > 
> > This only makes sense if the data actually comes in via DMA and if it is
> > large enough to make it worth the effort of implementing this. As you say,
> > it will require figuring out how to do per-frame fourcc.
> > 
> > It also only makes sense if the metadata comes in at the same time as the
> > frame.
> > 
> > > 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
> > > buffer implementation. Cons: (a) more difficult synchronisation with
> > > image frames, (b) still need to work out a way to specify the metadata
> > > version.
> > > 
> > > Any further options? Of the above my choice would go with (1) but with a
> > > dedicated metadata plane in struct vb2_buffer.
> > 
> > 3. Use the request framework and return the metadata as control(s). Since
> > controls can be associated with events when they change you can subscribe
> > to such events. Note: currently I haven't implemented such events for
> > request controls since I am not certainly how it would be used, but this
> > would be a good test case.
> > 
> > Pros: (a) no need to extend multiplanar buffer implementation, (b) syncing
> > up with the image frames should be easy (both use the same request ID),
> > (c) a lot of freedom on how to export the metadata. Cons: (a) request
> > framework is still work in progress (currently worked on by Laurent), (b)
> > probably too slow for really large amounts of metadata, you'll need
> > proper DMA handling for that in which case I would go for 2.

(a) will eventually be solved, (b) needs measurements before discussing it 
further.

> For (2) (separate buffer queue) would we have to extend VIDIOC_DQBUF to
> select a specific buffer queue?

Wouldn't it use a separate video device node ?

> > > In either of the above options we also need a way to tell the user what
> > > is in the metadata buffer, its format. We could create new FOURCC codes
> > > for them, perhaps as V4L2_META_FMT_... or the user space could identify
> > > the metadata format based on the camera model and an opaque type
> > > (metadata version code) value. Since metadata formats seem to be
> > > extremely camera-specific, I'd go with the latter option.
> > > 
> > > Comments extremely welcome.
> > 
> > What I like about the request framework is that the driver can pick apart
> > the metadata and turn it into well-defined controls. So the knowledge how
> > to do that is in the place where it belongs. In cases where the meta data
> > is simple too large for that to be feasible, then I don't have much of an
> > opinion. Camera + version could be enough. Although the same can just as
> > easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A
> > fourcc is more consistent with the current API.
> 
> Right, our use-cases so far don't send a lot of data as per-frame
> metadata, no idea what others do.

What kind of hardware do you deal with that sends meta-data ? And over what 
kind of channel does it send it ?

-- 
Regards,

Laurent Pinchart



* Re: per-frame camera metadata (again)
  2015-12-21  3:41     ` Laurent Pinchart
@ 2015-12-22 11:16       ` Guennadi Liakhovetski
  2015-12-22 13:30         ` karthik poduval
  2015-12-23 17:40         ` Laurent Pinchart
  0 siblings, 2 replies; 23+ messages in thread
From: Guennadi Liakhovetski @ 2015-12-22 11:16 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Laurent,

On Mon, 21 Dec 2015, Laurent Pinchart wrote:

> Hi Guennadi,
> 
> On Wednesday 16 December 2015 12:25:24 Guennadi Liakhovetski wrote:
> > On Wed, 16 Dec 2015, Hans Verkuil wrote:
> > > On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> > > > Hi all,
> > > > 
> > > > A project, I am currently working on, requires acquiringing per-frame
> > > > metadata from the camera and passing it to user-space. This is not the
> > > > first time this comes up and I know such discussions have been held
> > > > before. A typical user is Android (also my case), where you have to
> > > > provide parameter values, that have been used to capture a specific
> > > > frame, to the user. I know Hans is working to handle one side of this
> > > > process - sending per-request controls,
> > > 
> > > Actually, the request framework can do both sides of the equation: giving
> > > back meta data in read-only controls that are per-frame. While ideally the
> > > driver would extract the information from the binary blob and put it in
> > > nice controls, it is also possible to make a control that just contains
> > > the binary blob itself. Whether that's a good approach depends on many
> > > factors and that's another topic.
> > 
> > Yes, sorry, didn't mention this possibility. On the one hand I agree, that
> > this would look nice and consistent - you send a bunch of controls down
> > and you get them back in exactly the same way, nicely taken apart. OTOH
> > there are some issues with that:
> > 
> > 1. Metadata values can indeed come from the camera in a buffer, that's
> > DMAed to a buffer by the bridge - we have such examples. In our use-cases
> > those buffers are separate from main data, so, that the driver could
> > allocate them itself, but can there be cases, in which those buffers have
> > to be supplied by the user?
> 
> The only case I can think of where the user would benefit from supplying the 
> buffer is sharing meta data with other processes and/or devices *if* the 
> amount of meta data is so large that a memcpy would negatively affect 
> performances. And I can't think of such a case at the moment :-)

Ok, so, we could for now limit metadata buffer support to driver 
allocation.

> > 2. Size - not sure how large those control buffers can become, in
> > use-cases, that I'm aware of we transfer up to 20 single-value parameters
> > per frame.
> 
> I have to deal with a system that can transfer up to ~200 parameters per frame 
> (at least in theory).

Are they single-value (say, up to 32-bit) parameters or can they be arrays / 
data chunks?

> > 3. With control values delivered per DMA, it's the bridge driver, that
> > gets the data, but it's the sensor subdevice driver, that knows what that
> > buffer contains. So, to deliver those parameters to the user, the sensor
> > driver control processing routines will have to get access to that
> > metadata buffer. This isn't supported so far even with the proposed
> > request API?
> 
> Correct. My current implementation (see git://linuxtv.org/pinchartl/media.git 
> drm/du/vsp1-kms/request) doesn't deal with controls yet as the first use case 
> I focused on for the request API primarily requires setting formats (and 
> links, which are my next target).
> 
> My other use case (Android camera HAL v3 for Project Ara) mainly deals with 
> controls and meta-data, but I'll then likely pass the meta-data blob to 
> userspace as-is, as its format isn't always known to the driver. I'm also 
> concerned about efficiency but haven't had time to perform measurements yet.

Hm, why is it not known to the subdevice driver? Does the buffer layout 
depend on some external conditions? Maybe on loaded firmware? But it should 
still be possible to tell the driver, say, that the current metadata buffer 
layout has version N?

Those metadata buffers may well contain some parameters that can also be 
obtained via controls. So, if we just send metadata buffers to the user as 
is, we create duplication, which isn't nice. Besides, the end user will 
want broken-down control values anyway. E.g. in the Android case, the app 
gets individual controls, not opaque metadata buffers. Of course, one 
could create a vendor metadata tag "metadata blob", but that's not how 
Android does it so far.

OTOH passing those buffers to the subdevice driver for parsing and 
returning them as an (extended) control also seems a bit ugly.

What about the performance cost? If we pass all those parameters as a single 
extended control (as long as they are of the same class), the cost shouldn't 
be higher than dequeuing a buffer? Let's not take the parsing cost and 
the control struct memory overhead into account for now.
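
For the sake of that comparison only, a rough user-space sketch of 
fetching everything in one VIDIOC_G_EXT_CTRLS call as a single array/blob 
control (V4L2_CID_METADATA_BLOB is a made-up ID):

__u8 blob[1024];
struct v4l2_ext_control ctrl = {
	.id = V4L2_CID_METADATA_BLOB,	/* hypothetical */
	.size = sizeof(blob),
	.ptr = blob,
};
struct v4l2_ext_controls ctrls = {
	.ctrl_class = V4L2_CTRL_CLASS_CAMERA,
	.count = 1,
	.controls = &ctrl,
};
ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls);	/* one ioctl per frame/request */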

User-friendliness: I think implementors would prefer to pass a complete 
buffer to user-space, to avoid having to modify drivers every time they 
modify those parameters.

> > > > but I'm not aware whether he or anyone else is actively working on this
> > > > already or is planning to do so in the near future? I also know, that
> > > > several proprietary solutions have been developed and are in use in
> > > > various projects.
> > > > 
> > > > I think a general agreement has been, that such data has to be passed
> > > > via a buffer queue. But there are a few possibilities there too. Below
> > > > are some:
> > > > 
> > > > 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
> > > > metadata is already associated to specific frames, which they correspond
> > > > to. Cons: (a) a correct implementation would specify image plane fourcc
> > > > separately from any metadata plane format description, but we currently
> > > > don't support per-plane format specification.
> > > 
> > > This only makes sense if the data actually comes in via DMA and if it is
> > > large enough to make it worth the effort of implementing this. As you say,
> > > it will require figuring out how to do per-frame fourcc.
> > > 
> > > It also only makes sense if the metadata comes in at the same time as the
> > > frame.
> > > 
> > > > 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
> > > > buffer implementation. Cons: (a) more difficult synchronisation with
> > > > image frames, (b) still need to work out a way to specify the metadata
> > > > version.
> > > > 
> > > > Any further options? Of the above my choice would go with (1) but with a
> > > > dedicated metadata plane in struct vb2_buffer.
> > > 
> > > 3. Use the request framework and return the metadata as control(s). Since
> > > controls can be associated with events when they change you can subscribe
> > > to such events. Note: currently I haven't implemented such events for
> > > request controls since I am not certainly how it would be used, but this
> > > would be a good test case.
> > > 
> > > Pros: (a) no need to extend multiplanar buffer implementation, (b) syncing
> > > up with the image frames should be easy (both use the same request ID),
> > > (c) a lot of freedom on how to export the metadata. Cons: (a) request
> > > framework is still work in progress (currently worked on by Laurent), (b)
> > > probably too slow for really large amounts of metadata, you'll need
> > > proper DMA handling for that in which case I would go for 2.
> 
> (a) will eventually be solved, (b) needs measurements before discussing it 
> further.
> 
> > For (2) (separate buffer queue) would we have to extend VIDIOC_DQBUF to
> > select a specific buffer queue?
> 
> Wouldn't it use a separate video device node ?

OK, that seems like a better option to me too, agreed.

> > > > In either of the above options we also need a way to tell the user what
> > > > is in the metadata buffer, its format. We could create new FOURCC codes
> > > > for them, perhaps as V4L2_META_FMT_... or the user space could identify
> > > > the metadata format based on the camera model and an opaque type
> > > > (metadata version code) value. Since metadata formats seem to be
> > > > extremely camera-specific, I'd go with the latter option.
> > > > 
> > > > Comments extremely welcome.
> > > 
> > > What I like about the request framework is that the driver can pick apart
> > > the metadata and turn it into well-defined controls. So the knowledge how
> > > to do that is in the place where it belongs. In cases where the meta data
> > > is simple too large for that to be feasible, then I don't have much of an
> > > opinion. Camera + version could be enough. Although the same can just as
> > > easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A
> > > fourcc is more consistent with the current API.
> > 
> > Right, our use-cases so far don't send a lot of data as per-frame
> > metadata, no idea what others do.
> 
> What kind of hardware do you deal with that sends meta-data ? And over what 
> kind of channel does it send it ?

A CSI-2 connected camera sensor.

Thanks
Guennadi

> -- 
> Regards,
> 
> Laurent Pinchart


* Re: per-frame camera metadata (again)
  2015-12-22 11:16       ` Guennadi Liakhovetski
@ 2015-12-22 13:30         ` karthik poduval
  2015-12-24 10:54           ` Laurent Pinchart
  2015-12-23 17:40         ` Laurent Pinchart
  1 sibling, 1 reply; 23+ messages in thread
From: karthik poduval @ 2015-12-22 13:30 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Laurent Pinchart, Hans Verkuil, Linux Media Mailing List,
	Mauro Carvalho Chehab, Sakari Ailus, Aviv Greenberg

I have been wanting to share these thoughts with the group, but was 
waiting for the right time, which I think is now, since Guennadi brought 
up this discussion.

For the Amazon Fire Phone 4-corner camera, here is how we passed 
metadata from the driver to the application (which was a CV client 
requiring per-frame metadata).

We took an unused field in struct v4l2_buffer (__u32 reserved in this 
case) and used it to pass a pointer to a user-space metadata object 
(i.e. struct app_metadata) to the driver via the VIDIOC_DQBUF ioctl 
call.

struct v4l2_buffer for reference.
http://lxr.free-electrons.com/source/include/uapi/linux/videodev2.h#L836

The driver copied its local copy of the metadata object to the
userspace metadata object using the copy_to_user primitive offered by
the kernel.

Here is how we handled the metadata in the driver code.
https://github.com/Fire-Phone/android_kernel_amazon_kodiak/blob/master/drivers/media/platform/msm/camera_v2/camera/camera.c#L235

This was done before HAL V3 was available. With HAL V3, the metadata
object can be the HAL v3 metadata buffer. Non Android devices can use
custom metadata format (like the one we used).

With this approach, the metadata always accompanies the frame data as
it's available along with the frame buffer inside struct v4l2_buffer
from the VIDIOC_DQBUF ioctl call.
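
Roughly, the user-space side of this looks as follows (struct app_metadata 
stands in for whatever layout the application and driver agree on; 
stuffing a pointer into the 32-bit reserved field assumes a 32-bit user 
space):

struct app_metadata meta;
struct v4l2_buffer buf = {
	.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
	.memory = V4L2_MEMORY_MMAP,
	.reserved = (__u32)(uintptr_t)&meta,	/* driver copy_to_user()s into this */
};

if (ioctl(fd, VIDIOC_DQBUF, &buf) == 0) {
	/* buf.index identifies the image; meta now holds the per-frame data */
}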

If the community likes this idea, struct v4l2_buffer could now be 
officially modified to contain a pointer to a user-space metadata object 
(v4l2_buffer.metadata), and the metadata format and size could then be 
agreed upon between the application and the driver.
Thoughts?

--
Regards,
Karthik Poduval


On Tue, Dec 22, 2015 at 3:16 AM, Guennadi Liakhovetski
<g.liakhovetski@gmx.de> wrote:
> Hi Laurent,
>
> On Mon, 21 Dec 2015, Laurent Pinchart wrote:
>
>> Hi Guennadi,
>>
>> On Wednesday 16 December 2015 12:25:24 Guennadi Liakhovetski wrote:
>> > On Wed, 16 Dec 2015, Hans Verkuil wrote:
>> > > On 12/16/15 10:37, Guennadi Liakhovetski wrote:
>> > > > Hi all,
>> > > >
>> > > > A project, I am currently working on, requires acquiringing per-frame
>> > > > metadata from the camera and passing it to user-space. This is not the
>> > > > first time this comes up and I know such discussions have been held
>> > > > before. A typical user is Android (also my case), where you have to
>> > > > provide parameter values, that have been used to capture a specific
>> > > > frame, to the user. I know Hans is working to handle one side of this
>> > > > process - sending per-request controls,
>> > >
>> > > Actually, the request framework can do both sides of the equation: giving
>> > > back meta data in read-only controls that are per-frame. While ideally the
>> > > driver would extract the information from the binary blob and put it in
>> > > nice controls, it is also possible to make a control that just contains
>> > > the binary blob itself. Whether that's a good approach depends on many
>> > > factors and that's another topic.
>> >
>> > Yes, sorry, didn't mention this possibility. On the one hand I agree, that
>> > this would look nice and consistent - you send a bunch of controls down
>> > and you get them back in exactly the same way, nicely taken apart. OTOH
>> > there are some issues with that:
>> >
>> > 1. Metadata values can indeed come from the camera in a buffer, that's
>> > DMAed to a buffer by the bridge - we have such examples. In our use-cases
>> > those buffers are separate from main data, so, that the driver could
>> > allocate them itself, but can there be cases, in which those buffers have
>> > to be supplied by the user?
>>
>> The only case I can think of where the user would benefit from supplying the
>> buffer is sharing meta data with other processes and/or devices *if* the
>> amount of meta data is so large that a memcpy would negatively affect
>> performances. And I can't think of such a case at the moment :-)
>
> Ok, so, we could for now limit metadata buffer support to driver
> allocation.
>
>> > 2. Size - not sure how large those control buffers can become, in
>> > use-cases, that I'm aware of we transfer up to 20 single-value parameters
>> > per frame.
>>
>> I have to deal with a system that can transfer up to ~200 parameters per frame
>> (at least in theory).
>
> Are they single-value (say, up to 32 bits) parameters or can be arrays /
> data chunks?
>
>> > 3. With control values delivered per DMA, it's the bridge driver, that
>> > gets the data, but it's the sensor subdevice driver, that knows what that
>> > buffer contains. So, to deliver those parameters to the user, the sensor
>> > driver control processing routines will have to get access to that
>> > metadata buffer. This isn't supported so far even with the proposed
>> > request API?
>>
>> Correct. My current implementation (see git://linuxtv.org/pinchartl/media.git
>> drm/du/vsp1-kms/request) doesn't deal with controls yet as the first use case
>> I focused on for the request API primarily requires setting formats (and
>> links, which are my next target).
>>
>> My other use case (Android camera HAL v3 for Project Ara) mainly deals with
>> controls and meta-data, but I'll then likely pass the meta-data blob to
>> userspace as-is, as its format isn't always known to the driver. I'm also
>> concerned about efficiency but haven't had time to perform measurements yet.
>
> Hm, why is it not known to the subdevice driver? Does the buffer layout
> depend on some external conditions? Maybe loaded firmware? But it should
> be possible to tell the driver, say, that the current metadata buffer
> layout has version N?
>
> Those metadata buffers can well contain some parameters, that can also be
> obtained via controls. So, if we just send metadata buffers to the user as
> is, we create duplication, which isn't nice. Besides, the end user will
> anyway want broken down control values. E.g. in the Android case, the app
> is getting single controls, not opaque metadata buffers. Of course, one
> could create a vendor metadata tag "metadata blob," but that's not how
> Android does it so far.
>
> OTOH passing those buffers to the subdevice driver for parsing and
> returning them as an (extended) control also seems a bit ugly.
>
> What about performance cost? If we pass all those parameters as a single
> extended control (as long as they are of the same class), the cost won't
> be higher, than dequeuing a buffer? Let's not take the parsing cost and
> the control struct memory overhead into account for now.
>
> User-friendliness: I think, implementors would prefer to pass a complete
> buffer to the user-space to avoid having to modify drivers every time they
> modify those parameters.
>
>> > > > but I'm not aware whether he or anyone else is actively working on this
>> > > > already or is planning to do so in the near future? I also know, that
>> > > > several proprietary solutions have been developed and are in use in
>> > > > various projects.
>> > > >
>> > > > I think a general agreement has been, that such data has to be passed
>> > > > via a buffer queue. But there are a few possibilities there too. Below
>> > > > are some:
>> > > >
>> > > > 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
>> > > > metadata is already associated to specific frames, which they correspond
>> > > > to. Cons: (a) a correct implementation would specify image plane fourcc
>> > > > separately from any metadata plane format description, but we currently
>> > > > don't support per-plane format specification.
>> > >
>> > > This only makes sense if the data actually comes in via DMA and if it is
>> > > large enough to make it worth the effort of implementing this. As you say,
>> > > it will require figuring out how to do per-frame fourcc.
>> > >
>> > > It also only makes sense if the metadata comes in at the same time as the
>> > > frame.
>> > >
>> > > > 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
>> > > > buffer implementation. Cons: (a) more difficult synchronisation with
>> > > > image frames, (b) still need to work out a way to specify the metadata
>> > > > version.
>> > > >
>> > > > Any further options? Of the above my choice would go with (1) but with a
>> > > > dedicated metadata plane in struct vb2_buffer.
>> > >
>> > > 3. Use the request framework and return the metadata as control(s). Since
>> > > controls can be associated with events when they change you can subscribe
>> > > to such events. Note: currently I haven't implemented such events for
>> > > request controls since I am not certainly how it would be used, but this
>> > > would be a good test case.
>> > >
>> > > Pros: (a) no need to extend multiplanar buffer implementation, (b) syncing
>> > > up with the image frames should be easy (both use the same request ID),
>> > > (c) a lot of freedom on how to export the metadata. Cons: (a) request
>> > > framework is still work in progress (currently worked on by Laurent), (b)
>> > > probably too slow for really large amounts of metadata, you'll need
>> > > proper DMA handling for that in which case I would go for 2.
>>
>> (a) will eventually be solved, (b) needs measurements before discussing it
>> further.
>>
>> > For (2) (separate buffer queue) would we have to extend VIDIOC_DQBUF to
>> > select a specific buffer queue?
>>
>> Wouldn't it use a separate video device node ?
>
> Ok, that seems like a better option to me too, agree.
>
>> > > > In either of the above options we also need a way to tell the user what
>> > > > is in the metadata buffer, its format. We could create new FOURCC codes
>> > > > for them, perhaps as V4L2_META_FMT_... or the user space could identify
>> > > > the metadata format based on the camera model and an opaque type
>> > > > (metadata version code) value. Since metadata formats seem to be
>> > > > extremely camera-specific, I'd go with the latter option.
>> > > >
>> > > > Comments extremely welcome.
>> > >
>> > > What I like about the request framework is that the driver can pick apart
>> > > the metadata and turn it into well-defined controls. So the knowledge how
>> > > to do that is in the place where it belongs. In cases where the meta data
>> > > is simple too large for that to be feasible, then I don't have much of an
>> > > opinion. Camera + version could be enough. Although the same can just as
>> > > easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A
>> > > fourcc is more consistent with the current API.
>> >
>> > Right, our use-cases so far don't send a lot of data as per-frame
>> > metadata, no idea what others do.
>>
>> What kind of hardware do you deal with that sends meta-data ? And over what
>> kind of channel does it send it ?
>
> A CSI-2 connected camera sensor.
>
> Thanks
> Guennadi
>
>> --
>> Regards,
>>
>> Laurent Pinchart
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,
Karthik Poduval


* Re: per-frame camera metadata (again)
  2015-12-19  0:06   ` Sakari Ailus
@ 2015-12-23  9:47     ` Guennadi Liakhovetski
  2015-12-24 10:46       ` Laurent Pinchart
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2015-12-23  9:47 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, Linux Media Mailing List, Laurent Pinchart,
	Mauro Carvalho Chehab, Aviv Greenberg

Hi Sakari,

On Sat, 19 Dec 2015, Sakari Ailus wrote:

> Hi Guennadi and Hans,
> 
> Hans Verkuil wrote:
> > On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> > > Hi all,
> > > 
> > > A project, I am currently working on, requires acquiringing per-frame
> > > metadata from the camera and passing it to user-space. This is not the
> > > first time this comes up and I know such discussions have been held
> > > before. A typical user is Android (also my case), where you have to
> > > provide parameter values, that have been used to capture a specific frame,
> > > to the user. I know Hans is working to handle one side of this process -
> > > sending per-request controls,
> > 
> > Actually, the request framework can do both sides of the equation: giving
> > back meta data in read-only controls that are per-frame. While ideally the
> > driver would extract the information from the binary blob and put it in
> > nice controls, it is also possible to make a control that just contains the
> > binary blob itself. Whether that's a good approach depends on many factors
> > and that's another topic.
> 
> I think that could be possible in some cases. If you don't have a lot of
> metadata, then, sure.
> 
> > > but I'm not aware whether he or anyone else
> > > is actively working on this already or is planning to do so in the near
> > > future? I also know, that several proprietary solutions have been
> > > developed and are in use in various projects.
> > > 
> > > I think a general agreement has been, that such data has to be passed via
> > > a buffer queue. But there are a few possibilities there too. Below are
> > > some:
> > > 
> > > 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
> > > metadata is already associated to specific frames, which they correspond
> > > to. Cons: (a) a correct implementation would specify image plane fourcc
> > > separately from any metadata plane format description, but we currently
> > > don't support per-plane format specification.
> > 
> > This only makes sense if the data actually comes in via DMA and if it is
> > large enough to make it worth the effort of implementing this. As you say,
> > it will require figuring out how to do per-frame fourcc.
> > 
> > It also only makes sense if the metadata comes in at the same time as the
> > frame.
> 
> I agree. Much of the time the metadata indeed arrives earlier than the
> rest of the frame. The frame layout nor the use cases should be assumed
> in the bridge (ISP) driver which implements the interface, essentially
> forcing this on the user. This is a major drawback in the approach.
> 
> Albeit. If you combine this with the need to pass buffer data to the user
> before the entire buffer is ready, i.e. a plane is ready, you could get around
> this quite neatly.
> 
> However, if the DMA engine writing the metadata is different than what's
> writing the image data to memory, then you have a plain metadata buffer --- as
> it's a different video node. But there's really nothing special about that
> then.
> 
> Conceptually we should support multi-part frames rather than metadata, albeit
> metadata is just a single use case where a single DMA engine outputs multiple
> kind of data. This could be statistics as well. Or multiple images, e.g. YUV
> and RAW format images of the same frame.

If you stream different kinds of images (raw, YUV), then using multiple 
nodes is rather straightforward, isn't it? Whereas for statistics and 
metadata, if we do that, do we assign a new FOURCC code for each new such 
data layout?

> 
> With CSI-2, as the virtual channels are independent, one could start and stop
> them at different times and the frame rate in those channels could as well be
> unrelated. This suggests that different virtual channels should be
> conceptually separate streams also in V4L2 and thus the data from different
> streams should not end up to the same buffer.
> 
> Metadata usually (or practically ever?) does not arrive on a separate virtual
> channel though. So this isn't something that necessarily is taken into account
> right now but it's good to be aware of it.

A camera can send image data and metadata on the same virtual channel, but 
then it should use different data types for them?

> > > 2. Separate buffer queues. Pros: (a) no need to extend multiplanar buffer
> > > implementation. Cons: (a) more difficult synchronisation with image
> > > frames, (b) still need to work out a way to specify the metadata version.
> 
> Do you think you have different versions of metadata from a sensor, for
> instance? Based on what I've seen these tend to be sensor specific, or
> SMIA which defines a metadata type for each bit depth for compliant sensors.
> 
> Each metadata format should have a 4cc code, SMIA bit depth specific or
> sensor specific where metadata is sensor specific.
> 
> Other kind of metadata than what you get from sensors is not covered by the
> thoughts above.
> 
> <URL:http://www.retiisi.org.uk/v4l2/foil/v4l2-multi-format.pdf>
> 
> I think I'd still favour separate buffer queues.

And separate video nodes then.

> > > Any further options? Of the above my choice would go with (1) but with a
> > > dedicated metadata plane in struct vb2_buffer.
> > 
> > 3. Use the request framework and return the metadata as control(s). Since
> > controls
> > can be associated with events when they change you can subscribe to such
> > events.
> > Note: currently I haven't implemented such events for request controls since
> > I am
> > not certainly how it would be used, but this would be a good test case.
> > 
> > Pros: (a) no need to extend multiplanar buffer implementation, (b) syncing
> > up
> > with the image frames should be easy (both use the same request ID), (c) a
> > lot
> > of freedom on how to export the metadata. Cons: (a) request framework is
> > still
> > work in progress (currently worked on by Laurent), (b) probably too slow for
> > really large amounts of metadata, you'll need proper DMA handling for that
> > in
> > which case I would go for 2.
> 
> Agreed. You could consider it as a drawback that the number of new controls
> required for this could be large as well, but then already for other reasons
> the best implementation would rather be the second option mentioned.

But wouldn't a single extended control carrying all the values transferred 
in the metadata solve the performance issue?

> > > In either of the above options we also need a way to tell the user what is
> > > in the metadata buffer, its format. We could create new FOURCC codes for
> > > them, perhaps as V4L2_META_FMT_... or the user space could identify the
> > > metadata format based on the camera model and an opaque type (metadata
> > > version code) value. Since metadata formats seem to be extremely camera-
> > > specific, I'd go with the latter option.
> 
> I think I'd use separate 4cc codes for the metadata formats when they really
> are different. There are plenty of possible 4cc codes we can use. :-)
> 
> Documenting the formats might be painful though.

The advantage of this approach, together with a separate video node / 
buffer queue, is that no changes to the core would be required.

At the moment I think that using (extended) controls would be the most 
"correct" way to implement that metadata, but you can only associate such 
control values with frames once the request API is there. Yet another 
caveat is that we define V4L2_CTRL_ID2CLASS() as ((id) & 0x0fff0000UL) 
and V4L2_CID_PRIVATE_BASE as 0x08000000, so drivers cannot define 
private controls that belong to existing classes. Was this intentional?
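
To spell out the arithmetic behind that caveat (values as quoted from 
videodev2.h): any ID at or above V4L2_CID_PRIVATE_BASE has bit 27 set, so 
its class bits can never match a standard class, which all lie below 
0x01000000:

#define V4L2_CTRL_ID2CLASS(id)	((id) & 0x0fff0000UL)
#define V4L2_CID_PRIVATE_BASE	0x08000000

/* e.g. for the first private control:
 * V4L2_CTRL_ID2CLASS(V4L2_CID_PRIVATE_BASE + 0) == 0x08000000
 * V4L2_CTRL_CLASS_CAMERA                        == 0x009a0000
 */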

Thanks
Guennadi

> > > Comments extremely welcome.
> > 
> > What I like about the request framework is that the driver can pick apart
> > the metadata and turn it into well-defined controls. So the knowledge how
> > to do that is in the place where it belongs. In cases where the meta data
> > is simply too large for that to be feasible, then I don't have much of an
> > opinion. Camera + version could be enough. Although the same can just as
> > easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A fourcc
> > is more consistent with the current API.
> 
> -- 
> Kind regards,
> 
> Sakari Ailus
> sakari.ailus@linux.intel.com
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-22 11:16       ` Guennadi Liakhovetski
  2015-12-22 13:30         ` karthik poduval
@ 2015-12-23 17:40         ` Laurent Pinchart
  2015-12-24 10:42           ` Guennadi Liakhovetski
  1 sibling, 1 reply; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-23 17:40 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Guennadi,

On Tuesday 22 December 2015 12:16:27 Guennadi Liakhovetski wrote:
> On Mon, 21 Dec 2015, Laurent Pinchart wrote:
> > On Wednesday 16 December 2015 12:25:24 Guennadi Liakhovetski wrote:
> >> On Wed, 16 Dec 2015, Hans Verkuil wrote:
> >>> On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> >>>> Hi all,
> >>>> 
> >>>> A project, I am currently working on, requires acquiring
> >>>> per-frame metadata from the camera and passing it to user-space.
> >>>> This is not the first time this comes up and I know such discussions
> >>>> have been held before. A typical user is Android (also my case),
> >>>> where you have to provide parameter values, that have been used to
> >>>> capture a specific frame, to the user. I know Hans is working to
> >>>> handle one side of this process - sending per-request controls,
> >>> 
> >>> Actually, the request framework can do both sides of the equation:
> >>> giving back meta data in read-only controls that are per-frame. While
> >>> ideally the driver would extract the information from the binary blob
> >>> and put it in nice controls, it is also possible to make a control
> >>> that just contains the binary blob itself. Whether that's a good
> >>> approach depends on many factors and that's another topic.
> >> 
> >> Yes, sorry, didn't mention this possibility. On the one hand I agree,
> >> that this would look nice and consistent - you send a bunch of controls
> >> down and you get them back in exactly the same way, nicely taken apart.
> >> OTOH there are some issues with that:
> >> 
> >> 1. Metadata values can indeed come from the camera in a buffer, that's
> >> DMAed to a buffer by the bridge - we have such examples. In our
> >> use-cases those buffers are separate from main data, so, that the driver
> >> could allocate them itself, but can there be cases, in which those
> >> buffers have to be supplied by the user?
> > 
> > The only case I can think of where the user would benefit from supplying
> > the buffer is sharing meta data with other processes and/or devices *if*
> > the amount of meta data is so large that a memcpy would negatively affect
> > performances. And I can't think of such a case at the moment :-)
> 
> Ok, so, we could for now limit metadata buffer support to driver
> allocation.
> 
> >> 2. Size - not sure how large those control buffers can become, in
> >> use-cases, that I'm aware of we transfer up to 20 single-value
> >> parameters per frame.
> > 
> > I have to deal with a system that can transfer up to ~200 parameters per
> > frame (at least in theory).
> 
> Are they single-value (say, up to 32 bits) parameters or can be arrays /
> data chunks?

They can be arrays as well.

> >> 3. With control values delivered per DMA, it's the bridge driver, that
> >> gets the data, but it's the sensor subdevice driver, that knows what
> >> that buffer contains. So, to deliver those parameters to the user, the
> >> sensor driver control processing routines will have to get access to
> >> that metadata buffer. This isn't supported so far even with the proposed
> >> request API?
> > 
> > Correct. My current implementation (see
> > git://linuxtv.org/pinchartl/media.git drm/du/vsp1-kms/request) doesn't
> > deal with controls yet as the first use case I focused on for the request
> > API primarily requires setting formats (and links, which are my next
> > target).
> > 
> > My other use case (Android camera HAL v3 for Project Ara) mainly deals
> > with controls and meta-data, but I'll then likely pass the meta-data blob
> > to userspace as-is, as its format isn't always known to the driver. I'm
> > also concerned about efficiency but haven't had time to perform
> > measurements yet.
>
> Hm, why is it not known to the subdevice driver? Does the buffer layout
> depend on some external conditions? Maybe loaded firmware? But it should
> be possible to tell the driver, say, that the current metadata buffer
> layout has version N?

My devices are class-compliant but can use a device-specific meta-data format. 
The kernel driver knows about the device class only, knowledge about any 
device-specific format is only available in userspace.

> Those metadata buffers can well contain some parameters, that can also be
> obtained via controls. So, if we just send metadata buffers to the user as
> is, we create duplication, which isn't nice.

In my case there won't be any duplication as there will likely be no control 
at all, but I agree with you in the general case.

> Besides, the end user will anyway want broken down control values. E.g. in
> the Android case, the app is getting single controls, not opaque metadata
> buffers. Of course, one could create a vendor metadata tag "metadata blob,"
> but that's not how Android does it so far.
> 
> OTOH passing those buffers to the subdevice driver for parsing and
> returning them as an (extended) control also seems a bit ugly.
> 
> What about performance cost? If we pass all those parameters as a single
> extended control (as long as they are of the same class), the cost won't
> be higher, than dequeuing a buffer? Let's not take the parsing cost and
> the control struct memory overhead into account for now.

If you take nothing into account then the cost won't be higher ;-) It's the 
parsing cost I was referring to, including the cost of updating the control 
value from within the kernel.

> User-friendliness: I think, implementors would prefer to pass a complete
> buffer to the user-space to avoid having to modify drivers every time they
> modify those parameters.
> 
> >>>> but I'm not aware whether he or anyone else is actively working on
> >>>> this already or is planning to do so in the near future? I also
> >>>> know, that several proprietary solutions have been developed and are
> >>>> in use in various projects.
> >>>> 
> >>>> I think a general agreement has been, that such data has to be
> >>>> passed via a buffer queue. But there are a few possibilities there
> >>>> too. Below are some:
> >>>> 
> >>>> 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
> >>>> metadata is already associated to specific frames, which they
> >>>> correspond to. Cons: (a) a correct implementation would specify
> >>>> image plane fourcc separately from any metadata plane format
> >>>> description, but we currently don't support per-plane format
> >>>> specification.
> >>> 
> >>> This only makes sense if the data actually comes in via DMA and if it
> >>> is large enough to make it worth the effort of implementing this. As
> >>> you say, it will require figuring out how to do per-frame fourcc.
> >>> 
> >>> It also only makes sense if the metadata comes in at the same time as
> >>> the frame.
> >>> 
> >>>> 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
> >>>> buffer implementation. Cons: (a) more difficult synchronisation with
> >>>> image frames, (b) still need to work out a way to specify the
> >>>> metadata version.
> >>>> 
> >>>> Any further options? Of the above my choice would go with (1) but
> >>>> with a dedicated metadata plane in struct vb2_buffer.
> >>> 
> >>> 3. Use the request framework and return the metadata as control(s).
> >>> Since controls can be associated with events when they change you can
> >>> subscribe to such events. Note: currently I haven't implemented such
> >>> events for request controls since I am not certain how it would be
> >>> used, but this would be a good test case.
> >>> 
> >>> Pros: (a) no need to extend multiplanar buffer implementation, (b)
> >>> syncing up with the image frames should be easy (both use the same
> >>> request ID), (c) a lot of freedom on how to export the metadata. Cons:
> >>> (a) request framework is still work in progress (currently worked on
> >>> by Laurent), (b) probably too slow for really large amounts of
> >>> metadata, you'll need proper DMA handling for that in which case I
> >>> would go for 2.
> > 
> > (a) will eventually be solved, (b) needs measurements before discussing it
> > further.
> > 
> > > For (2) (separate buffer queue) would we have to extend VIDIOC_DQBUF to
> > > select a specific buffer queue?
> > 
> > Wouldn't it use a separate video device node ?
> 
> Ok, that seems like a better option to me too, agree.
> 
> >>>> In either of the above options we also need a way to tell the user
> >>>> what is in the metadata buffer, its format. We could create new
> >>>> FOURCC codes for them, perhaps as V4L2_META_FMT_... or the user
> >>>> space could identify the metadata format based on the camera model
> >>>> and an opaque type (metadata version code) value. Since metadata
> >>>> formats seem to be extremely camera-specific, I'd go with the latter
> >>>> option.
> >>>> 
> >>>> Comments extremely welcome.
> >>> 
> >>> What I like about the request framework is that the driver can pick
> >>> apart the metadata and turn it into well-defined controls. So the
> >>> knowledge how to do that is in the place where it belongs. In cases
> >>> where the meta data is simply too large for that to be feasible, then
> >>> I don't have much of an opinion. Camera + version could be enough.
> >>> Although the same can just as easily be encoded as a fourcc
> >>> (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A fourcc is more consistent with
> >>> the current API.
> >> 
> >> Right, our use-cases so far don't send a lot of data as per-frame
> >> metadata, no idea what others do.
> > 
> > What kind of hardware do you deal with that sends meta-data ? And over
> > what kind of channel does it send it ?
> 
> A CSI-2 connected camera sensor.

Is meta-data sent as embedded data lines with a different CSI-2 DT ?

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-23 17:40         ` Laurent Pinchart
@ 2015-12-24 10:42           ` Guennadi Liakhovetski
  2015-12-26 23:47             ` Laurent Pinchart
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2015-12-24 10:42 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Laurent,

Let me put this at the top: So far it looks like we converge on two 
possibilities:

(1) a separate video-device node with a separate queue. No user-space 
visible changes are required apart from new FOURCC codes. In the kernel 
we'd have to add some subdev API between the bridge and the sensor drivers 
to let the sensor driver instruct the bridge driver to use some of the 
data, arriving over the camera interface, as metadata.

(2) parsing metadata by the sensor subdevice driver to make it available 
as controls. This would only (properly) work with the request API, which 
is still a work in progress. Apart from that request API no additional 
user-space visible changes would be required. The kernel subdevice API 
would have to be extended as above, to specify metadata location. 
Additionally, the bridge driver would have to pass the metadata buffer 
back to the subdevice driver for parsing.

Since the request API isn't available yet and even the latest version 
doesn't support per-request controls, it looks like only the former 
approach can be used immediately.
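
To make option (1) concrete from the user-space side, here is a rough, 
untested sketch; V4L2_META_FMT_SENSOR_V1 is a made-up fourcc standing in for 
whatever codes we would end up defining:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	/* Made-up fourcc for this sensor's metadata layout, version 1. */
	#define V4L2_META_FMT_SENSOR_V1 v4l2_fourcc('M', 'E', 'T', '1')

	int open_metadata_node(const char *devnode)
	{
		struct v4l2_format fmt;
		int fd = open(devnode, O_RDWR);

		if (fd < 0)
			return -1;

		memset(&fmt, 0, sizeof(fmt));
		fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		fmt.fmt.pix.pixelformat = V4L2_META_FMT_SENSOR_V1;

		/* The driver fixes up width/height/sizeimage for the blob. */
		if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) {
			perror("VIDIOC_S_FMT");
			close(fd);
			return -1;
		}

		printf("metadata buffer size: %u\n", fmt.fmt.pix.sizeimage);
		return fd;
	}

The usual REQBUFS/QBUF/STREAMON/DQBUF sequence follows on that node, and 
metadata buffers are matched to image frames by sequence number / timestamp.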

On Wed, 23 Dec 2015, Laurent Pinchart wrote:

[snip]

> > > My other use case (Android camera HAL v3 for Project Ara) mainly deals
> > > with controls and meta-data, but I'll then likely pass the meta-data blob
> > > to userspace as-is, as its format isn't always known to the driver. I'm
> > > also concerned about efficiency but haven't had time to perform
> > > measurements yet.
> >
> > Hm, why is it not known to the subdevice driver? Does the buffer layout
> > depend on some external conditions? Maybe loaded firmware? But it should
> > be possible to tell the driver, say, that the current metadata buffer
> > layout has version N?
> 
> My devices are class-compliant but can use a device-specific meta-data format. 
> The kernel driver knows about the device class only, knowledge about any 
> device-specific format is only available in userspace.

So, unless you want to add camera-specific code to your class-driver 
(UVC?), that's another argument against approach (2) above.

> > Those metadata buffers can well contain some parameters, that can also be
> > obtained via controls. So, if we just send metadata buffers to the user as
> > is, we create duplication, which isn't nice.
> 
> In my case there won't be any duplication as there will likely be no control 
> at all, but I agree with you in the general case.
> 
> > Besides, the end user will anyway want broken down control values. E.g. in
> > the Android case, the app is getting single controls, not opaque metadata
> > buffers. Of course, one could create a vendor metadata tag "metadata blob,"
> > but that's not how Android does it so far.
> > 
> > OTOH passing those buffers to the subdevice driver for parsing and
> > returning them as an (extended) control also seems a bit ugly.
> > 
> > What about performance cost? If we pass all those parameters as a single
> > extended control (as long as they are of the same class), the cost won't
> > be higher, than dequeuing a buffer? Let's not take the parsing cost and
> > the control struct memory overhead into account for now.
> 
> If you take nothing into account then the cost won't be higher ;-) It's the 
> parsing cost I was referring to, including the cost of updating the control 
> value from within the kernel.

I meant mostly context switching costs, switching between the kernel and 
user space. If we had to extract all controls one by one, that wouldn't 
be a negligible overhead, I guess.

[snip]

> > >> Right, our use-cases so far don't send a lot of data as per-frame
> > >> metadata, no idea what others do.
> > > 
> > > What kind of hardware do you deal with that sends meta-data ? And over
> > > what kind of channel does it send it ?
> > 
> > A CSI-2 connected camera sensor.
> 
> Is meta-data sent as embedded data lines with a different CSI-2 DT ?

A different data type, yes.

So, all in all it looks like the only immediately available option and, 
possibly, the only feasible option at all is a separate buffer queue. Do 
we agree that a subdev API is needed to inform the bridge driver about 
the availability and location of the metadata?

Thanks
Guennadi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-23  9:47     ` Guennadi Liakhovetski
@ 2015-12-24 10:46       ` Laurent Pinchart
  2015-12-24 11:17         ` hverkuil
  0 siblings, 1 reply; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-24 10:46 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Sakari Ailus, Hans Verkuil, Linux Media Mailing List,
	Mauro Carvalho Chehab, Aviv Greenberg

Hi Guennadi,

On Wednesday 23 December 2015 10:47:57 Guennadi Liakhovetski wrote:
> On Sat, 19 Dec 2015, Sakari Ailus wrote:
> > Hans Verkuil wrote:
> >> On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> >>> Hi all,
> >>> 
> >>> A project, I am currently working on, requires acquiring per-frame
> >>> metadata from the camera and passing it to user-space. This is not the
> >>> first time this comes up and I know such discussions have been held
> >>> before. A typical user is Android (also my case), where you have to
> >>> provide parameter values, that have been used to capture a specific
> >>> frame, to the user. I know Hans is working to handle one side of this
> >>> process - sending per-request controls,
> >> 
> >> Actually, the request framework can do both sides of the equation:
> >> giving back meta data in read-only controls that are per-frame. While
> >> ideally the driver would extract the information from the binary blob
> >> and put it in nice controls, it is also possible to make a control that
> >> just contains the binary blob itself. Whether that's a good approach
> >> depends on many factors and that's another topic.
> > 
> > I think that could be possible in some cases. If you don't have a lot of
> > metadata, then, sure.
> > 
> >>> but I'm not aware whether he or anyone else is actively working on
> >>> this already or is planning to do so in the near future? I also know,
> >>> that several proprietary solutions have been developed and are in use
> >>> in various projects.
> >>> 
> >>> I think a general agreement has been, that such data has to be passed
> >>> via a buffer queue. But there are a few possibilities there too. Below
> >>> are some:
> >>> 
> >>> 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
> >>> metadata is already associated to specific frames, which they
> >>> correspond to. Cons: (a) a correct implementation would specify image
> >>> plane fourcc separately from any metadata plane format description,
> >>> but we currently don't support per-plane format specification.
> >> 
> >> This only makes sense if the data actually comes in via DMA and if it is
> >> large enough to make it worth the effort of implementing this. As you
> >> say, it will require figuring out how to do per-frame fourcc.
> >> 
> >> It also only makes sense if the metadata comes in at the same time as
> >> the frame.
> > 
> > I agree. Much of the time the metadata indeed arrives earlier than the
> > rest of the frame. Neither the frame layout nor the use cases should be assumed
> > in the bridge (ISP) driver which implements the interface, essentially
> > forcing this on the user. This is a major drawback in the approach.
> > 
> > Albeit. If you combine this with the need to pass buffer data to the user
> > before the entire buffer is ready, i.e. a plane is ready, you could get
> > around this quite neatly.
> > 
> > However, if the DMA engine writing the metadata is different than what's
> > writing the image data to memory, then you have a plain metadata buffer
> > --- as it's a different video node. But there's really nothing special
> > about that then.
> > 
> > Conceptually we should support multi-part frames rather than metadata,
> > albeit metadata is just a single use case where a single DMA engine
> > outputs multiple kind of data. This could be statistics as well. Or
> > multiple images, e.g. YUV and RAW format images of the same frame.
> 
> If you stream different kinds of images (raw, yuv), then using multiple
> nodes is rather straight-forward, isn't it? Whereas for statistics and
> metadata, if we do that, do we assign new FOURCC codes for each new such
> data layout?
> 
> > With CSI-2, as the virtual channels are independent, one could start and
> > stop them at different times and the frame rate in those channels could
> > as well be unrelated. This suggests that different virtual channels
> > should be conceptually separate streams also in V4L2 and thus the data
> > from different streams should not end up to the same buffer.
> > 
> > Metadata usually (or practically ever?) does not arrive on a separate
> > virtual channel though. So this isn't something that necessarily is taken
> > into account right now but it's good to be aware of it.
> 
> A camera can send image data and metadata on the same virtual channel, but
> then it should use different data types for them?

Conceptually a sensor could send meta-data as embedded data lines using the 
same data type. I'm not sure whether that's done in practice.

> >>> 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
> >>> buffer implementation. Cons: (a) more difficult synchronisation with
> >>> image frames, (b) still need to work out a way to specify the metadata
> >>> version.
> > 
> > Do you think you have different versions of metadata from a sensor, for
> > instance? Based on what I've seen these tend to be sensor specific, or
> > SMIA which defines a metadata type for each bit depth for compliant
> > sensors.
> > 
> > Each metadata format should have a 4cc code, SMIA bit depth specific or
> > sensor specific where metadata is sensor specific.
> > 
> > Other kind of metadata than what you get from sensors is not covered by
> > the thoughts above.
> > 
> > <URL:http://www.retiisi.org.uk/v4l2/foil/v4l2-multi-format.pdf>
> > 
> > I think I'd still favour separate buffer queues.
> 
> And separate video nodes then.
> 
> >>> Any further options? Of the above my choice would go with (1) but with
> >>> a dedicated metadata plane in struct vb2_buffer.
> >> 
> >> 3. Use the request framework and return the metadata as control(s).
> >> Since controls can be associated with events when they change you can
> >> subscribe to such events. Note: currently I haven't implemented such
> >> events for request controls since I am not certain how it would be
> >> used, but this would be a good test case.
> >> 
> >> Pros: (a) no need to extend multiplanar buffer implementation, (b)
> >> syncing up with the image frames should be easy (both use the same
> >> request ID), (c) a lot of freedom on how to export the metadata. Cons:
> >> (a) request framework is still work in progress (currently worked on by
> >> Laurent), (b) probably too slow for really large amounts of metadata,
> >> you'll need proper DMA handling for that in which case I would go for 2.
> > 
> > Agreed. You could consider it as a drawback that the number of new
> > controls required for this could be large as well, but then already for
> > other reasons the best implementation would rather be the second option
> > mentioned.
>
> But wouldn't a single extended control with all metadata-transferred
> controls solve the performance issue?

You would still have to update the value of the controls in the kernel based 
on meta-data parsing, and we've never measured the cost of such an operation 
when the number of controls is large. An interesting side issue is that the 
control framework currently doesn't support updating controls from interrupt 
context as all controls are protected by a mutex. The cost of locking, once it 
has been reworked, also needs to be taken into account.

> >>> In either of the above options we also need a way to tell the user
> >>> what is in the metadata buffer, its format. We could create new FOURCC
> >>> codes for them, perhaps as V4L2_META_FMT_... or the user space could
> >>> identify the metadata format based on the camera model and an opaque
> >>> type (metadata version code) value. Since metadata formats seem to be
> >>> extremely camera-specific, I'd go with the latter option.
> > 
> > I think I'd use separate 4cc codes for the metadata formats when they
> > really are different. There are plenty of possible 4cc codes we can use.
> > :-)
> > 
> > Documenting the formats might be painful though.
> 
> The advantage of this approach together with a separate video node /
> buffer queue is, that no changes to the core would be required.
> 
> At the moment I think, that using (extended) controls would be the most
> "correct" way to implement that metadata, but you can associate such
> control values with frames only when the request API is there. Yet another
> caveat is, that we define V4L2_CTRL_ID2CLASS() as ((id) & 0x0fff0000UL)
> and V4L2_CID_PRIVATE_BASE as 0x08000000, so that drivers cannot define
> private controls to belong to existing classes. Was this intentional?

Control classes are a deprecated concept, so I don't think this matters much.

> >>> Comments extremely welcome.
> >> 
> >> What I like about the request framework is that the driver can pick
> >> apart the metadata and turn it into well-defined controls. So the
> >> knowledge how to do that is in the place where it belongs. In cases
> >> where the meta data is simply too large for that to be feasible, then I
> >> don't have much of an opinion. Camera + version could be enough.
> >> Although the same can just as easily be encoded as a fourcc
> >> (V4L2_META_FMT_OVXXXX_V1, _V2, etc). A fourcc is more consistent with
> >> the current API.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-22 13:30         ` karthik poduval
@ 2015-12-24 10:54           ` Laurent Pinchart
  0 siblings, 0 replies; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-24 10:54 UTC (permalink / raw)
  To: karthik poduval
  Cc: Guennadi Liakhovetski, Hans Verkuil, Linux Media Mailing List,
	Mauro Carvalho Chehab, Sakari Ailus, Aviv Greenberg

Hello Karthik,

On Tuesday 22 December 2015 05:30:52 karthik poduval wrote:
> I have been wanting to share these thoughts for the group but was
> waiting for the right time which I think is now since Guennadi brought
> up this discussion.
> 
> For the Amazon Fire phone 4 corner camera, here is how we passed
> metadata from driver to application (which was a CV client requiring
> per frame metadata).
> 
> We took an unused field in struct v4l2_buffer (__u32 reserved in this
> case) and used it to pass in a pointer to a user space metadata object
> (i.e. struct app_metadata) to the driver via the VIDIOC_DQBUF ioctl
> call.
> 
> struct v4l2_buffer for reference.
> http://lxr.free-electrons.com/source/include/uapi/linux/videodev2.h#L836
> 
> The driver copied its local copy of the metadata object to the
> userspace metadata object using the copy_to_user primitive offered by
> the kernel.
> 
> Here is how we handled the metadata in the driver code.
> https://github.com/Fire-Phone/android_kernel_amazon_kodiak/blob/master/drive
> rs/media/platform/msm/camera_v2/camera/camera.c#L235
> 
> This was done before HAL V3 was available. With HAL V3, the metadata
> object can be the HAL v3 metadata buffer. Non Android devices can use
> custom metadata format (like the one we used).
> 
> With this approach, the metadata always accompanies the frame data as
> it's available along with the frame buffer inside struct v4l2_buffer
> from the VIDIOC_DQBUF ioctl call.
> 
> If the community likes this idea, the v4l2_buffer can now be
> officially modified to contain a pointer to user space metadata object
> v4l2_buffer.metadata and then metadata format and size can be agreed
> upon between application and driver.
> Thoughts ?
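
(For clarity, roughly what the two halves of the approach described above 
look like; struct app_metadata and its layout are entirely device-specific, 
and the pointer cast only works where a user pointer fits in the __u32 
reserved field:)

	/* User space: hand the driver a pointer through the reserved field. */
	struct app_metadata meta;
	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.reserved = (__u32)(unsigned long)&meta;

	if (ioctl(fd, VIDIOC_DQBUF, &buf) == 0)
		consume_metadata(&meta);	/* application-defined */

	/* Driver side, in the VIDIOC_DQBUF handler: 'b' is the v4l2_buffer
	 * being dequeued, 'local_meta' what the driver collected for this
	 * frame. */
	if (copy_to_user((void __user *)(unsigned long)b->reserved,
			 &local_meta, sizeof(local_meta)))
		return -EFAULT;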

I see several issues with that approach. The first one is that the meta-data 
buffer is only passed at DQBUF time. Drivers thus need to copy data using the 
CPU instead of capturing meta-data directly to the buffer through DMA. If the 
meta-data size is small the performance impact can be negligible, but that 
might not be true in the general case.

A second issue is that the approach isn't generic enough in my opinion. If we 
want to attach additional data buffers to a v4l2_buffer I agree with Sakari 
that we should design a multi-part buffer API in order to not limit the 
implementation to meta-data, but support other kind of information such as 
statistics for instance.

Finally, it might be beneficial in some use cases to pass meta-data to 
userspace before the end of the frame (assuming meta-data is available earlier 
of course). That's certainly true for statistics, use cases for meta-data are 
less clear to me though.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-24 10:46       ` Laurent Pinchart
@ 2015-12-24 11:17         ` hverkuil
  2015-12-24 11:29           ` Laurent Pinchart
  0 siblings, 1 reply; 23+ messages in thread
From: hverkuil @ 2015-12-24 11:17 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Guennadi Liakhovetski, Sakari Ailus, Linux Media Mailing List,
	Mauro Carvalho Chehab, Aviv Greenberg, linux-media-owner

On 2015-12-24 11:46, Laurent Pinchart wrote:
> Hi Guennadi,
> 
> On Wednesday 23 December 2015 10:47:57 Guennadi Liakhovetski wrote:

>> >>> 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
>> >>> buffer implementation. Cons: (a) more difficult synchronisation with
>> >>> image frames, (b) still need to work out a way to specify the metadata
>> >>> version.
>> >
>> > Do you think you have different versions of metadata from a sensor, for
>> > instance? Based on what I've seen these tend to be sensor specific, or
>> > SMIA which defines a metadata type for each bit depth for compliant
>> > sensors.
>> >
>> > Each metadata format should have a 4cc code, SMIA bit depth specific or
>> > sensor specific where metadata is sensor specific.
>> >
>> > Other kind of metadata than what you get from sensors is not covered by
>> > the thoughts above.
>> >
>> > <URL:http://www.retiisi.org.uk/v4l2/foil/v4l2-multi-format.pdf>
>> >
>> > I think I'd still favour separate buffer queues.
>> 
>> And separate video nodes then.
>> 
>> >>> Any further options? Of the above my choice would go with (1) but with
>> >>> a dedicated metadata plane in struct vb2_buffer.
>> >>
>> >> 3. Use the request framework and return the metadata as control(s).
>> >> Since controls can be associated with events when they change you can
>> >> subscribe to such events. Note: currently I haven't implemented such
>> >> events for request controls since I am not certainly how it would be
>> >> used, but this would be a good test case.
>> >>
>> >> Pros: (a) no need to extend multiplanar buffer implementation, (b)
>> >> syncing up with the image frames should be easy (both use the same
>> >> request ID), (c) a lot of freedom on how to export the metadata. Cons:
>> >> (a) request framework is still work in progress (currently worked on by
>> >> Laurent), (b) probably too slow for really large amounts of metadata,
>> >> you'll need proper DMA handling for that in which case I would go for 2.
>> >
>> > Agreed. You could consider it as a drawback that the number of new
>> > controls required for this could be large as well, but then already for
>> > other reasons the best implementation would rather be the second option
>> > mentioned.
>> 
>> But wouldn't a single extended control with all metadata-transferred
>> controls solve the performance issue?
> 
> You would still have to update the value of the controls in the kernel based
> on meta-data parsing, we've never measured the cost of such an operation when
> the number of controls is large.

Before discounting this option we need to measure the cost first. I suspect
that if there are just a handful (let's say <= 10) of controls, then the cost
is limited.

> An interesting side issue is that the control framework currently doesn't
> support updating controls from interrupt context as all controls are
> protected by a mutex. The cost of locking, once it will be reworked, also
> needs to be taken into account.

I think the cost of locking can be limited (and I am not even sure if locking
is needed if we restrict the type of control where this can be done). I have
some ideas about that.

> 
>> >>> In either of the above options we also need a way to tell the user
>> >>> what is in the metadata buffer, its format. We could create new FOURCC
>> >>> codes for them, perhaps as V4L2_META_FMT_... or the user space could
>> >>> identify the metadata format based on the camera model and an opaque
>> >>> type (metadata version code) value. Since metadata formats seem to be
>> >>> extremely camera-specific, I'd go with the latter option.
>> >
>> > I think I'd use separate 4cc codes for the metadata formats when they
>> > really are different. There are plenty of possible 4cc codes we can use.
>> > :-)
>> >
>> > Documenting the formats might be painful though.
>> 
>> The advantage of this approach together with a separate video node /
>> buffer queue is, that no changes to the core would be required.
>> 
>> At the moment I think, that using (extended) controls would be the most
>> "correct" way to implement that metadata, but you can associate such
>> control values with frames only when the request API is there. Yet another
>> caveat is, that we define V4L2_CTRL_ID2CLASS() as ((id) & 0x0fff0000UL)
>> and V4L2_CID_PRIVATE_BASE as 0x08000000, so that drivers cannot define
>> private controls to belong to existing classes. Was this intentional?

PRIVATE_BASE is deprecated and cannot be used with the control framework (it
was a very bad design). The control framework provides backwards compat code
to handle old apps that still use PRIVATE_BASE.

Control classes are not deprecated, only the use of the control_class field in
struct v4l2_ext_controls to limit the controls in the list to a single control
class is deprecated. That old limitation was from pre-control-framework times
to simplify driver design. With the creation of the control framework that
limitation is no longer needed.

Per-control class private controls are perfectly possible: such controls start
at control class base ID + 0x1000.
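
For example, from v4l2-controls.h, driver-specific controls that still live in
the MPEG control class:

	#define V4L2_CID_MPEG_CX2341X_BASE	(V4L2_CTRL_CLASS_MPEG | 0x1000)

A sensor driver could follow the same pattern in the camera class (the names
below are made up):

	#define V4L2_CID_CAMERA_MYSENSOR_BASE	(V4L2_CTRL_CLASS_CAMERA | 0x1000)
	#define V4L2_CID_MYSENSOR_FRAME_METADATA (V4L2_CID_CAMERA_MYSENSOR_BASE + 1)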

> 
> Control classes are a deprecated concept, so I don't think this matters 
> much.

As described above, this statement is not correct.

Regards,

     Hans


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-24 11:17         ` hverkuil
@ 2015-12-24 11:29           ` Laurent Pinchart
  2015-12-24 12:54             ` hverkuil
  0 siblings, 1 reply; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-24 11:29 UTC (permalink / raw)
  To: hverkuil
  Cc: Guennadi Liakhovetski, Sakari Ailus, Linux Media Mailing List,
	Mauro Carvalho Chehab, Aviv Greenberg, linux-media-owner

Hi Hans,

On Thursday 24 December 2015 12:17:48 hverkuil wrote:
> On 2015-12-24 11:46, Laurent Pinchart wrote:
> > On Wednesday 23 December 2015 10:47:57 Guennadi Liakhovetski wrote:
> >> >>> 2. Separate buffer queues. Pros: (a) no need to extend multiplanar
> >> >>> buffer implementation. Cons: (a) more difficult synchronisation with
> >> >>> image frames, (b) still need to work out a way to specify the
> >> >>> metadata version.
> >> > 
> >> > Do you think you have different versions of metadata from a sensor, for
> >> > instance? Based on what I've seen these tend to be sensor specific, or
> >> > SMIA which defines a metadata type for each bit depth for compliant
> >> > sensors.
> >> > 
> >> > Each metadata format should have a 4cc code, SMIA bit depth specific or
> >> > sensor specific where metadata is sensor specific.
> >> > 
> >> > Other kind of metadata than what you get from sensors is not covered by
> >> > the thoughts above.
> >> > 
> >> > <URL:http://www.retiisi.org.uk/v4l2/foil/v4l2-multi-format.pdf>
> >> > 
> >> > I think I'd still favour separate buffer queues.
> >> 
> >> And separate video nodes then.
> >> 
> >> >>> Any further options? Of the above my choice would go with (1) but
> >> >>> with a dedicated metadata plane in struct vb2_buffer.
> >> >> 
> >> >> 3. Use the request framework and return the metadata as control(s).
> >> >> Since controls can be associated with events when they change you can
> >> >> subscribe to such events. Note: currently I haven't implemented such
> >> >> events for request controls since I am not certainly how it would be
> >> >> used, but this would be a good test case.
> >> >> 
> >> >> Pros: (a) no need to extend multiplanar buffer implementation, (b)
> >> >> syncing up with the image frames should be easy (both use the same
> >> >> request ID), (c) a lot of freedom on how to export the metadata. Cons:
> >> >> (a) request framework is still work in progress (currently worked on
> >> >> by Laurent), (b) probably too slow for really large amounts of
> >> >> metadata, you'll need proper DMA handling for that in which case I
> >> >> would go for 2.
> >> > 
> >> > Agreed. You could consider it as a drawback that the number of new
> >> > controls required for this could be large as well, but then already for
> >> > other reasons the best implementation would rather be the second option
> >> > mentioned.
> >> 
> >> But wouldn't a single extended control with all metadata-transferred
> >> controls solve the performance issue?
> > 
> > You would still have to update the value of the controls in the kernel
> > based on meta-data parsing, we've never measured the cost of such an
> > operation when the number of controls is large.
> 
> Before discounting this option we need to measure the cost first. I suspect
> that if there are just a handful (let's say <= 10) of controls, then the
> cost is limited.

I certainly don't want to disregard the option before having tried it; my only 
point is that we need to measure performance before making a decision. I plan 
to do so, but have no ETA at this point.

> > An interesting side issue is that the control framework currently doesn't
> > support updating controls from interrupt context as all controls are
> > protected by a mutex. The cost of locking, once it will be reworked, also
> > needs to be taken into account.
> 
> I think the cost of locking can be limited (and I am not even sure if
> locking is needed if we restrict the type of control where this can be
> done). I have some ideas about that.

Well, we'll need to do something, and a lockless algorithm would certainly be 
good :-)

> >>>>> In either of the above options we also need a way to tell the user
> >>>>> what is in the metadata buffer, its format. We could create new
> >>>>> FOURCC codes for them, perhaps as V4L2_META_FMT_... or the user space
> >>>>> could identify the metadata format based on the camera model and an
> >>>>> opaque type (metadata version code) value. Since metadata formats
> >>>>> seem to be extremely camera-specific, I'd go with the latter option.
> >>> 
> >>> I think I'd use separate 4cc codes for the metadata formats when they
> >>> really are different. There are plenty of possible 4cc codes we can
> >>> use.
> >>> 
> >>> :-)
> >>> 
> >>> Documenting the formats might be painful though.
> >> 
> >> The advantage of this approach together with a separate video node /
> >> buffer queue is, that no changes to the core would be required.
> >> 
> >> At the moment I think, that using (extended) controls would be the
> >> most "correct" way to implement that metadata, but you can associate such
> >> control values with frames only when the request API is there. Yet
> >> another caveat is, that we define V4L2_CTRL_ID2CLASS() as ((id) &
> >> 0x0fff0000UL) and V4L2_CID_PRIVATE_BASE as 0x08000000, so that drivers
> >> cannot define private controls to belong to existing classes. Was this
> >> intensional?
> 
> PRIVATE_BASE is deprecated and cannot be used with the control framework
> (it was a very bad design). The control framework provides backwards compat
> code to handle old apps that still use PRIVATE_BASE.
> 
> Control classes are not deprecated, only the use of the control_class
> field in struct v4l2_ext_controls to limit the controls in the list to a
> single control class is deprecated. That old limitation was from pre-
> control-framework times to simplify driver design. With the creation of the
> control framework that limitation is no longer needed.

Doesn't that effectively deprecate control classes? We can certainly group 
controls in categories for documentation purposes, but the control class as an 
API concept is quite useless nowadays.

> Per-control class private controls are perfectly possible: such controls
> start at control class base ID + 0x1000.
> 
> > Control classes are a deprecated concept, so I don't think this matters
> > much.
> 
> As described above, this statement is not correct.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-24 11:29           ` Laurent Pinchart
@ 2015-12-24 12:54             ` hverkuil
  2015-12-24 17:33               ` Laurent Pinchart
  0 siblings, 1 reply; 23+ messages in thread
From: hverkuil @ 2015-12-24 12:54 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Guennadi Liakhovetski, Sakari Ailus, Linux Media Mailing List,
	Mauro Carvalho Chehab, Aviv Greenberg, linux-media-owner

On 2015-12-24 12:29, Laurent Pinchart wrote:
>> Control classes are not deprecated, only the use of the control_class
>> field in struct v4l2_ext_controls to limit the controls in the list to a
>> single control class is deprecated. That old limitation was from pre-
>> control-framework times to simplify driver design. With the creation of
>> the control framework that limitation is no longer needed.
> 
> Doesn't that effectively deprecate control classes? We can certainly group
> controls in categories for documentation purposes, but the control class as
> an API concept is quite useless nowadays.

No it's not. It is meant to be used by GUIs to group controls into tab pages
or something similar. See qv4l2 or v4l2-ctl. It's very useful.

Regards,

     Hans

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-24 12:54             ` hverkuil
@ 2015-12-24 17:33               ` Laurent Pinchart
  0 siblings, 0 replies; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-24 17:33 UTC (permalink / raw)
  To: hverkuil
  Cc: Guennadi Liakhovetski, Sakari Ailus, Linux Media Mailing List,
	Mauro Carvalho Chehab, Aviv Greenberg, linux-media-owner

Hi Hans,

On Thursday 24 December 2015 13:54:11 hverkuil wrote:
> On 2015-12-24 12:29, Laurent Pinchart wrote:
> >> Control classes are not deprecated, only the use of the control_class
> >> field in struct v4l2_ext_controls to limit the controls in the list to
> >> a single control class is deprecated. That old limitation was from pre-
> >> control-framework times to simplify driver design. With the creation
> >> of the control framework that limitation is no longer needed.
> > 
> > Doesn't that effectively deprecate control classes? We can certainly
> > group controls in categories for documentation purposes, but the control
> > class as an API concept is quite useless nowadays.
> 
> No it's not. It is meant to be used by GUIs to group controls into tab
> pages or something similar. See qv4l2 or v4l2-ctl. It's very useful.

I guess I'll just keep disagreeing with you regarding whether the kernel 
should provide GUI hints then (cf. V4L2_CTRL_FLAG_SLIDER) :-)

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-24 10:42           ` Guennadi Liakhovetski
@ 2015-12-26 23:47             ` Laurent Pinchart
  2016-01-01 15:43               ` Guennadi Liakhovetski
  0 siblings, 1 reply; 23+ messages in thread
From: Laurent Pinchart @ 2015-12-26 23:47 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Guennadi,

On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:
> Hi Laurent,
> 
> Let me put this at the top: So far it looks like we converge on two
> possibilities:
> 
> (1) a separate video-device node with a separate queue. No user-space
> visible changes are required apart from new FOURCC codes. In the kernel
> we'd have to add some subdev API between the bridge and the sensor drivers
> to let the sensor driver instruct the bridge driver to use some of the
> data, arriving over the camera interface, as metadata.

The interface should be more generic and allow describing how multiple 
channels (in terms of virtual channels and data types for CSI-2 for instance) 
are multiplexed over a single physical link. I'm not sure how to represent 
that at the media controller level, that's also one topic that needs to be 
researched.

> (2) parsing metadata by the sensor subdevice driver to make it available
> as controls. This would only (properly) work with the request API, which
> is still a work in progress. Apart from that request API no additional
> user-space visible changes would be required. The kernel subdevice API
> would have to be extended as above, to specify metadata location.
> Additionally, the bridge driver would have to pass the metadata buffer
> back to the subdevice driver for parsing.
> 
> Since the request API isn't available yet and even the latest version
> doesn't support per-request controls, looks like immediately only the
> former approach can be used.

Both approaches require development, I'm certainly open to collaborating on 
the request API to finalize it sooner :-)

> On Wed, 23 Dec 2015, Laurent Pinchart wrote:
> 
> [snip]
> 
> >>> My other use case (Android camera HAL v3 for Project Ara) mainly deals
> >>> with controls and meta-data, but I'll then likely pass the meta-data
> >>> blob to userspace as-is, as its format isn't always known to the
> >>> driver. I'm also concerned about efficiency but haven't had time to
> >>> perform measurements yet.
> >> 
> >> Hm, why is it not known to the subdevice driver? Does the buffer layout
> >> depend on some external conditions? Maybe loaded firmware? But it should
> >> be possible to tell the driver, say, that the current metadata buffer
> >> layout has version N?
> > 
> > My devices are class-compliant but can use a device-specific meta-data
> > format. The kernel driver knows about the device class only, knowledge
> > about any device-specific format is only available in userspace.
> 
> So, unless you want to add camera-specific code to your class-driver
> (UVC?),

Not UVC, project Ara camera class.

> that's another argument against approach (2) above.

In my case that's correct, although I could still use the request API with a 
single binary blob control.

> >> Those metadata buffers can well contain some parameters, that can also
> >> be obtained via controls. So, if we just send metadata buffers to the
> >> user as is, we create duplication, which isn't nice.
> > 
> > In my case there won't be any duplication as there will likely be no
> > control at all, but I agree with you in the general case.
> > 
> >> Besides, the end user will anyway want broken down control values. E.g.
> >> in the Android case, the app is getting single controls, not opaque
> >> metadata buffers. Of course, one could create a vendor metadata tag
> >> "metadata blob," but that's not how Android does it so far.
> >> 
> >> OTOH passing those buffers to the subdevice driver for parsing and
> >> returning them as an (extended) control also seems a bit ugly.
> >> 
> >> What about performance cost? If we pass all those parameters as a single
> >> extended control (as long as they are of the same class), the cost won't
> >> be higher, than dequeuing a buffer? Let's not take the parsing cost and
> >> the control struct memory overhead into account for now.
> > 
> > If you take nothing into account then the cost won't be higher ;-) It's
> > the parsing cost I was referring to, including the cost of updating the
> > control value from within the kernel.
> 
> I meant mostly context switching costs, switching between the kernel- and
> the user-space. If we had to extract all controls one by one that wouldn't
> be a negligible overhead, I guess.

Agreed.

> [snip]
> 
> >>>> Right, our use-cases so far don't send a lot of data as per-frame
> >>>> metadata, no idea what others do.
> >>> 
> >>> What kind of hardware do you deal with that sends meta-data ? And over
> >>> what kind of channel does it send it ?
> >> 
> >> A CSI-2 connected camera sensor.
> > 
> > Is meta-data sent as embedded data lines with a different CSI-2 DT ?
> 
> A different data type, yes.
> 
> So, all in all it looks that the only immediately available option and,
> possibly, the only feasible option at all is a separate buffer queue. Do
> we agree, that a subdev API is needed to inform the bridge driver about
> the availability and location of the metadata?

As explained above I agree we need to extend the subdev API.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2015-12-26 23:47             ` Laurent Pinchart
@ 2016-01-01 15:43               ` Guennadi Liakhovetski
  2016-01-05 11:31                 ` Guennadi Liakhovetski
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2016-01-01 15:43 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Laurent,

On Sun, 27 Dec 2015, Laurent Pinchart wrote:

> Hi Guennadi,
> 
> On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:
> > Hi Laurent,
> > 
> > Let me put this at the top: So far it looks like we converge on two
> > possibilities:
> > 
> > (1) a separate video-device node with a separate queue. No user-space
> > visible changes are required apart from new FOURCC codes. In the kernel
> > we'd have to add some subdev API between the bridge and the sensor drivers
> > to let the sensor driver instruct the bridge driver to use some of the
> > data, arriving over the camera interface, as metadata.
> 
> The interface should be more generic and allow describing how multiple 
> channels (in terms of virtual channels and data types for CSI-2 for instance) 
> are multiplexed over a single physical link. I'm not sure how to represent 
> that at the media controller level, that's also one topic that needs to be 
> researched.

Sure, agree. How about an enumeration style method, something like 
.enum_mbus_streams()?

In principle I agree with the rest too, but would probably rather 
concentrate on this one method. If we decide that we also want to be able 
to access that data using controls, we can add it later?
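
As a strawman only, to make the shape of such an op concrete (neither the 
struct nor the op exist anywhere yet):

	struct v4l2_mbus_stream_info {
		u32 index;	/* which stream on the bus to describe */
		u32 vc;		/* CSI-2 virtual channel */
		u32 data_type;	/* CSI-2 data type, e.g. embedded data */
		u32 flags;	/* could flag "image", "metadata", ... */
	};

	/* Candidate addition to struct v4l2_subdev_pad_ops: */
	int (*enum_mbus_streams)(struct v4l2_subdev *sd, unsigned int pad,
				 struct v4l2_mbus_stream_info *info);

The bridge would call it with increasing index values until it gets -EINVAL, 
the same way the existing enumeration ops work.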

Thanks
Guennadi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2016-01-01 15:43               ` Guennadi Liakhovetski
@ 2016-01-05 11:31                 ` Guennadi Liakhovetski
  2016-01-25 11:14                   ` Guennadi Liakhovetski
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2016-01-05 11:31 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

On Fri, 1 Jan 2016, Guennadi Liakhovetski wrote:

> Hi Laurent,
> 
> On Sun, 27 Dec 2015, Laurent Pinchart wrote:
> 
> > Hi Guennadi,
> > 
> > On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:
> > > Hi Laurent,
> > > 
> > > Let me put this at the top: So far it looks like we converge on two
> > > possibilities:
> > > 
> > > (1) a separate video-device node with a separate queue. No user-space
> > > visible changes are required apart from new FOURCC codes. In the kernel
> > > we'd have to add some subdev API between the bridge and the sensor drivers
> > > to let the sensor driver instruct the bridge driver to use some of the
> > > data, arriving over the camera interface, as metadata.
> > 
> > The interface should be more generic and allow describing how multiple 
> > channels (in terms of virtual channels and data types for CSI-2 for instance) 
> > are multiplexed over a single physical link. I'm not sure how to represent 
> > that at the media controller level, that's also one topic that needs to be 
> > researched.
> 
> Sure, agree. How about an enumeration style method, something like 
> .enum_mbus_streams()?

It now also occurs to me that we currently configure pads with a single 
configuration - pixel format and resolution. However, a single CSI-2 
interface can transfer different frame formats at the same time. So, such 
a sensor driver would have to export multiple source pads? The bridge driver 
would export multiple sink pads; then we don't need any new API methods, 
we just configure each link separately, for which we would have to add those 
fields to struct v4l2_mbus_framefmt?
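
For reference, this is how a single pad is configured from user space today; 
the open question is whether the metadata stream gets a second source pad 
configured the same way, or extra fields in struct v4l2_mbus_framefmt:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/v4l2-subdev.h>

	int set_pad_format(int subdev_fd, unsigned int pad, unsigned int code,
			   unsigned int width, unsigned int height)
	{
		struct v4l2_subdev_format fmt;

		memset(&fmt, 0, sizeof(fmt));
		fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
		fmt.pad = pad;
		fmt.format.code = code;	/* e.g. MEDIA_BUS_FMT_SBGGR10_1X10 */
		fmt.format.width = width;
		fmt.format.height = height;

		return ioctl(subdev_fd, VIDIOC_SUBDEV_S_FMT, &fmt);
	}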

Thanks
Guennadi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2016-01-05 11:31                 ` Guennadi Liakhovetski
@ 2016-01-25 11:14                   ` Guennadi Liakhovetski
  2016-01-25 19:53                     ` Laurent Pinchart
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2016-01-25 11:14 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

On Tue, 5 Jan 2016, Guennadi Liakhovetski wrote:

> On Fri, 1 Jan 2016, Guennadi Liakhovetski wrote:
> 
> > Hi Laurent,
> > 
> > On Sun, 27 Dec 2015, Laurent Pinchart wrote:
> > 
> > > Hi Guennadi,
> > > 
> > > On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:
> > > > Hi Laurent,
> > > > 
> > > > Let me put this at the top: So far it looks like we converge on two
> > > > possibilities:
> > > > 
> > > > (1) a separate video-device node with a separate queue. No user-space
> > > > visible changes are required apart from new FOURCC codes. In the kernel
> > > > we'd have to add some subdev API between the bridge and the sensor drivers
> > > > to let the sensor driver instruct the bridge driver to use some of the
> > > > data, arriving over the camera interface, as metadata.
> > > 
> > > The interface should be more generic and allow describing how multiple 
> > > channels (in terms of virtual channels and data types for CSI-2 for instance) 
> > > are multiplexed over a single physical link. I'm not sure how to represent 
> > > that at the media controller level, that's also one topic that needs to be 
> > > researched.
> > 
> > Sure, agree. How about an enumeration style method, something like 
> > .enum_mbus_streams()?
> 
> It now also occurs to me, that we currently configure pads with a single 
> configuration - pixel format, resolution. However, a single CSI-2 
> interface can transfer different frame formats at the same time. So, such 
> a sensor driver has to export multiple source pads? The bridge driver 
> would export multiple sink pads, then we don't need any new API methods, 
> we just configure each link separately, for which we have to add those 
> fields to struct v4l2_mbus_framefmt?

It has been noted that pads and links are conceptually designed to 
represent physical interfaces and the connections between them; therefore 
representing a single CSI-2 link by multiple Media Controller pads and 
links is wrong.

As an alternative it has been proposed to implement multiplexer and 
demultiplexer subdevices on the CSI-2 transmitter (camera) and receiver 
(SoC) sides respectively. Originally it was also proposed to add a 
supporting API to configure multiple streams over such a multiplexed 
connection. However, this seems redundant, because the mux sink pads and demux 
source pads will have to be configured individually anyway, which already 
configures the receiver and the transmitter sides.

Currently the design seems to be converging to simply configuring the 
multiplexed link with the MEDIA_BUS_FMT_FIXED format and a fixed 
resolution and perform all real configuration on the other side of the mux 
and demux subdevices. The only API extension, that would be required for 
such a design would be adding CSI-2 Virtual Channel IDs to pad format 
specifications, i.e. to struct v4l2_mbus_framefmt.

On the video device side each stream will be sent to a separate video 
device node. Each CSI-2 controller only supports a finite number of 
streams that it can demultiplex at any given time. Typically this maximum 
number is much smaller than 256, which is the total number of streams 
that can be distinguished on a CSI-2 bus, using 2 bits for Virtual 
Channels and 6 bits for data types. For example, if a CSI-2 controller can 
demultiplex up to 8 streams simultaneously, the CSI-2 bridge driver would 
statically create 8 /dev/video* nodes, statically connected to 8 source 
pads of an internal demux subdevice. User space will then just have to 
configure the internal pads with a Virtual Channel number, media bus pixel 
format and resolution, and the /dev/video* node with the required output 
configuration.
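
To make the proposed extension concrete, a sketch only (no vc field exists in 
struct v4l2_mbus_framefmt today, and whether it would reuse one of the 
reserved fields is open):

	struct v4l2_mbus_framefmt {
		__u32 width;
		__u32 height;
		__u32 code;
		/* ... existing fields ... */
		__u16 vc;	/* proposed: CSI-2 virtual channel of this stream */
	};

Each mux sink and demux source pad would then get its own virtual channel, 
media bus code and resolution via VIDIOC_SUBDEV_S_FMT, while the multiplexed 
link in the middle stays at MEDIA_BUS_FMT_FIXED.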

Thanks
Guennadi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2016-01-25 11:14                   ` Guennadi Liakhovetski
@ 2016-01-25 19:53                     ` Laurent Pinchart
  2016-01-26 12:49                       ` Guennadi Liakhovetski
  0 siblings, 1 reply; 23+ messages in thread
From: Laurent Pinchart @ 2016-01-25 19:53 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Guennadi,

On Monday 25 January 2016 12:14:14 Guennadi Liakhovetski wrote:
> On Tue, 5 Jan 2016, Guennadi Liakhovetski wrote:
> > On Fri, 1 Jan 2016, Guennadi Liakhovetski wrote:
> >> On Sun, 27 Dec 2015, Laurent Pinchart wrote:
> >>> On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:
> >>>> Hi Laurent,
> >>>> 
> >>>> Let me put this at the top: So far it looks like we converge on two
> >>>> possibilities:
> >>>> 
> >>>> (1) a separate video-device node with a separate queue. No
> >>>> user-space visible changes are required apart from new FOURCC codes.
> >>>> In the kernel we'd have to add some subdev API between the bridge
> >>>> and the sensor drivers to let the sensor driver instruct the bridge
> >>>> driver to use some of the data, arriving over the camera interface,
> >>>> as metadata.
> >>> 
> >>> The interface should be more generic and allow describing how multiple
> >>> channels (in terms of virtual channels and data types for CSI-2 for
> >>> instance) are multiplexed over a single physical link. I'm not sure
> >>> how to represent that at the media controller level, that's also one
> >>> topic that needs to be researched.
> >> 
> >> Sure, agree. How about an enumeration-style method, something like
> >> .enum_mbus_streams()?

I'd rather not. The enumeration-style API isn't really a model of efficiency. 
I'd prefer passing all the data in a single call.
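
Purely as an illustration of the "single call" idea (every name below is 
invented for this sketch, nothing of it exists in the current API), such 
an operation could return a description of all streams carried over the 
multiplexed bus at once:

struct v4l2_mbus_stream_desc {
	__u32 stream_id;		/* bus-agnostic stream identifier */
	__u32 vc;			/* CSI-2 Virtual Channel, if applicable */
	struct v4l2_mbus_framefmt fmt;	/* per-stream media bus format */
};

struct v4l2_mbus_stream_config {
	__u32 num_streams;
	struct v4l2_mbus_stream_desc streams[8];	/* arbitrary bound */
};

/* Hypothetical subdev pad operation, returning all streams in one call */
int (*get_mbus_streams)(struct v4l2_subdev *sd, unsigned int pad,
			struct v4l2_mbus_stream_config *cfg);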

> > It now also occurs to me, that we currently configure pads with a single
> > configuration - pixel format, resolution. However, a single CSI-2
> > interface can transfer different frame formats at the same time. So, such
> > a sensor driver has to export multiple source pads? The bridge driver
> > would export multiple sink pads, then we don't need any new API methods,
> > we just configure each link separately, for which we have to add those
> > fields to struct v4l2_mbus_framefmt?
>
> It has been noted, that pads and links conceptually are designed to
> represent physical interfaces and connections between them, therefore
> representing a single CSI-2 link by multiple Media Controller pads and
> links is wrong.
>
> As an alternative it has been proposed to implement a multiplexer and a
> demultiplexer subdevices on the CSI-2 transmitter (camera) and receiver
> (SoC) sides respectively. Originally it has also been proposed to add a
> supporting API to configure multiple streams over such a multiplexed
> connection. However, this seems redundant, because mux sink pads and demux
> source pads will anyway have to be configured individually, which already
> configures the receiver and the transmitter sides.

You have a point, but I wonder how we would then validate pipelines.

> Currently the design seems to be converging to simply configuring the
> multiplexed link with the MEDIA_BUS_FMT_FIXED format and a fixed
> resolution and perform all real configuration on the other side of the mux
> and demux subdevices. The only API extension, that would be required for
> such a design would be adding CSI-2 Virtual Channel IDs to pad format
> specifications, i.e. to struct v4l2_mbus_framefmt.

I wouldn't add a CSI2-specific field, but a more generic stream ID instead. We 
would then need a way to map stream IDs to the actual bus implementations. For 
CSI-2 that would include both virtual channel and data type.

> On the video device side each stream will be sent to a separate video
> device node.

Not necessarily, they could be sent to different pieces of hardware.

> Each CSI-2 controller only supports a finite number of streams that it can
> demultiplex at any given time. Typically this maximum number is much smaller
> than 256, which is the total number of streams, that can be distinguished on
> a CSI-2 bus, using 2 bits for Virtual Channels and 6 bits for data types.
> For example, if a CSI-2 controller can demultiplex up to 8 streams
> simultaneously, the CSI-2 bridge driver would statically create 8
> /dev/video* nodes, statically connected to 8 sources of an internal demux
> subdevice. The user-space will then just have to configure internal pads
> with a Virtual Channel number, Media Bus pixel format and resolution and the
> /dev/video* node with the required output configuration.

If there are 8 independent DMA engines then 8 video nodes would seem quite 
logical. Another option would be to create a single video node with 8 buffer 
queues. I'm still debating that with myself though, but it could make sense in 
the case of a single DMA engine with multiple contexts. One could argue that 
we're touching a grey area.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2016-01-25 19:53                     ` Laurent Pinchart
@ 2016-01-26 12:49                       ` Guennadi Liakhovetski
  2016-01-29 10:08                         ` Guennadi Liakhovetski
  0 siblings, 1 reply; 23+ messages in thread
From: Guennadi Liakhovetski @ 2016-01-26 12:49 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Hi Laurent,

On Mon, 25 Jan 2016, Laurent Pinchart wrote:

> Hi Guennadi,
> 
> On Monday 25 January 2016 12:14:14 Guennadi Liakhovetski wrote:
> > On Tue, 5 Jan 2016, Guennadi Liakhovetski wrote:
> > > On Fri, 1 Jan 2016, Guennadi Liakhovetski wrote:
> > >> On Sun, 27 Dec 2015, Laurent Pinchart wrote:
> > >>> On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:
> > >>>> Hi Laurent,
> > >>>> 
> > >>>> Let me put this at the top: So far it looks like we converge on two
> > >>>> possibilities:
> > >>>> 
> > >>>> (1) a separate video-device node with a separate queue. No
> > >>>> user-space visible changes are required apart from new FOURCC codes.
> > >>>> In the kernel we'd have to add some subdev API between the bridge
> > >>>> and the sensor drivers to let the sensor driver instruct the bridge
> > >>>> driver to use some of the data, arriving over the camera interface,
> > >>>> as metadata.
> > >>> 
> > >>> The interface should be more generic and allow describing how multiple
> > >>> channels (in terms of virtual channels and data types for CSI-2 for
> > >>> instance) are multiplexed over a single physical link. I'm not sure
> > >>> how to represent that at the media controller level, that's also one
> > >>> topic that needs to be researched.
> > >> 
> > >> Sure, agree. How about an enumeration-style method, something like
> > >> .enum_mbus_streams()?
> 
> I'd rather not. The enumeration-style API isn't really a model of efficiency. 
> I'd prefer passing all the data in a single call.

Right, that idea has been abandoned by now.

> > > It now also occurs to me, that we currently configure pads with a single
> > > configuration - pixel format, resolution. However, a single CSI-2
> > > interface can transfer different frame formats at the same time. So, such
> > > a sensor driver has to export multiple source pads? The bridge driver
> > > would export multiple sink pads, then we don't need any new API methods,
> > > we just configure each link separately, for which we have to add those
> > > fields to struct v4l2_mbus_framefmt?
> >
> > It has been noted, that pads and links conceptually are designed to
> > represent physical interfaces and connections between then, therefore
> > representing a single CSI-2 link by multiple Media Controller pads and
> > links is wrong.
> >
> > As an alternative it has been proposed to implement a multiplexer and a
> > demultiplexer subdevices on the CSI-2 transmitter (camera) and receiver
> > (SoC) sides respectively. Originally it has also been proposed to add a
> > supporting API to configure multiple streams over such a multiplexed
> > connection. However, this seems redundant, because mux sink pads and demux
> > source pads will anyway have to be configured individually, which already
> > configures the receiver and the transmitter sides.
> 
> You have a point, but I wonder how we would then validate pipelines.

Well, maybe for "such" pads we should require a .link_validate() method? 
And "such" could mean pads using the FIXED format, or a 0x0 size.

> > Currently the design seems to be converging to simply configuring the
> > multiplexed link with the MEDIA_BUS_FMT_FIXED format and a fixed
> > resolution and perform all real configuration on the other side of the mux
> > and demux subdevices. The only API extension, that would be required for
> > such a design would be adding CSI-2 Virtual Channel IDs to pad format
> > specifications, i.e. to struct v4l2_mbus_framefmt.
> 
> I wouldn't add a CSI2-specific field, but a more generic stream ID instead. We 
> would then need a way to map stream IDs to the actual bus implementations. For 
> CSI-2 that would include both virtual channel and data type.

We discussed the CSI-2 data type with Sakari and I explained to him that 
I think the CSI-2 data type should be mapped directly to the media bus 
pixel code. As you know, CSI-2 defines a few such codes, so adding the 
data type for those would be a real duplication and an additional point 
to check. It is less clear with the user-defined data types, which CSI-2 
leaves free for device-specific formats, but there are only 8 of them. We 
could just add generic numeric defines for them like

#define MEDIA_BUS_FMT_CSI2_UD_0X30 ...
#define MEDIA_BUS_FMT_CSI2_UD_0X31 ...
...
#define MEDIA_BUS_FMT_CSI2_UD_0X37 ...

That's it.

As for stream ID vs. virtual channel number - we can go either way. I 
think the bus type information combined with a union would be sufficient.
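
Just to make that concrete, a sketch of such a structure (all names here 
are invented for illustration, nothing of this exists in the API) could 
be:

struct v4l2_mbus_stream_id {
	__u32 bus_type;			/* e.g. a V4L2_MBUS_CSI2-style value */
	union {
		struct {
			__u8 vc;	/* CSI-2 Virtual Channel, 0..3 */
			__u8 dt;	/* data type, where not implied by
					 * the media bus format code */
		} csi2;
		__u32 stream_id;	/* generic stream identifier */
	};
};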

> > On the video device side each stream will be sent to a separate video
> > device node.
> 
> Not necessarily, they could be sent to different pieces of hardware.

Yes, sure, then those nodes would just return -EBUSY.

> > Each CSI-2 controller only supports a finite number of streams that it can
> > demultiplex at any given time. Typically this maximum number is much smaller
> > than 256, which is the total number of streams, that can be distinguished on
> > a CSI-2 bus, using 2 bits for Virtual Channels and 6 bits for data types.
> > For example, if a CSI-2 controller can demultiplex up to 8 streams
> > simultaneously, the CSI-2 bridge driver would statically create 8
> > /dev/video* nodes, statically connected to 8 sources of an internal demux
> > subdevice. The user-space will then just have to configure internal pads
> > with a Virtual Channel number, Media Bus pixel format and resolution and the
> > /dev/video* node with the required output configuration.
> 
> If there are 8 independent DMA engines then 8 video nodes would seem quite 
> logical. Another option would be to create a single video node with 8 buffer 
> queues. I'm still debating that with myself though, but it could make sense in 
> the case of a single DMA engine with multiple contexts. One could argue that 
> we're touching a grey area.

I haven't read the sources of all those controllers :) I suspect, 
however, that at least in many cases it will just be one DMA engine with 
multiple channels. I understand (at least some of) the disadvantages of 
using multiple video nodes, but at least this is something we can use now 
without really breaking or (badly) abusing any existing APIs. We can add 
multiple buffer queue support too, but so far I don't see a _really_ 
compelling reason for that. Nice to have, but not a deal breaker IMHO.

Another doubt of mine is between statically creating all (8) video nodes 
and creating them on the fly as the respective pads get configured. The 
latter seems more elegant to me and wouldn't flood Linux hotplug daemons 
with dozens of non-functional video devices, but the daemons would still 
be triggered whenever such devices are created, and dealing with 
dynamically created nodes would also be more difficult for applications.

Thanks
Guennadi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: per-frame camera metadata (again)
  2016-01-26 12:49                       ` Guennadi Liakhovetski
@ 2016-01-29 10:08                         ` Guennadi Liakhovetski
  0 siblings, 0 replies; 23+ messages in thread
From: Guennadi Liakhovetski @ 2016-01-29 10:08 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Linux Media Mailing List, Mauro Carvalho Chehab,
	Sakari Ailus, Aviv Greenberg

Replying to myself with some further ideas.

On Tue, 26 Jan 2016, Guennadi Liakhovetski wrote:

> Hi Laurent,
> 
> On Mon, 25 Jan 2016, Laurent Pinchart wrote:
> 
> > Hi Guennadi,
> > 
> > On Monday 25 January 2016 12:14:14 Guennadi Liakhovetski wrote:
> > > On Tue, 5 Jan 2016, Guennadi Liakhovetski wrote:
> > > > On Fri, 1 Jan 2016, Guennadi Liakhovetski wrote:
> > > >> On Sun, 27 Dec 2015, Laurent Pinchart wrote:
> > > >>> On Thursday 24 December 2015 11:42:49 Guennadi Liakhovetski wrote:

[snip]

> > > > It now also occurs to me, that we currently configure pads with a single
> > > > configuration - pixel format, resolution. However, a single CSI-2
> > > > interface can transfer different frame formats at the same time. So, such
> > > > a sensor driver has to export multiple source pads? The bridge driver
> > > > would export multiple sink pads, then we don't need any new API methods,
> > > > we just configure each link separately, for which we have to add those
> > > > fields to struct v4l2_mbus_framefmt?
> > >
> > > It has been noted, that pads and links conceptually are designed to
> > > represent physical interfaces and connections between then, therefore
> > > representing a single CSI-2 link by multiple Media Controller pads and
> > > links is wrong.
> > >
> > > As an alternative it has been proposed to implement a multiplexer and a
> > > demultiplexer subdevices on the CSI-2 transmitter (camera) and receiver
> > > (SoC) sides respectively. Originally it has also been proposed to add a
> > > supporting API to configure multiple streams over such a multiplexed
> > > connection. However, this seems redundant, because mux sink pads and demux
> > > source pads will anyway have to be configured individually, which already
> > > configures the receiver and the transmitter sides.
> > 
> > You have a point, but I wonder how we would then validate pipelines.
> 
> Well, maybe for "such" pads we should require a .link_validate() method? 
> And "such" could mean pads using the FIXED format, or a 0x0 size.

Consider the following configuration:

+------ sensor ------+    +---- bridge ----+
 -----------   -------       -----------
 | subdev1 |-->|     |       |         |-->
 -----------   | mux |------>|  demux  |
 -----------   |     |       |         |-->
 | subdev2 |-->|     |       -----------
 -----------   -------

we could let the demux driver implement a function that calls the 
.get_fmt() pad operation for all sink pads of the remote (sensor-side) 
mux and checks whether a suitably configured format can be found for each 
of its own source pads. That function could then be called during 
STREAMON processing. This mux-demux functionality seems rather 
hardware-independent, so it could be provided as library functions.
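
A minimal sketch of the mux-side search such a library helper could do 
(the helper name and the simplified "suitable" comparison are invented 
here; the demux driver would call this once for each of its own 
configured source pad formats during STREAMON):

#include <media/media-entity.h>
#include <media/v4l2-subdev.h>

/* Does any sink pad of the remote mux carry a suitable format? */
static bool mux_has_stream(struct media_pad *demux_sink,
			   const struct v4l2_mbus_framefmt *want)
{
	struct media_pad *remote = media_entity_remote_pad(demux_sink);
	struct v4l2_subdev *mux;
	unsigned int i;

	if (!remote)
		return false;

	mux = media_entity_to_v4l2_subdev(remote->entity);

	for (i = 0; i < remote->entity->num_pads; i++) {
		struct v4l2_subdev_format fmt = {
			.which = V4L2_SUBDEV_FORMAT_ACTIVE,
			.pad = i,
		};

		if (!(remote->entity->pads[i].flags & MEDIA_PAD_FL_SINK))
			continue;
		if (v4l2_subdev_call(mux, pad, get_fmt, NULL, &fmt))
			continue;
		/* "suitable" simplified to code and size here; a real
		 * helper would also compare the Virtual Channel / stream */
		if (fmt.format.code == want->code &&
		    fmt.format.width == want->width &&
		    fmt.format.height == want->height)
			return true;
	}

	return false;
}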

However, if the sensor driver is "very simple," e.g. always only sends a 
fixed video format with a fixed accompanying metadata format, it could be 
desirable to implement it as a single subdevice, without a mux, which then 
would be impossible.

> > > Currently the design seems to be converging to simply configuring the
> > > multiplexed link with the MEDIA_BUS_FMT_FIXED format and a fixed
> > > resolution and perform all real configuration on the other side of the mux
> > > and demux subdevices. The only API extension, that would be required for
> > > such a design would be adding CSI-2 Virtual Channel IDs to pad format
> > > specifications, i.e. to struct v4l2_mbus_framefmt.
> > 
> > I wouldn't add a CSI2-specific field, but a more generic stream ID instead. We 
> > would then need a way to map stream IDs to the actual bus implementations. For 
> > CSI-2 that would include both virtual channel and data type.
> 
> We discussed the CSI-2 data type with Sakari and I explained to him, that 
> I think, that the CSI-2 data type should be directly mapped to the Media 
> Bus pixel code. You know, that CSI-2 defines a few such codes, so, adding 
> the data type for those would be a real duplication and an additional 
> point to check for. It is less clear with user-defined data types, that 
> the CSI-2 leaves free for device-specific formats, but there are only 8 
> of them. We could just add generic numeric defines for them like
> 
> #define MEDIA_BUS_FMT_CSI2_UD_0X30 ...
> #define MEDIA_BUS_FMT_CSI2_UD_0X31 ...
> ...
> #define MEDIA_BUS_FMT_CSI2_UD_0X37 ...
> 
> That's it.
> 
> As for stream ID vs. virtual channel number - we can go either way. I 
> think the bus type information combined with a union would be sufficient.
> 
> > > On the video device side each stream will be sent to a separate video
> > > device node.
> > 
> > Not necessarily, they could be sent to different pieces of hardware.
> 
> Yes, sure, then those nodes would just be EBUSY.
> 
> > > Each CSI-2 controller only supports a finite number of streams that it can
> > > demultiplex at any given time. Typically this maximum number is much smaller
> > > than 256, which is the total number of streams, that can be distinguished on
> > > a CSI-2 bus, using 2 bits for Virtual Channels and 6 bits for data types.
> > > For example, if a CSI-2 controller can demultiplex up to 8 streams
> > > simultaneously, the CSI-2 bridge driver would statically create 8
> > > /dev/video* nodes, statically connected to 8 sources of an internal demux
> > > subdevice. The user-space will then just have to configure internal pads
> > > with a Virtual Channel number, Media Bus pixel format and resolution and the
> > > /dev/video* node with the required output configuration.
> > 
> > If there are 8 independent DMA engines then 8 video nodes would seem quite 
> > logical. Another option would be to create a single video node with 8 buffer 
> > queues. I'm still debating that with myself though, but it could make sense in 
> > the case of a single DMA engine with multiple contexts. One could argue that 
> > we're touching a grey area.
> 
> I haven't read the sources of all those controllers :) I suspect however, 
> that at least in many cases it will be just one DMA engine with multiple 
> channels. I understand (at least some) disadvantages of using multiple 
> video nodes, but at least this is something we can use now without really 
> breaking or (badly) abusing any existing APIs. We can add multiple buffer 
> queue support too, but so far I don't see a _really_ compelling reason for 
> that. Nice to have, but not a deal breaker IMHO.
> 
> Another doubt of mine is statically creating all (8) video nodes vs. creating 
> them on the fly as respective pads get configured. The latter seems more 
> elegant to me and wouldn't fry Linux hotplug daemons by presenting them 
> with dozens of non-functional video devices, but also when such devices 
> get created the daemons would still be triggered, and dealing with 
> dynamically created ones would be more difficult for applications too.

If we do want to implement multiple video-buffer queues per video node, 
we could add a stream parameter to the buffer handling ioctl()s, using 1 
byte from the reserved space in the respective structs (a rough sketch 
follows the list below). The affected structs would be

v4l2_create_buffers
v4l2_exportbuffer
v4l2_buffer
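
As a sketch only of how that could look for one of them (the field name 
and its exact position inside the reserved area are hypothetical), and of 
how a driver implementing its own .vidioc_qbuf might then dispatch on it 
(struct drv_ctx and its members are likewise invented for illustration):

struct v4l2_create_buffers {
	__u32			index;
	__u32			count;
	__u32			memory;
	struct v4l2_format	format;
	__u8			stream;		/* hypothetical stream/queue ID */
	__u8			pad[3];
	__u32			reserved[7];	/* remaining reserved space */
};

/* Driver-side dispatch, assuming a similar hypothetical field in
 * struct v4l2_buffer and a driver-private context holding one vb2
 * queue per stream */
static int drv_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
{
	struct drv_ctx *ctx = video_drvdata(file);

	if (b->stream >= ctx->num_queues)
		return -EINVAL;

	return vb2_qbuf(&ctx->queues[b->stream], b);
}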

The only core code making assumptions about where and how many buffers 
exist per video device lives in the ioctl() and fops implementations in 
videobuf2-v4l2.c, but their use is optional; many drivers implement those 
methods themselves anyway. So drivers wishing to support multiple buffer 
queues would be forced to do that too. The actual handling of the stream 
ID would then be done by the respective bridge drivers as well; no 
support in the core should be required.

Thanks
Guennadi

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2015-12-16  9:37 per-frame camera metadata (again) Guennadi Liakhovetski
2015-12-16 10:02 ` Hans Verkuil
2015-12-16 11:25   ` Guennadi Liakhovetski
2015-12-21  3:41     ` Laurent Pinchart
2015-12-22 11:16       ` Guennadi Liakhovetski
2015-12-22 13:30         ` karthik poduval
2015-12-24 10:54           ` Laurent Pinchart
2015-12-23 17:40         ` Laurent Pinchart
2015-12-24 10:42           ` Guennadi Liakhovetski
2015-12-26 23:47             ` Laurent Pinchart
2016-01-01 15:43               ` Guennadi Liakhovetski
2016-01-05 11:31                 ` Guennadi Liakhovetski
2016-01-25 11:14                   ` Guennadi Liakhovetski
2016-01-25 19:53                     ` Laurent Pinchart
2016-01-26 12:49                       ` Guennadi Liakhovetski
2016-01-29 10:08                         ` Guennadi Liakhovetski
2015-12-19  0:06   ` Sakari Ailus
2015-12-23  9:47     ` Guennadi Liakhovetski
2015-12-24 10:46       ` Laurent Pinchart
2015-12-24 11:17         ` hverkuil
2015-12-24 11:29           ` Laurent Pinchart
2015-12-24 12:54             ` hverkuil
2015-12-24 17:33               ` Laurent Pinchart
