linux-media.vger.kernel.org archive mirror
* [RFC] Motion Detection API
@ 2013-04-12 15:36 Hans Verkuil
  2013-04-21 12:04 ` Ismael Luceno
  2013-04-29 20:52 ` Laurent Pinchart
  0 siblings, 2 replies; 16+ messages in thread
From: Hans Verkuil @ 2013-04-12 15:36 UTC (permalink / raw)
  To: linux-media
  Cc: Volokh Konstantin, Pete Eberlein, Ismael Luceno,
	Sylwester Nawrocki, Kamil Debski

This RFC looks at adding support for motion detection to V4L2. This is the main
missing piece that prevents the go7007 and solo6x10 drivers from being moved
into mainline from the staging directory.

Step one is to look at existing drivers/hardware:

1) The go7007 driver:

	- divides the frame into blocks of 16x16 pixels each (that's 45x36 blocks for PAL)
	- each block can be assigned to region 0, 1, 2 or 3
	- each region has:
		- a pixel change threshold
		- a motion vector change threshold
		- a trigger level; if this is 0, then motion detection for this
		  region is disabled
	- when streaming the reserved field of v4l2_buffer is used as a bitmask:
	  one bit for each region where motion is detected.

2) The solo6x10 driver:

	- divides the frame into blocks of 16x16 pixels each
	- each block has its own threshold
	- the driver adds one MOTION_ON buffer flag and one MOTION_DETECTED buffer
	  flag.
	- motion detection can be disabled or enabled.
	- the driver has a global motion detection mode with just one threshold:
	  in that case all blocks are set to the same threshold.
	- there is also support for displaying a border around the image if motion
	  is detected (very hardware specific).

3) The tw2804 video encoder (based on the datasheet, not implemented in the driver):

	- divides the image into 12x12 blocks (block size will differ for NTSC vs PAL)
	- motion detection can be enabled or disabled for each block
	- there are four controls: 
		- luminance level change threshold
		- spatial sensitivity threshold
		- temporal sensitivity threshold
		- velocity control (determines how well slow motions are detected)
	- detection is reported by a hardware pin in this case

Comparing these three examples of motion detection I see quite a lot of similarities,
enough to make a proposal for an API:

- Add a MOTION_DETECTION menu control:
	- Disabled
	- Global Motion Detection
	- Regional Motion Detection

If 'Global Motion Detection' is selected, then various threshold controls become
available. What sort of thresholds are available seems to be quite variable, so
I am inclined to leave this as private controls.

- Add new buffer flags when motion is detected. The go7007 driver would need 4
  bits (one for each region), the others just one. This can be done by taking
  4 bits from the v4l2_buffer flags field. There are still 16 bits left there,
  and if it becomes full, then we still have two reserved fields. I see no
  reason for adding a 'MOTION_ON' flag as the solo6x10 driver does today: just
  check the MOTION_DETECTION control if you want to know if motion detection
  is on or not.
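
  A sketch of how an application might check such flags after dequeuing a buffer;
  the flag names and bit positions are hypothetical placeholders, only struct
  v4l2_buffer and its flags field are existing API:

	#include <stdio.h>
	#include <linux/videodev2.h>

	/* Hypothetical flag bits for the four go7007 regions -- placeholders only. */
	#define V4L2_BUF_FLAG_MOTION_REGION0	0x00100000
	#define V4L2_BUF_FLAG_MOTION_REGION1	0x00200000
	#define V4L2_BUF_FLAG_MOTION_REGION2	0x00400000
	#define V4L2_BUF_FLAG_MOTION_REGION3	0x00800000

	/* Report which regions of a just-dequeued buffer saw motion. */
	static void report_motion(const struct v4l2_buffer *buf)
	{
		unsigned int region;

		for (region = 0; region < 4; region++)
			if (buf->flags & (V4L2_BUF_FLAG_MOTION_REGION0 << region))
				printf("motion detected in region %u\n", region);
	}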

- Add two new ioctls to get and set the block data:

	#define V4L2_MD_HOR_BLOCKS (64)
	#define V4L2_MD_VERT_BLOCKS (48)

	#define V4L2_MD_TYPE_REGION	(1)
	#define V4L2_MD_TYPE_THRESHOLD	(2)

	struct v4l2_md_blocks {
		__u32 type;
		struct v4l2_rect rect;
		__u32 minimum;
		__u32 maximum;
		__u32 reserved[32];
		__u16 blocks[V4L2_MD_HOR_BLOCKS][V4L2_MD_VERT_BLOCKS];
	};

	#define VIDIOC_G_MD_BLOCKS    _IOWR('V', 103, struct v4l2_md_blocks)
	#define VIDIOC_S_MD_BLOCKS    _IOWR('V', 104, struct v4l2_md_blocks)

  Apps must fill in type, then call G_MD_BLOCKS to get the current block
  values for that type. TYPE_REGION returns the region each block belongs to,
  TYPE_THRESHOLD returns the threshold value for each block.

  rect returns the rectangle of valid blocks, minimum and maximum the min and max
  values for each 'blocks' array element.

  To change the blocks apps call S_MD_BLOCKS, filling in type, rect (rect is useful
  here to set only a subset of all blocks) and blocks.

So the go7007 would return 45x36 in rect, type would be REGION, min/max would be
0-3.

solo6x10 would return 45x36 in rect, type would be THRESHOLD, min/max would be
0-65535.

TW2804 would return 12x12 in rect, type would be THRESHOLD, min/max would be 0-1.
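
A rough userspace sketch of the proposed ioctls follows; it uses only the
definitions proposed above, none of which exist yet:

	/* Set every valid block of the grid to the same threshold. */
	static int set_all_thresholds(int fd, __u16 threshold)
	{
		struct v4l2_md_blocks md = { .type = V4L2_MD_TYPE_THRESHOLD };
		int x, y;

		/* Fetch the valid block rectangle and the current per-block values. */
		if (ioctl(fd, VIDIOC_G_MD_BLOCKS, &md) < 0)
			return -1;

		/* Only the entries inside md.rect are meaningful. */
		for (x = md.rect.left; x < md.rect.left + md.rect.width; x++)
			for (y = md.rect.top; y < md.rect.top + md.rect.height; y++)
				md.blocks[x][y] = threshold;

		return ioctl(fd, VIDIOC_S_MD_BLOCKS, &md);
	}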

Comments? Questions?

Regards,

	Hans


* Re: [RFC] Motion Detection API
  2013-04-12 15:36 [RFC] Motion Detection API Hans Verkuil
@ 2013-04-21 12:04 ` Ismael Luceno
  2013-04-22  7:55   ` Hans Verkuil
  2013-04-29 20:52 ` Laurent Pinchart
  1 sibling, 1 reply; 16+ messages in thread
From: Ismael Luceno @ 2013-04-21 12:04 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media


On Fri, 12 Apr 2013 17:36:16 +0200
Hans Verkuil <hverkuil@xs4all.nl> wrote:
> This RFC looks at adding support for motion detection to V4L2. This
> is the main missing piece that prevents the go7007 and solo6x10
> drivers from being moved into mainline from the staging directory.
<...>
> Comment? Questions?

+1. I like it :).



* Re: [RFC] Motion Detection API
  2013-04-21 12:04 ` Ismael Luceno
@ 2013-04-22  7:55   ` Hans Verkuil
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Verkuil @ 2013-04-22  7:55 UTC (permalink / raw)
  To: Ismael Luceno; +Cc: linux-media

On Sun April 21 2013 14:04:26 Ismael Luceno wrote:
> On Fri, 12 Apr 2013 17:36:16 +0200
> Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > This RFC looks at adding support for motion detection to V4L2. This
> > is the main missing piece that prevents the go7007 and solo6x10
> > drivers from being moved into mainline from the staging directory.
> <...>
> > Comment? Questions?
> 
> +1. I like it :).
> 

Cool. Now all I need is time to actually implement this :-)

Regards,

	Hans


* Re: [RFC] Motion Detection API
  2013-04-12 15:36 [RFC] Motion Detection API Hans Verkuil
  2013-04-21 12:04 ` Ismael Luceno
@ 2013-04-29 20:52 ` Laurent Pinchart
  2013-05-06 13:41   ` Hans Verkuil
  1 sibling, 1 reply; 16+ messages in thread
From: Laurent Pinchart @ 2013-04-29 20:52 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: linux-media, Volokh Konstantin, Pete Eberlein, Ismael Luceno,
	Sylwester Nawrocki, Kamil Debski

Hi Hans,

Sorry for the late reply.

On Friday 12 April 2013 17:36:16 Hans Verkuil wrote:
> This RFC looks at adding support for motion detection to V4L2. This is the
> main missing piece that prevents the go7007 and solo6x10 drivers from being
> moved into mainline from the staging directory.
> 
> Step one is to look at existing drivers/hardware:
> 
> 1) The go7007 driver:
> 
> 	- divides the frame into blocks of 16x16 pixels each (that's 45x36 blocks
> 	  for PAL)
> 	- each block can be assigned to region 0, 1, 2 or 3
> 	- each region has:
> 		- a pixel change threshold
> 		- a motion vector change threshold
> 		- a trigger level; if this is 0, then motion detection for this
> 		  region is disabled
> 	- when streaming the reserved field of v4l2_buffer is used as a bitmask:
> 	  one bit for each region where motion is detected.
> 
> 2) The solo6x10 driver:
> 
> 	- divides the frame into blocks of 16x16 pixels each
> 	- each block has its own threshold
> 	- the driver adds one MOTION_ON buffer flag and one MOTION_DETECTED
> 	  buffer flag.
> 	- motion detection can be disabled or enabled.
> 	- the driver has a global motion detection mode with just one threshold:
> 	  in that case all blocks are set to the same threshold.
> 	- there is also support for displaying a border around the image if 
> 	  motion is detected (very hardware specific).
> 
> 3) The tw2804 video encoder (based on the datasheet, not implemented in the
> driver):
> 
> 	- divides the image in 12x12 blocks (block size will differ for NTSC vs
> 	  PAL)
> 	- motion detection can be enabled or disabled for each block
> 	- there are four controls:
> 		- luminance level change threshold
> 		- spatial sensitivity threshold
> 		- temporal sensitivity threshold
> 		- velocity control (determines how well slow motions are detected)
> 	- detection is reported by a hardware pin in this case
> 
> Comparing these three examples of motion detection I see quite a lot of
> similarities, enough to make a proposal for an API:
> 
> - Add a MOTION_DETECTION menu control:
> 	- Disabled
> 	- Global Motion Detection
> 	- Regional Motion Detection
> 
> If 'Global Motion Detection' is selected, then various threshold controls
> become available. What sort of thresholds are available seems to be quite
> variable, so I am inclined to leave this as private controls.
> 
> - Add new buffer flags when motion is detected. The go7007 driver would need
> 4 bits (one for each region), the others just one. This can be done by
> taking 4 bits from the v4l2_buffer flags field. There are still 16 bits
> left there, and if it becomes full, then we still have two reserved fields.
> I see no reason for adding a 'MOTION_ON' flag as the solo6x10 driver does
> today: just check the MOTION_DETECTION control if you want to know if
> motion detection is on or not.

We're really starting to shove metadata into buffer flags. Isn't it time to add
a proper metadata API? I don't really like the idea of using (valuable)
buffer flags for a feature supported by only three drivers.

> - Add two new ioctls to get and set the block data:
> 
> 	#define V4L2_MD_HOR_BLOCKS (64)
> 	#define V4L2_MD_VERT_BLOCKS (48)
> 
> 	#define V4L2_MD_TYPE_REGION	(1)
> 	#define V4L2_MD_TYPE_THRESHOLD	(2)
> 
> 	struct v4l2_md_blocks {
> 		__u32 type;
> 		struct v4l2_rect rect;
> 		__u32 minimum;
> 		__u32 maximum;
> 		__u32 reserved[32];
> 		__u16 blocks[V4L2_MD_HOR_BLOCKS][V4L2_MD_VERT_BLOCKS];
> 	};
> 
> 	#define VIDIOC_G_MD_BLOCKS    _IORW('V', 103, struct v4l2_md_blocks)
> 	#define VIDIOC_S_MD_BLOCKS    _IORW('V', 104, struct v4l2_md_blocks)
> 
>   Apps must fill in type, then can call G_MD_BLOCKS to get the current block
> values for that type. TYPE_REGION returns to which region each block
> belongs, TYPE_THRESHOLD returns threshold values for each block.
> 
>   rect returns the rectangle of valid blocks, minimum and maximum the min
> and max values for each 'blocks' array element.
> 
>   To change the blocks apps call S_MD_BLOCKS, fill in type, rect (rect is
> useful here to set only a subset of all blocks) and blocks.
> 
> So the go7007 would return 45x36 in rect, type would be REGION, min/max
> would be 0-3.
> 
> solo6x10 would return 45x36 in rect, type would be THRESHOLD, min/max would
> be 0-65535.
> 
> TW2804 would return 12x12 in rect, type would be THRESHOLD, min/max would be
> 0-1.
> 
> Comment? Questions?

-- 
Regards,

Laurent Pinchart



* Re: [RFC] Motion Detection API
  2013-04-29 20:52 ` Laurent Pinchart
@ 2013-05-06 13:41   ` Hans Verkuil
  2013-05-07 12:09     ` Laurent Pinchart
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Verkuil @ 2013-05-06 13:41 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, Volokh Konstantin, Pete Eberlein, Ismael Luceno,
	Sylwester Nawrocki, Kamil Debski

On Mon April 29 2013 22:52:31 Laurent Pinchart wrote:
> Hi Hans,
> 
> Sorry for the late reply.
> 
> On Friday 12 April 2013 17:36:16 Hans Verkuil wrote:
> > This RFC looks at adding support for motion detection to V4L2. This is the
> > main missing piece that prevents the go7007 and solo6x10 drivers from being
> > moved into mainline from the staging directory.
> > 
> > Step one is to look at existing drivers/hardware:
> > 
> > 1) The go7007 driver:
> > 
> > 	- divides the frame into blocks of 16x16 pixels each (that's 45x36 blocks
> > 	  for PAL)
> > 	- each block can be assigned to region 0, 1, 2 or 3
> > 	- each region has:
> > 		- a pixel change threshold
> > 		- a motion vector change threshold
> > 		- a trigger level; if this is 0, then motion detection for this
> > 		  region is disabled
> > 	- when streaming the reserved field of v4l2_buffer is used as a bitmask:
> > 	  one bit for each region where motion is detected.
> > 
> > 2) The solo6x10 driver:
> > 
> > 	- divides the frame into blocks of 16x16 pixels each
> > 	- each block has its own threshold
> > 	- the driver adds one MOTION_ON buffer flag and one MOTION_DETECTED
> > 	  buffer flag.
> > 	- motion detection can be disabled or enabled.
> > 	- the driver has a global motion detection mode with just one threshold:
> > 	  in that case all blocks are set to the same threshold.
> > 	- there is also support for displaying a border around the image if 
> > 	  motion is detected (very hardware specific).
> > 
> > 3) The tw2804 video encoder (based on the datasheet, not implemented in the
> > driver):
> > 
> > 	- divides the image in 12x12 blocks (block size will differ for NTSC vs
> > 	  PAL)
> > 	- motion detection can be enabled or disabled for each block
> > 	- there are four controls:
> > 		- luminance level change threshold
> > 		- spatial sensitivity threshold
> > 		- temporal sensitivity threshold
> > 		- velocity control (determines how well slow motions are detected)
> > 	- detection is reported by a hardware pin in this case
> > 
> > Comparing these three examples of motion detection I see quite a lot of
> > similarities, enough to make a proposal for an API:
> > 
> > - Add a MOTION_DETECTION menu control:
> > 	- Disabled
> > 	- Global Motion Detection
> > 	- Regional Motion Detection
> > 
> > If 'Global Motion Detection' is selected, then various threshold controls
> > become available. What sort of thresholds are available seems to be quite
> > variable, so I am inclined to leave this as private controls.
> > 
> > - Add new buffer flags when motion is detected. The go7007 driver would need
> > 4 bits (one for each region), the others just one. This can be done by
> > taking 4 bits from the v4l2_buffer flags field. There are still 16 bits
> > left there, and if it becomes full, then we still have two reserved fields.
> > I see no reason for adding a 'MOTION_ON' flag as the solo6x10 driver does
> > today: just check the MOTION_DETECTION control if you want to know if
> > motion detection is on or not.
> 
> We're really starting to shove metadata in buffer flags. Isn't it time to add 
> a proper metadata API ? I don't really like the idea of using (valuable) 
> buffer flags for a feature supported by three drivers only.

There are still 18 (not 16) bits remaining, so we are hardly running out of bits.
And I feel it is overkill to create a new metadata API for just a few bits.

It's actually quite rare that bits are added here, so I am OK with this myself.

I will produce an RFCv2 though (the current API doesn't really extend to 1080p motion
detection due to the limited number of blocks), and I will take another look
at this. Although I don't really know off-hand how to implement it. One idea that
I had (just a brainstorm at this moment) is to associate V4L2 events with a buffer.
So internally buffers would have an event queue and events could be added there
once the driver is done with the buffer.

An event buffer flag would be set, signalling to userspace that events are
available for this buffer and DQEVENT can be used to determine which events
there are (e.g. a motion detection event).

This makes it easy to associate many types of event with a buffer (motion detection,
face/smile/whatever detection) using standard ioctls, but it feels a bit convoluted
at the same time. It's probably worth pursuing, though, as it is nicely generic.
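
Purely to make that brainstorm concrete, a hypothetical userspace flow could look
like the sketch below; the EVENTS buffer flag and the motion event type do not
exist today, only VIDIOC_DQEVENT and struct v4l2_event do:

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	/* Hypothetical names for the brainstormed mechanism -- not existing API. */
	#define V4L2_BUF_FLAG_EVENTS	0x00100000
	#define V4L2_EVENT_MOTION_DET	(V4L2_EVENT_PRIVATE_START + 1)

	/* After DQBUF: if the buffer carries events, drain and inspect them. */
	static void handle_buffer_events(int fd, const struct v4l2_buffer *buf)
	{
		struct v4l2_event ev;

		if (!(buf->flags & V4L2_BUF_FLAG_EVENTS))
			return;

		while (ioctl(fd, VIDIOC_DQEVENT, &ev) == 0)
			if (ev.type == V4L2_EVENT_MOTION_DET)
				printf("motion detected in frame #%u\n", ev.sequence);
	}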

An alternative might be to set a 'DETECT' buffer flag, and rename 'reserved2'
to 'detect_mask', thus having up to 32 things to detect. The problem with that
is that we have no idea yet how to do face detection or any other type of
detection since we have no experience with it at all. So 32 bits may also be
insufficient, and I'd rather not use up a full field.

Regards,

	Hans


* Re: [RFC] Motion Detection API
  2013-05-06 13:41   ` Hans Verkuil
@ 2013-05-07 12:09     ` Laurent Pinchart
  2013-05-07 12:35       ` Hans Verkuil
  2013-05-07 20:59       ` Sakari Ailus
  0 siblings, 2 replies; 16+ messages in thread
From: Laurent Pinchart @ 2013-05-07 12:09 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: linux-media, Volokh Konstantin, Pete Eberlein, Ismael Luceno,
	Sylwester Nawrocki, Kamil Debski, sakari.ailus

Hi Hans,

(CC'ing Sakari, I know he's missing reviewing V4L2 patches ;-))

On Monday 06 May 2013 15:41:41 Hans Verkuil wrote:
> On Mon April 29 2013 22:52:31 Laurent Pinchart wrote:
> > On Friday 12 April 2013 17:36:16 Hans Verkuil wrote:
> > > This RFC looks at adding support for motion detection to V4L2. This is
> > > the main missing piece that prevents the go7007 and solo6x10 drivers
> > > from being moved into mainline from the staging directory.
> > > 
> > > Step one is to look at existing drivers/hardware:
> > > 
> > > 1) The go7007 driver:
> > > 	- divides the frame into blocks of 16x16 pixels each (that's 45x36
> > > 	  blocks for PAL)
> > > 	- each block can be assigned to region 0, 1, 2 or 3
> > > 	- each region has:
> > > 		- a pixel change threshold
> > > 		- a motion vector change threshold
> > > 		- a trigger level; if this is 0, then motion detection for this
> > > 		  region is disabled
> > > 	- when streaming the reserved field of v4l2_buffer is used as a
> > > 	  bitmask: one bit for each region where motion is detected.
> > > 
> > > 2) The solo6x10 driver:
> > > 	- divides the frame into blocks of 16x16 pixels each
> > > 	- each block has its own threshold
> > > 	- the driver adds one MOTION_ON buffer flag and one MOTION_DETECTED
> > > 	  buffer flag.
> > > 	- motion detection can be disabled or enabled.
> > > 	- the driver has a global motion detection mode with just one
> > > 	  threshold: in that case all blocks are set to the same threshold.
> > > 	- there is also support for displaying a border around the image if
> > > 	  motion is detected (very hardware specific).
> > > 
> > > 3) The tw2804 video encoder (based on the datasheet, not implemented in
> > > the driver):
> > > 	- divides the image in 12x12 blocks (block size will differ for NTSC
> > > 	  vs PAL)
> > > 	- motion detection can be enabled or disabled for each block
> > > 	- there are four controls:
> > > 		- luminance level change threshold
> > > 		- spatial sensitivity threshold
> > > 		- temporal sensitivity threshold
> > > 		- velocity control (determines how well slow motions are
> > > 		  detected)
> > > 	- detection is reported by a hardware pin in this case
> > > 
> > > Comparing these three examples of motion detection I see quite a lot of
> > > similarities, enough to make a proposal for an API:
> > > 
> > > - Add a MOTION_DETECTION menu control:
> > > 	- Disabled
> > > 	- Global Motion Detection
> > > 	- Regional Motion Detection
> > > 
> > > If 'Global Motion Detection' is selected, then various threshold
> > > controls become available. What sort of thresholds are available seems
> > > to be quite variable, so I am inclined to leave this as private
> > > controls.
> > > 
> > > - Add new buffer flags when motion is detected. The go7007 driver would
> > > need 4 bits (one for each region), the others just one. This can be
> > > done by taking 4 bits from the v4l2_buffer flags field. There are still
> > > 16 bits left there, and if it becomes full, then we still have two
> > > reserved fields. I see no reason for adding a 'MOTION_ON' flag as the
> > > solo6x10 driver does today: just check the MOTION_DETECTION control if
> > > you want to know if motion detection is on or not.
> > 
> > We're really starting to shove metadata in buffer flags. Isn't it time to
> > add a proper metadata API ? I don't really like the idea of using
> > (valuable) buffer flags for a feature supported by three drivers only.
> 
> There are still 18 (not 16) bits remaining, so we are hardly running out of
> bits. And I feel it is overkill to create a new metadata API for just a few
> bits.

Creating a metadata API for any single small piece of metadata information is
overkill, but when we add them all up together it definitely makes sense. We've
been postponing that API for some time now; in my opinion it's time to work on
it :-)

> It's actually quite rare that bits are added here, so I am OK with this
> myself.
> 
> I will produce an RFCv2 though (the current API doesn't really extend to
> 1080p motion detection due to the limited number of blocks), and I will
> take another look at this. Although I don't really know off-hand how to
> implement it. One idea that I had (just a brainstorm at this moment) is to
> associate V4L2 events with a buffer. So internally buffers would have an
> event queue and events could be added there once the driver is done with
> the buffer.
> 
> An event buffer flag would be set, signalling to userspace that events are
> available for this buffer and DQEVENT can be used to determine which events
> there are (e.g. a motion detection event).
> 
> This makes it easy to associate many types of event with a buffer (motion
> detection, face/smile/whatever detection) using standard ioctls, but it
> feels a bit convoluted at the same time. It's probably worth pursuing,
> though, as it is nicely generic.

It's an interesting idea, but maybe a bit convoluted as you mentioned.

What about using a metadata plane? Alternatively we could add a metadata flag
and turn the two reserved fields into a metadata buffer pointer. Or add ioctls
to retrieve the metadata buffer associated with a v4l2_buffer (those are rough
ideas as well).
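
For illustration only, a rough sketch of the metadata plane option, assuming a
hypothetical multiplanar format whose last plane carries driver-specific metadata
(the MPLANE ioctls and structures are standard, the metadata layout is not):

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	/*
	 * Dequeue one multi-planar buffer; plane_start[] holds the mmap()ed start
	 * addresses of its planes and the last plane is assumed to carry metadata.
	 */
	static int dqbuf_with_metadata(int fd, void *plane_start[], unsigned int num_planes)
	{
		struct v4l2_plane planes[VIDEO_MAX_PLANES];
		struct v4l2_buffer buf;

		memset(&buf, 0, sizeof(buf));
		memset(planes, 0, sizeof(planes));
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
		buf.memory = V4L2_MEMORY_MMAP;
		buf.m.planes = planes;
		buf.length = num_planes;

		if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0)
			return -1;

		/* Image data in planes 0..num_planes-2, metadata in the last plane. */
		void *meta = plane_start[num_planes - 1];
		unsigned int meta_size = planes[num_planes - 1].bytesused;

		/* ... parse the driver-specific metadata here (hypothetical layout) ... */
		(void)meta;
		(void)meta_size;

		return ioctl(fd, VIDIOC_QBUF, &buf);
	}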

> An alternative might be to set a 'DETECT' buffer flag, and rename
> 'reserved2' to 'detect_mask', thus having up to 32 things to detect. The
> problem with that is that we have no idea yet how to do face detection or
> any other type of detection since we have no experience with it at all. So
> 32 bits may also be insufficient, and I'd rather not use up a full field.

-- 
Regards,

Laurent Pinchart



* Re: [RFC] Motion Detection API
  2013-05-07 12:09     ` Laurent Pinchart
@ 2013-05-07 12:35       ` Hans Verkuil
  2013-05-07 14:04         ` Sylwester Nawrocki
  2013-05-07 20:59       ` Sakari Ailus
  1 sibling, 1 reply; 16+ messages in thread
From: Hans Verkuil @ 2013-05-07 12:35 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, Volokh Konstantin, Pete Eberlein, Ismael Luceno,
	Sylwester Nawrocki, Kamil Debski, sakari.ailus

On Tue 7 May 2013 14:09:54 Laurent Pinchart wrote:
> Hi Hans,
> 
> (CC'ing Sakari, I know he's missing reviewing V4L2 patches ;-))
> 
> On Monday 06 May 2013 15:41:41 Hans Verkuil wrote:
> > On Mon April 29 2013 22:52:31 Laurent Pinchart wrote:
> > > On Friday 12 April 2013 17:36:16 Hans Verkuil wrote:
> > > > This RFC looks at adding support for motion detection to V4L2. This is
> > > > the main missing piece that prevents the go7007 and solo6x10 drivers
> > > > from being moved into mainline from the staging directory.
> > > > 
> > > > Step one is to look at existing drivers/hardware:
> > > > 
> > > > 1) The go7007 driver:
> > > > 	- divides the frame into blocks of 16x16 pixels each (that's 45x36
> > > > 	  blocks for PAL)
> > > > 	- each block can be assigned to region 0, 1, 2 or 3
> > > > 	- each region has:
> > > > 		- a pixel change threshold
> > > > 		- a motion vector change threshold
> > > > 		- a trigger level; if this is 0, then motion detection for this
> > > > 		  region is disabled
> > > > 	- when streaming the reserved field of v4l2_buffer is used as a
> > > > 	  bitmask: one bit for each region where motion is detected.
> > > > 
> > > > 2) The solo6x10 driver:
> > > > 	- divides the frame into blocks of 16x16 pixels each
> > > > 	- each block has its own threshold
> > > > 	- the driver adds one MOTION_ON buffer flag and one MOTION_DETECTED
> > > > 	  buffer flag.
> > > > 	- motion detection can be disabled or enabled.
> > > > 	- the driver has a global motion detection mode with just one
> > > > 	  threshold: in that case all blocks are set to the same threshold.
> > > > 	- there is also support for displaying a border around the image if
> > > > 	  motion is detected (very hardware specific).
> > > > 
> > > > 3) The tw2804 video encoder (based on the datasheet, not implemented in
> > > > the driver):
> > > > 	- divides the image in 12x12 blocks (block size will differ for NTSC
> > > > 	  vs PAL)
> > > > 	- motion detection can be enabled or disabled for each block
> > > > 	- there are four controls:
> > > > 		- luminance level change threshold
> > > > 		- spatial sensitivity threshold
> > > > 		- temporal sensitivity threshold
> > > > 		- velocity control (determines how well slow motions are
> > > > 		  detected)
> > > > 	- detection is reported by a hardware pin in this case
> > > > 
> > > > Comparing these three examples of motion detection I see quite a lot of
> > > > similarities, enough to make a proposal for an API:
> > > > 
> > > > - Add a MOTION_DETECTION menu control:
> > > > 	- Disabled
> > > > 	- Global Motion Detection
> > > > 	- Regional Motion Detection
> > > > 
> > > > If 'Global Motion Detection' is selected, then various threshold
> > > > controls become available. What sort of thresholds are available seems
> > > > to be quite variable, so I am inclined to leave this as private
> > > > controls.
> > > > 
> > > > - Add new buffer flags when motion is detected. The go7007 driver would
> > > > need 4 bits (one for each region), the others just one. This can be
> > > > done by taking 4 bits from the v4l2_buffer flags field. There are still
> > > > 16 bits left there, and if it becomes full, then we still have two
> > > > reserved fields. I see no reason for adding a 'MOTION_ON' flag as the
> > > > solo6x10 driver does today: just check the MOTION_DETECTION control if
> > > > you want to know if motion detection is on or not.
> > > 
> > > We're really starting to shove metadata in buffer flags. Isn't it time to
> > > add a proper metadata API ? I don't really like the idea of using
> > > (valuable) buffer flags for a feature supported by three drivers only.
> > 
> > There are still 18 (not 16) bits remaining, so we are hardly running out of
> > bits. And I feel it is overkill to create a new metadata API for just a few
> > bits.
> 
> Creating a metadata API for any of the small piece of metadata information is 
> overkil, but when we add them up all together it definitely makes sense. We've 
> been postponing that API for some time now, it's in my opinion time to work on 
> it :-)
> 
> > It's actually quite rare that bits are added here, so I am OK with this
> > myself.
> > 
> > I will produce an RFCv2 though (the current API doesn't really extend to
> > 1080p motion detection due to the limited number of blocks), and I will
> > take another look at this. Although I don't really know off-hand how to
> > implement it. One idea that I had (just a brainstorm at this moment) is to
> > associate V4L2 events with a buffer. So internally buffers would have an
> > event queue and events could be added there once the driver is done with
> > the buffer.
> > 
> > An event buffer flag would be set, signalling to userspace that events are
> > available for this buffer and DQEVENT can be used to determine which events
> > there are (e.g. a motion detection event).
> > 
> > This makes it easy to associate many types of event with a buffer (motion
> > detection, face/smile/whatever detection) using standard ioctls, but it
> > feels a bit convoluted at the same time. It's probably worth pursuing,
> > though, as it is nicely generic.
> 
> It's an interesting idea, but maybe a bit convoluted as you mentioned.
> 
> What about using a metadata plane ? Alternatively we could add a metadata flag 
> and turn the two reserved fields into a metadata buffer pointer. Or add ioctls 
> to retrieve the metadata buffer associated with a v4l2_buffer (those are rough 
> ideas as well).

A metadata plane works well if you have substantial amounts of data (e.g. histogram
data) but it has the disadvantage of requiring you to use the MPLANE buffer types,
something which standard apps do not support. I definitely think that is overkill
for things like this.

Another problem with a metadata plane is how to interpret it. If you have lots of
different properties that you want to store, it becomes hard to read. An event
queue-like solution would be more effective; it is a good fit for collecting
random pieces of small information. It would certainly be suitable for things
like face/motion/smile/whatever detection scenarios, especially since you can
select() on that if you want and since the whole infrastructure for this
already exists.

I think I'd like to study this possibility a bit further, see how easy it is
to put this together.

Personally I think the metadata planes are more suitable for information that
is there for every frame, whereas per-buffer events are more suitable for
information that is event-driven: 'There is motion', 'There are faces',
'Someone has closed their eyes', etc. Using the event API actually makes a
lot of sense if you look at it that way.
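
As a sketch of the select()/poll() behaviour mentioned above: pending events are
signalled with POLLPRI and dequeueable capture buffers with POLLIN, so a single
loop can service both (drain_events() and process_frame() are hypothetical
helpers, not existing API):

	#include <poll.h>

	extern void drain_events(int fd);	/* VIDIOC_DQEVENT until none are left */
	extern void process_frame(int fd);	/* VIDIOC_DQBUF, use the image, VIDIOC_QBUF */

	static void capture_loop(int fd)
	{
		struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLPRI };

		for (;;) {
			if (poll(&pfd, 1, -1) <= 0)
				break;
			if (pfd.revents & POLLPRI)
				drain_events(fd);
			if (pfd.revents & POLLIN)
				process_frame(fd);
		}
	}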

Regards,

	Hans


* Re: [RFC] Motion Detection API
  2013-05-07 12:35       ` Hans Verkuil
@ 2013-05-07 14:04         ` Sylwester Nawrocki
  2013-05-08 16:26           ` Sakari Ailus
  0 siblings, 1 reply; 16+ messages in thread
From: Sylwester Nawrocki @ 2013-05-07 14:04 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Laurent Pinchart, linux-media, Volokh Konstantin, Pete Eberlein,
	Ismael Luceno, Sylwester Nawrocki, Kamil Debski, sakari.ailus

On 05/07/2013 02:35 PM, Hans Verkuil wrote:
> A metadata plane works well if you have substantial amounts of data (e.g. histogram
> data) but it has the disadvantage of requiring you to use the MPLANE buffer types,
> something which standard apps do not support. I definitely think that is overkill
> for things like this.

Standard applications could use the MPLANE interface through the libv4l-mplane
plugin [1]. And the meta-data plane could be handled in libv4l and passed in raw
form from the kernel.

There can be a substantial amount of meta-data per frame, and we were considering
e.g. creating a separate buffer queue for meta-data, to be able to use an mmapped
buffer in user space rather than parsing and copying data multiple times in
the kernel before it gets into user space and is further processed there.

I'm actually not sure if performance is a real issue here; we are talking
about amounts of data on the order of 1.5 KiB per frame. On x86 desktop machines
it is likely not a big deal, but for ARM embedded platforms we would need to do
some profiling.

I'm not sure myself yet how much of such motion/object detection data should be
interpreted in the kernel, rather than in user space. I suspect some generic
API like the one in your $subject RFC makes sense; it would cover as many cases
as possible. But I was wondering how much sense it makes to design a sort of
raw interface/buffer queue (similar to the raw sockets concept) that would allow
user space libraries to parse meta-data.

The format of the meta-data could, for example, change after switching to
a new version of the device's firmware. It might be rare; I'm just trying to say
I would like to avoid designing a kernel interface that might soon become a
limitation.

Besides, I have been thinking of allowing applications/libs to request an
additional meta-data plane, which would be driver-specific. For instance it turns
out the Samsung S5C73M3 camera can send meta-data for YUV formats
as well as for interleaved JPEG/YUV.

[1] http://git.linuxtv.org/v4l-utils.git/commit/ced1be346fe4f61c864cba9d81f66089d4e32a56

Regards,
Sylwester


* Re: [RFC] Motion Detection API
  2013-05-07 12:09     ` Laurent Pinchart
  2013-05-07 12:35       ` Hans Verkuil
@ 2013-05-07 20:59       ` Sakari Ailus
  1 sibling, 0 replies; 16+ messages in thread
From: Sakari Ailus @ 2013-05-07 20:59 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, linux-media, Volokh Konstantin, Pete Eberlein,
	Ismael Luceno, Sylwester Nawrocki, Kamil Debski

Hi Laurent, Hans and others,

On Tue, May 07, 2013 at 02:09:54PM +0200, Laurent Pinchart wrote:
> (CC'ing Sakari, I know he's missing reviewing V4L2 patches ;-))

Thanks for cc'ing me! :-) I know I've been very quiet recently, but it's not
going to stay like that permanently. Let's say I've been very, very busy
elsewhere.

> On Monday 06 May 2013 15:41:41 Hans Verkuil wrote:
> > On Mon April 29 2013 22:52:31 Laurent Pinchart wrote:
> > > On Friday 12 April 2013 17:36:16 Hans Verkuil wrote:
> > > > This RFC looks at adding support for motion detection to V4L2. This is
> > > > the main missing piece that prevents the go7007 and solo6x10 drivers
> > > > from being moved into mainline from the staging directory.
> > > > 
> > > > Step one is to look at existing drivers/hardware:
> > > > 
> > > > 1) The go7007 driver:
> > > > 	- divides the frame into blocks of 16x16 pixels each (that's 45x36
> > > > 	  blocks for PAL)
> > > > 	- each block can be assigned to region 0, 1, 2 or 3
> > > > 	- each region has:
> > > > 		- a pixel change threshold
> > > > 		- a motion vector change threshold
> > > > 		- a trigger level; if this is 0, then motion detection for this
> > > > 		  region is disabled
> > > > 	- when streaming the reserved field of v4l2_buffer is used as a
> > > > 	  bitmask: one bit for each region where motion is detected.
> > > > 
> > > > 2) The solo6x10 driver:
> > > > 	- divides the frame into blocks of 16x16 pixels each
> > > > 	- each block has its own threshold
> > > > 	- the driver adds one MOTION_ON buffer flag and one MOTION_DETECTED
> > > > 	  buffer flag.
> > > > 	- motion detection can be disabled or enabled.
> > > > 	- the driver has a global motion detection mode with just one
> > > > 	  threshold: in that case all blocks are set to the same threshold.
> > > > 	- there is also support for displaying a border around the image if
> > > > 	  motion is detected (very hardware specific).
> > > > 
> > > > 3) The tw2804 video encoder (based on the datasheet, not implemented in
> > > > the driver):
> > > > 	- divides the image in 12x12 blocks (block size will differ for NTSC
> > > > 	  vs PAL)
> > > > 	- motion detection can be enabled or disabled for each block
> > > > 	- there are four controls:
> > > > 		- luminance level change threshold
> > > > 		- spatial sensitivity threshold
> > > > 		- temporal sensitivity threshold
> > > > 		- velocity control (determines how well slow motions are
> > > > 		  detected)
> > > > 	- detection is reported by a hardware pin in this case
> > > > 
> > > > Comparing these three examples of motion detection I see quite a lot of
> > > > similarities, enough to make a proposal for an API:
> > > > 
> > > > - Add a MOTION_DETECTION menu control:
> > > > 	- Disabled
> > > > 	- Global Motion Detection
> > > > 	- Regional Motion Detection
> > > > 
> > > > If 'Global Motion Detection' is selected, then various threshold
> > > > controls become available. What sort of thresholds are available seems
> > > > to be quite variable, so I am inclined to leave this as private
> > > > controls.
> > > > 
> > > > - Add new buffer flags when motion is detected. The go7007 driver would
> > > > need 4 bits (one for each region), the others just one. This can be
> > > > done by taking 4 bits from the v4l2_buffer flags field. There are still
> > > > 16 bits left there, and if it becomes full, then we still have two
> > > > reserved fields. I see no reason for adding a 'MOTION_ON' flag as the
> > > > solo6x10 driver does today: just check the MOTION_DETECTION control if
> > > > you want to know if motion detection is on or not.
> > > 
> > > We're really starting to shove metadata in buffer flags. Isn't it time to
> > > add a proper metadata API ? I don't really like the idea of using
> > > (valuable) buffer flags for a feature supported by three drivers only.
> > 
> > There are still 18 (not 16) bits remaining, so we are hardly running out of
> > bits. And I feel it is overkill to create a new metadata API for just a few
> > bits.
> 
> Creating a metadata API for any of the small piece of metadata information is 
> overkil, but when we add them up all together it definitely makes sense. We've 
> been postponing that API for some time now, it's in my opinion time to work on 
> it :-)

I agree.

Without that proposal I would have first suggested events, but a separate metadata
API does indeed sound like a nice idea. There's face detection information
to deliver, for instance. That was discussed a long time ago but no conclusion
was reached (essentially the proposal suggested a very specific
object detection IOCTL which would have had no other uses).

On buffer flags --- if one driver uses a single flag to signal motion and
another uses four, how does the application know which flags to look for and how
to interpret them? If it were more than just a single flag, with more
sophisticated information available by other means, I wouldn't consider buffer
flags as the first option. Also --- think of multiple streams originating from a
single source (e.g. in omap3isp, should it support motion detection). Which one
should contain the relevant buffer flags? In other words, it'd be nice if this
was separate from the video buffers themselves; that way it could work on most
embedded systems and desktops in a similar fashion.

A possible metadata API should probably not cover all kinds of metadata; for
instance, many sensors, such as some SMIA compatible ones, provide metadata,
but there's a catch: this metadata is cumbersome and time-consuming to interpret;
reading a single property from the metadata requires parsing the whole metadata
block up to the property of interest. Also, the metadata consists of very low
level properties: register values used to expose a frame.

I also vaguely remember a Samsung sensor producing floating point numbers in
the metadata. IMHO the above kind of metadata should be parsed in the user
space based on what the user space requires. Designing a stable, useful,
long-lived and complete kernel API for that is unlikely to be feasible.
Beyond that, the metadata itself should likely use video buffers like the
image statistics discussed previously.

So the two kinds of low-level metadata in the examples above should likely be
excluded from this interface.

But with the two examples of motion detection and face (object) detection, I
think it's entirely possible to start drafting a more generic API. For
object detection more than 64 bytes per event may be needed, and that could
be covered by extended events (when the payload exceeds the space available
in v4l2_event). VIDIOC_QEVENT sounds like an option --- that would allow
passing larger buffers to the V4L2 framework to store the event data into.
That'd be a practicable, if somewhat rough, interface: larger event
buffers would be used whenever the payload exceeds the 64 bytes available.

I think I'd base the metadata API on events: that's an existing, extensible
mechanism that already implements parts of what's needed. Then there are
options, like whether a single event could contain multiple metadata
entries. That's a question of efficiency perhaps; we could also provide a
VIDIOC_DQEVENTS that could be used to dequeue multiple events in one go,
thus avoiding the need to issue an IOCTL per dequeued event.

> > It's actually quite rare that bits are added here, so I am OK with this
> > myself.
> > 
> > I will produce an RFCv2 though (the current API doesn't really extend to
> > 1080p motion detection due to the limited number of blocks), and I will
> > take another look at this. Although I don't really know off-hand how to
> > implement it. One idea that I had (just a brainstorm at this moment) is to
> > associate V4L2 events with a buffer. So internally buffers would have an
> > event queue and events could be added there once the driver is done with
> > the buffer.

Events already have a frame sequence number, so they can be associated with
video buffers (or rather with any video buffer the hardware provides of an image
from the image source). Albeit I admit that there's a catch here: it may be
difficult to know that a motion detection event is not going to arrive when
there is none coming.

Attaching the motion detection information to the video buffer isn't a
silver bullet either, since the information on whether there's been motion may
only be available after the frame itself (depending on the hardware). For
generic applications it might be practicable to simply decide that no motion
event is coming if it did not arrive before the frame itself.
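
A small sketch of that association using the existing event API;
V4L2_EVENT_MOTION_DET is a hypothetical event type, while the subscription
ioctl and the sequence fields in v4l2_event and v4l2_buffer are standard:

	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	#define V4L2_EVENT_MOTION_DET	(V4L2_EVENT_PRIVATE_START + 1)	/* hypothetical */

	/* Ask the driver to queue (hypothetical) motion detection events for us. */
	static int subscribe_motion_events(int fd)
	{
		struct v4l2_event_subscription sub = { .type = V4L2_EVENT_MOTION_DET };

		return ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);
	}

	/* Match an already-dequeued event to a buffer via the frame sequence number. */
	static int event_matches_frame(const struct v4l2_event *ev,
				       const struct v4l2_buffer *buf)
	{
		return ev->type == V4L2_EVENT_MOTION_DET && ev->sequence == buf->sequence;
	}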

> > An event buffer flag would be set, signalling to userspace that events are
> > available for this buffer and DQEVENT can be used to determine which events
> > there are (e.g. a motion detection event).
> > 
> > This makes it easy to associate many types of event with a buffer (motion
> > detection, face/smile/whatever detection) using standard ioctls, but it
> > feels a bit convoluted at the same time. It's probably worth pursuing,
> > though, as it is nicely generic.
> 
> It's an interesting idea, but maybe a bit convoluted as you mentioned.
> 
> What about using a metadata plane ? Alternatively we could add a metadata flag 

I'm all for a metadata plane for low level sensor metadata as an option ---
sometimes the DMA engine writing that metadata to the system memory is a
different one than the one that writes the actual buffer there. The fact is
that this low-level metadata is typically available much before the actual
frame, and sometimes its availability to user space is time-critical.

> and turn the two reserved fields into a metadata buffer pointer. Or add ioctls 
> to retrieve the metadata buffer associated with a v4l2_buffer (those are rough 
> ideas as well).

I have to say I don't find this option very attractive. There are many, many
IOCTLs in V4L2 for which it's evident they'd better been implemented
differently... S_INPUT, for instance. This approach would have high chances
of being one more to that list. :-)

Long-lived and generic interfaces can be reached by following the Unix
philosophy. Use case specific interfaces seldom are such.

> > An alternative might be to set a 'DETECT' buffer flag, and rename
> > 'reserved2' to 'detect_mask', thus having up to 32 things to detect. The
> > problem with that is that we have no idea yet how to do face detection or
> > any other type of detection since we have no experience with it at all. So
> > 32 bits may also be insufficient, and I'd rather not use up a full field.

On top of that, there are few if any ways to use those bits in a generic
fashion with the user knowing precisely what they mean. If the information is
just bit-based, then probably a single bit would do (motion / no motion).

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC] Motion Detection API
  2013-05-07 14:04         ` Sylwester Nawrocki
@ 2013-05-08 16:26           ` Sakari Ailus
  2013-05-08 22:12             ` Sylwester Nawrocki
  0 siblings, 1 reply; 16+ messages in thread
From: Sakari Ailus @ 2013-05-08 16:26 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Hans Verkuil, Laurent Pinchart, linux-media, Volokh Konstantin,
	Pete Eberlein, Ismael Luceno, Sylwester Nawrocki, Kamil Debski

Hi Sylwester,

On Tue, May 07, 2013 at 04:04:10PM +0200, Sylwester Nawrocki wrote:
> On 05/07/2013 02:35 PM, Hans Verkuil wrote:
> > A metadata plane works well if you have substantial amounts of data (e.g. histogram
> > data) but it has the disadvantage of requiring you to use the MPLANE buffer types,
> > something which standard apps do not support. I definitely think that is overkill
> > for things like this.
> 
> Standard application could use the MPLANE interface through the libv4l-mplane
> plugin [1]. And meta-data plane could be handled in libv4l, passed in raw form 
> from the kernel.
> 
> There can be substantial amount of meta-data per frame and we were considering
> e.g. creating separate buffer queue for meta-data, to be able to use mmaped 
> buffer at user space, rather than parsing and copying data multiple times in 
> the kernel until it gets into user space and is further processed there.

What kind of metadata do you have?

> I'm actually not sure if performance is a real issue here, were are talking
> of 1.5 KiB order amounts of data per frame. Likely on x86 desktop machines
> it is not a big deal, for ARM embedded platforms we would need to do some
> profiling.
> 
> I'm not sure myself yet how much such motion/object detection data should be 
> interpreted in the kernel, rather than in user space. I suspect some generic
> API like in your $subject RFC makes sense, it would cover as many cases as 
> possible. But I was wondering how much it makes sense to design a sort of 
> raw interface/buffer queue (similar to raw sockets concept), that would allow
> user space libraries to parse meta-data.

This was proposed as one possible solution in the Cambourne meeting.

<URL:http://permalink.gmane.org/gmane.linux.drivers.video-input-infrastructure/36587>

I'm in favour of using a separate video buffer queue for passing low-level
metadata to user space.

> The format of meta-data could for example have changed after switching to
> a new version of device's firmware. It might be rare, I'm just trying to say 
> I would like to avoid designing a kernel interface that might soon become a 
> limitation.

On some devices it seems the metadata consists of much higher level
information.

-- 
kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC] Motion Detection API
  2013-05-08 16:26           ` Sakari Ailus
@ 2013-05-08 22:12             ` Sylwester Nawrocki
  2013-05-21 17:30               ` Sakari Ailus
  0 siblings, 1 reply; 16+ messages in thread
From: Sylwester Nawrocki @ 2013-05-08 22:12 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Sylwester Nawrocki, Hans Verkuil, Laurent Pinchart, linux-media,
	Volokh Konstantin, Pete Eberlein, Ismael Luceno, Kamil Debski,
	Andrzej Hajda

Hi Sakari :-)

On 05/08/2013 06:26 PM, Sakari Ailus wrote:
> On Tue, May 07, 2013 at 04:04:10PM +0200, Sylwester Nawrocki wrote:
>> On 05/07/2013 02:35 PM, Hans Verkuil wrote:
>>> A metadata plane works well if you have substantial amounts of data (e.g. histogram
>>> data) but it has the disadvantage of requiring you to use the MPLANE buffer types,
>>> something which standard apps do not support. I definitely think that is overkill
>>> for things like this.
>>
>> Standard application could use the MPLANE interface through the libv4l-mplane
>> plugin [1]. And meta-data plane could be handled in libv4l, passed in raw form
>> from the kernel.
>>
>> There can be substantial amount of meta-data per frame and we were considering
>> e.g. creating separate buffer queue for meta-data, to be able to use mmaped
>> buffer at user space, rather than parsing and copying data multiple times in
>> the kernel until it gets into user space and is further processed there.
>
> What kind of metadata do you have?

At least I can tell of three kinds of meta-data at the moment:

  a) face/smile/blink detection markers (rectangles), see struct is_face_marker
     in file [1] in the media tree for more details; these markers can be
     available after an image frame is dequeued AFAIK, i.e. not immediately
     together with the image data,

  b) EXIF tags (struct exif_attribute in file [1]); this is metadata
     preprocessed by the ISP and appended to each buffer,

  c) the object detection bitmap, and this one can have a size comparable to
     the actual image frame; I haven't seen how it works in practice yet,
     though.

For b) I have been re-considering using the EXIF standard (chapter 4.6, [2]) to
create some sane interface for the ISP driver.

 From a performance POV only c) would need a meta-data specific buffer queue,
as such data has similar characteristics to the actual image data and a DMA
engine is used to capture those bitmaps.

As long as we're not copying megabytes of data with the CPU there should be no
big issues; I guess a couple of pages per frame is fine.

>> I'm actually not sure if performance is a real issue here, were are talking
>> of 1.5 KiB order amounts of data per frame. Likely on x86 desktop machines
>> it is not a big deal, for ARM embedded platforms we would need to do some
>> profiling.
>>
>> I'm not sure myself yet how much such motion/object detection data should be
>> interpreted in the kernel, rather than in user space. I suspect some generic
>> API like in your $subject RFC makes sense, it would cover as many cases as
>> possible. But I was wondering how much it makes sense to design a sort of
>> raw interface/buffer queue (similar to raw sockets concept), that would allow
>> user space libraries to parse meta-data.
>
> This was proposed as one possible solution in the Cambourne meeting.
>
> <URL:http://permalink.gmane.org/gmane.linux.drivers.video-input-infrastructure/36587>

Oh, thanks for bringing up those meeting minutes.

> I'm in favour of using a separate video buffer queue for passing low-level
> metadata to user space.

Sure. I certainly see a need for such an interface. I wouldn't like to see it
as the only option, however. One of the main reasons for introducing the MPLANE
API was to allow capture of meta-data. We are finally going to prepare an
RFC regarding the usage of a separate plane for meta-data capture. I'm not sure
yet how exactly it would look in detail; we've just discussed this topic
roughly with Andrzej.

>> The format of meta-data could for example have changed after switching to
>> a new version of device's firmware. It might be rare, I'm just trying to say
>> I would like to avoid designing a kernel interface that might soon become a
>> limitation.
>
> On some devices it seems the metadata consists of much higher level
> information.

Indeed. It seems that in the case of devices like the OMAP3 ISP we need to deal
mostly with raw data from a Bayer sensor, while for the Exynos ISP I would need
to expose something produced by the standalone ISP from such raw metadata.

[1] drivers/media/platform/exynos4-is/fimc-is-param.h
[2] http://www.exif.org/Exif2-2.PDF

--

Regards,
Sylwester


* Re: [RFC] Motion Detection API
  2013-05-08 22:12             ` Sylwester Nawrocki
@ 2013-05-21 17:30               ` Sakari Ailus
  2013-05-22 21:41                 ` Sylwester Nawrocki
  0 siblings, 1 reply; 16+ messages in thread
From: Sakari Ailus @ 2013-05-21 17:30 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Sylwester Nawrocki, Hans Verkuil, Laurent Pinchart, linux-media,
	Volokh Konstantin, Pete Eberlein, Ismael Luceno, Kamil Debski,
	Andrzej Hajda

Hi Sylwester,

My apologies for the late answer.

Sylwester Nawrocki wrote:
> Hi Sakari :-)
>
> On 05/08/2013 06:26 PM, Sakari Ailus wrote:
>> On Tue, May 07, 2013 at 04:04:10PM +0200, Sylwester Nawrocki wrote:
>>> On 05/07/2013 02:35 PM, Hans Verkuil wrote:
>>>> A metadata plane works well if you have substantial amounts of data
>>>> (e.g. histogram
>>>> data) but it has the disadvantage of requiring you to use the MPLANE
>>>> buffer types,
>>>> something which standard apps do not support. I definitely think
>>>> that is overkill
>>>> for things like this.
>>>
>>> Standard application could use the MPLANE interface through the
>>> libv4l-mplane
>>> plugin [1]. And meta-data plane could be handled in libv4l, passed in
>>> raw form
>>> from the kernel.
>>>
>>> There can be substantial amount of meta-data per frame and we were
>>> considering
>>> e.g. creating separate buffer queue for meta-data, to be able to use
>>> mmaped
>>> buffer at user space, rather than parsing and copying data multiple
>>> times in
>>> the kernel until it gets into user space and is further processed there.
>>
>> What kind of metadata do you have?
>
> At least I can tell of three kinds of meta-data at the moment:
>
>   a) face/smile/blink detection markers (rectangles), see struct
> is_face_marker
>      in file [1] in the media tree for more details; these markers can be
>      available after an image frame is dequeued AFAIK, i.e. not immediately
>      together with the image data,
>
>   b) EXIF tags (struct exif_attribute in file [1]), it's a preprocessed by
>      the ISP metadata appended to each buffer,

This class includes other image file metadata types such as IPTC and XMP.

>   c) the object detection bitmap, and this one can have size comparable to
>      the actual image frame; I didn't see how it works in practice yet
> though.
> 
> For b) I have been re-considering using EXIF standard (chapter 4.6, [2]) to
> create some sane interface for the ISP driver.
>
>  From performance POV only c) would need a meta-data specific buffer
> queue, as
> such data has similar characteristics to the actual image data and a DMA
> engine
> is used to capture those bitmaps.
>
> As far as we're not copying megabytes of data by CPU there should be no big
> issues, I guess couple pages per frame is fine.
>
>>> I'm actually not sure if performance is a real issue here, were are
>>> talking
>>> of 1.5 KiB order amounts of data per frame. Likely on x86 desktop
>>> machines
>>> it is not a big deal, for ARM embedded platforms we would need to do
>>> some
>>> profiling.
>>>
>>> I'm not sure myself yet how much such motion/object detection data
>>> should be
>>> interpreted in the kernel, rather than in user space. I suspect some
>>> generic
>>> API like in your $subject RFC makes sense, it would cover as many
>>> cases as
>>> possible. But I was wondering how much it makes sense to design a
>>> sort of
>>> raw interface/buffer queue (similar to raw sockets concept), that
>>> would allow
>>> user space libraries to parse meta-data.
>>
>> This was proposed as one possible solution in the Cambourne meeting.
>>
>> <URL:http://permalink.gmane.org/gmane.linux.drivers.video-input-infrastructure/36587>
>>
>
> Oh, thanks for bringing up those meeting minutes.
>
>> I'm in favour of using a separate video buffer queue for passing
>> low-level
>> metadata to user space.
>
> Sure. I certainly see a need for such an interface. I wouldn't like to
> see it
> as the only option, however. One of the main reasons of introducing MPLANE
> API was to allow capture of meta-data. We are going to finally prepare some
> RFC regarding usage of a separate plane for meta-data capture. I'm not sure
> yet how it would look exactly in detail, we've just discussed this topic
> roughly with Andrzej.

I'm fine with that not being the only option; however, it's unbeatable when it
comes to latencies. So perhaps we should allow using multi-plane buffers for
the same purpose as well.

But how to choose between the two?

In complex media devices the metadata is written into system memory in
an entirely different place than the images themselves, which typically
require processing (especially if the sensor produces raw Bayer images).
This would likely mean that there will be multiple device nodes in this kind
of situation.

That said, in both cases extra infrastructure is required for configuring
metadata format (possibly hardware-specific) and passing the control
information between the subdev drivers. This requires new interfaces to the
V4L2 subdev API. I'd think this part will still be common for both
approaches.

>>> The format of meta-data could for example have changed after
>>> switching to
>>> a new version of device's firmware. It might be rare, I'm just trying
>>> to say
>>> I would like to avoid designing a kernel interface that might soon
>>> become a
>>> limitation.
>>
>> On some devices it seems the metadata consists of much higher level
>> information.
>
> Indeed. It seems in case of devices like OMAP3 ISP we need to deal
> mostly with
> raw data from a Bayer sensor, while for the Exynos ISP I would need to
> expose
> something produced by the standalone ISP from such a raw metadata.

Can the Exynos ISP also process raw Bayer input, or is it YUV only?

I remember I have heard of and seen external ISPs but have never used them
myself.

-- 
Kind regards,

Sakari Ailus
sakari.ailus@iki.fi


* Re: [RFC] Motion Detection API
  2013-05-21 17:30               ` Sakari Ailus
@ 2013-05-22 21:41                 ` Sylwester Nawrocki
  2013-06-03  1:25                   ` Sakari Ailus
  0 siblings, 1 reply; 16+ messages in thread
From: Sylwester Nawrocki @ 2013-05-22 21:41 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, Laurent Pinchart, linux-media, Volokh Konstantin,
	Pete Eberlein, Ismael Luceno, Kamil Debski, Andrzej Hajda

Hi Sakari,

On 05/21/2013 07:30 PM, Sakari Ailus wrote:
> Hi Sylwester,
>
> My apologies for the late answer.

No problem at all, thank you for your follow-up.

> Sylwester Nawrocki wrote:
>> On 05/08/2013 06:26 PM, Sakari Ailus wrote:
>>> On Tue, May 07, 2013 at 04:04:10PM +0200, Sylwester Nawrocki wrote:
>>>> On 05/07/2013 02:35 PM, Hans Verkuil wrote:
>>>>> A metadata plane works well if you have substantial amounts of data
>>>>> (e.g. histogram
>>>>> data) but it has the disadvantage of requiring you to use the MPLANE
>>>>> buffer types,
>>>>> something which standard apps do not support. I definitely think
>>>>> that is overkill for things like this.
>>>>
>>>> Standard application could use the MPLANE interface through the
>>>> libv4l-mplane
>>>> plugin [1]. And meta-data plane could be handled in libv4l, passed in
>>>> raw form from the kernel.
>>>>
>>>> There can be substantial amount of meta-data per frame and we were
>>>> considering
>>>> e.g. creating separate buffer queue for meta-data, to be able to use
>>>> mmaped
>>>> buffer at user space, rather than parsing and copying data multiple
>>>> times in
>>>> the kernel until it gets into user space and is further processed
>>>> there.
>>>
>>> What kind of metadata do you have?
>>
>> At least I can tell of three kinds of meta-data at the moment:
>>
>> a) face/smile/blink detection markers (rectangles), see struct
>> is_face_marker
>> in file [1] in the media tree for more details; these markers can be
>> available after an image frame is dequeued AFAIK, i.e. not immediately
>> together with the image data,
>>
>> b) EXIF tags (struct exif_attribute in file [1]), it's a preprocessed by
>> the ISP metadata appended to each buffer,
>
> This class includes other image file metadata types such as iptc and xmp.

Right, thanks for pointing it out.

It seems EXIF is most useful for device-related image properties, though.
At least this is my understanding from reading the "Guidelines For Handling
Image Metadata" [1].

XMP uses XML schemas and seems more relevant to data processing in user-space
applications, although it defines various namespaces [2] and could serve as a
container for a set of EXIF-specific properties.

>> c) the object detection bitmap, and this one can have size comparable to
>> the actual image frame; I didn't see how it works in practice yet
>> though.
>>
>> For b) I have been re-considering using EXIF standard (chapter 4.6,
>> [2]) to
>> create some sane interface for the ISP driver.
>>
>> From performance POV only c) would need a meta-data specific buffer
>> queue, as
>> such data has similar characteristics to the actual image data and a DMA
>> engine
>> is used to capture those bitmaps.
[...]
>>> I'm in favour of using a separate video buffer queue for passing
>>> low-level
>>> metadata to user space.
>>
>> Sure. I certainly see a need for such an interface. I wouldn't like to
>> see it
>> as the only option, however. One of the main reasons of introducing
>> MPLANE
>> API was to allow capture of meta-data. We are going to finally prepare
>> some
>> RFC regarding usage of a separate plane for meta-data capture. I'm not
>> sure
>> yet how it would look exactly in detail, we've just discussed this topic
>> roughly with Andrzej.
>
> I'm fine that being not the only option; however it's unbeatable when it
> comes to latencies. So perhaps we should allow using multi-plane buffers
> for the same purpose as well.
>
> But how to choose between the two?

I think we need some example implementation for metadata capture over the
multi-plane interface and with a separate video node. Without such an
implementation/API draft it is a bit difficult to discuss this further.
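
To make the comparison a bit more concrete, a user-space sketch of the
separate-video-node variant could look roughly as follows; the device path,
the metadata fourcc and the idea of a node that delivers only metadata are
purely illustrative, not an existing driver interface:

	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	/* Illustrative only: a dedicated node that delivers nothing but metadata. */
	#define META_PIX_FMT  v4l2_fourcc('M', 'D', 'T', 'A')

	static int open_meta_node(void)
	{
		struct v4l2_format fmt;
		struct v4l2_requestbuffers req;
		int fd = open("/dev/video1", O_RDWR);	/* hypothetical metadata node */

		if (fd < 0)
			return -1;

		memset(&fmt, 0, sizeof(fmt));
		fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		fmt.fmt.pix.pixelformat = META_PIX_FMT;
		/* The driver would fill fmt.fmt.pix.sizeimage with the per-frame blob size. */
		ioctl(fd, VIDIOC_S_FMT, &fmt);

		memset(&req, 0, sizeof(req));
		req.count = 4;
		req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		req.memory = V4L2_MEMORY_MMAP;
		ioctl(fd, VIDIOC_REQBUFS, &req);

		/* mmap(), VIDIOC_QBUF and VIDIOC_STREAMON as for any capture node. */
		return fd;
	}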

> In complex media devices the metadata is written into the system memory in
> an entirely different place than the images themselves which typically
> require processing (especially if the sensor produces raw bayer images).
> This would likely mean that there will be multiple device nodes in this
> kind of situations.
>
> That said, in both cases extra infrastructure is required for configuring
> metadata format (possibly hardware-specific) and passing the control
> information between the subdev drivers. This requires new interfaces to the
> V4L2 subdev API. I'd think this part will still be common for both
> approaches.
>
>>> On some devices it seems the metadata consists of much higher level
>>> information.
>>
>> Indeed. It seems in case of devices like OMAP3 ISP we need to deal
>> mostly with
>> raw data from a Bayer sensor, while for the Exynos ISP I would need to
>> expose
>> something produced by the standalone ISP from such a raw metadata.
>
> Can the Exynos ISP also process raw bayer input, or is it YUV only?

Exynos4212 and later have a full camera ISP subsystem and can process raw
Bayer data, as opposed to the Exynos4210 and earlier SoCs, where the camera
subsystem supported at most capturing raw Bayer data transparently to memory.

Unfortunately the Exynos Imaging Subsystem documentation is not yet
publicly available.

> I remember I have heard of and seen external ISPs but never have used
> those myself.

I think it might not matter that much whether an ISP is external or local to
the main CPU/SoC. It may mean that different data buses/mechanisms are used
to communicate with the actual hardware, but that should be encapsulated by
the driver anyway.

Thanks,
Sylwester

[1] http://www.metadataworkinggroup.org/specs
[2] http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/XMPSpecificationPart2.pdf

* Re: [RFC] Motion Detection API
  2013-05-22 21:41                 ` Sylwester Nawrocki
@ 2013-06-03  1:25                   ` Sakari Ailus
  2013-06-09 17:56                     ` Sylwester Nawrocki
  0 siblings, 1 reply; 16+ messages in thread
From: Sakari Ailus @ 2013-06-03  1:25 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Hans Verkuil, Laurent Pinchart, linux-media, Volokh Konstantin,
	Pete Eberlein, Ismael Luceno, Kamil Debski, Andrzej Hajda

Hi Sylwester,

On Wed, May 22, 2013 at 11:41:50PM +0200, Sylwester Nawrocki wrote:
> [...]
> >>>I'm in favour of using a separate video buffer queue for passing
> >>>low-level
> >>>metadata to user space.
> >>
> >>Sure. I certainly see a need for such an interface. I wouldn't like to
> >>see it
> >>as the only option, however. One of the main reasons of introducing
> >>MPLANE
> >>API was to allow capture of meta-data. We are going to finally prepare
> >>some
> >>RFC regarding usage of a separate plane for meta-data capture. I'm not
> >>sure
> >>yet how it would look exactly in detail, we've just discussed this topic
> >>roughly with Andrzej.
> >
> >I'm fine that being not the only option; however it's unbeatable when it
> >comes to latencies. So perhaps we should allow using multi-plane buffers
> >for the same purpose as well.
> >
> >But how to choose between the two?
> 
> I think we need some example implementation for metadata capture over
> multi-plane interface and with a separate video node. Without such
> implementation/API draft it is a bit difficult to discuss this further.

Yes, that'd be quite nice.

There are actually a number of things that I think would be needed to
support what's discussed above. Extended frame descriptors (I'm preparing
RFC v2 --- yes, really!) are one.

Also, creating video nodes based on how many different content streams there
are doesn't make much sense to me. A quick and dirty solution would be to
create a low-level metadata queue type to avoid having to create more video
nodes. I think I'd prefer a more generic solution, though.
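
As a very rough sketch of what such a "quick and dirty" queue type could look
like on the uAPI side (everything below is hypothetical; nothing like it
exists in the kernel at the time of this thread, and the names are only
illustrative):

	/* Hypothetical additions to videodev2.h */
	#define V4L2_BUF_TYPE_META_CAPTURE	13	/* new enum v4l2_buf_type value */

	struct v4l2_meta_format {
		__u32	dataformat;	/* fourcc describing the (driver-specific) layout */
		__u32	buffersize;	/* maximum size of the per-frame metadata blob */
		__u8	reserved[24];
	} __attribute__ ((packed));

	/* struct v4l2_format would gain a 'struct v4l2_meta_format meta;' union
	 * member, and the buffers themselves would be handled by the existing
	 * videobuf2 queue code like any other capture queue. */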

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [RFC] Motion Detection API
  2013-06-03  1:25                   ` Sakari Ailus
@ 2013-06-09 17:56                     ` Sylwester Nawrocki
  2013-08-21 10:45                       ` Sakari Ailus
  0 siblings, 1 reply; 16+ messages in thread
From: Sylwester Nawrocki @ 2013-06-09 17:56 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, Laurent Pinchart, linux-media, Volokh Konstantin,
	Pete Eberlein, Ismael Luceno, Kamil Debski, Andrzej Hajda

Hi Sakari,

On 06/03/2013 03:25 AM, Sakari Ailus wrote:
> On Wed, May 22, 2013 at 11:41:50PM +0200, Sylwester Nawrocki wrote:
>> [...]
>>>>> I'm in favour of using a separate video buffer queue for passing
>>>>> low-level
>>>>> metadata to user space.
>>>>
>>>> Sure. I certainly see a need for such an interface. I wouldn't like to
>>>> see it
>>>> as the only option, however. One of the main reasons of introducing
>>>> MPLANE
>>>> API was to allow capture of meta-data. We are going to finally prepare
>>>> some
>>>> RFC regarding usage of a separate plane for meta-data capture. I'm not
>>>> sure
>>>> yet how it would look exactly in detail, we've just discussed this topic
>>>> roughly with Andrzej.
>>>
>>> I'm fine that being not the only option; however it's unbeatable when it
>>> comes to latencies. So perhaps we should allow using multi-plane buffers
>>> for the same purpose as well.
>>>
>>> But how to choose between the two?
>>
>> I think we need some example implementation for metadata capture over
>> multi-plane interface and with a separate video node. Without such
>> implementation/API draft it is a bit difficult to discuss this further.
>
> Yes, that'd be quite nice.

I still haven't found time to look into that; I got stuck debugging some
hardware-related issues, which took much longer than expected...

> There are actually a number of things that I think would be needed to
> support what's discussed above. Extended frame descriptors (I'm preparing
> RFC v2 --- yes, really!) are one.

Sounds great, I'm really looking forward to improving this part and having it
used in more drivers.

> Also creating video nodes based on how many different content streams there
> are doesn't make much sense to me. A quick and dirty solution would be to
> create a low level metadata queue type to avoid having to create more video
> nodes. I think I'd prefer a more generic solution though.

Hmm, does it mean having multiple buffer queues on a video device node,
similarly to, e.g., the M2M interface? I'm not sure that would be a bad idea
at all. The number of video/subdev nodes can get ridiculously high for more
complex devices. For example, in the case of the Samsung Exynos SoC imaging
subsystem the total number of various device nodes is getting near *30*, and
it is going to be at least that many once all functionality is covered.

So one video node per DMA engine is probably a fair rule, but there might be
reasons to avoid adding more device nodes to cover "logical" streams.
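
For reference, this is how the existing M2M interface already multiplexes two
buffer queues over a single video node; an image + metadata pairing could in
principle follow the same pattern (sketch only, using the standard M2M buffer
types):

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	static int setup_m2m_queues(int fd)
	{
		struct v4l2_requestbuffers out_req, cap_req;

		memset(&out_req, 0, sizeof(out_req));
		out_req.count = 4;
		out_req.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;	/* source queue */
		out_req.memory = V4L2_MEMORY_MMAP;

		memset(&cap_req, 0, sizeof(cap_req));
		cap_req.count = 4;
		cap_req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;	/* result queue */
		cap_req.memory = V4L2_MEMORY_MMAP;

		/* Two independent queues on the same file handle, distinguished
		 * only by the buffer type. */
		if (ioctl(fd, VIDIOC_REQBUFS, &out_req) < 0)
			return -1;
		if (ioctl(fd, VIDIOC_REQBUFS, &cap_req) < 0)
			return -1;
		return 0;
	}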


Regards,
Sylwester

* Re: [RFC] Motion Detection API
  2013-06-09 17:56                     ` Sylwester Nawrocki
@ 2013-08-21 10:45                       ` Sakari Ailus
  0 siblings, 0 replies; 16+ messages in thread
From: Sakari Ailus @ 2013-08-21 10:45 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Hans Verkuil, Laurent Pinchart, linux-media, Volokh Konstantin,
	Pete Eberlein, Ismael Luceno, Kamil Debski, Andrzej Hajda

Hi Sylwester,

My apologies for the delayed answer.

On Sun, Jun 09, 2013 at 07:56:23PM +0200, Sylwester Nawrocki wrote:
> On 06/03/2013 03:25 AM, Sakari Ailus wrote:
> >On Wed, May 22, 2013 at 11:41:50PM +0200, Sylwester Nawrocki wrote:
> >>[...]
> >>>>>I'm in favour of using a separate video buffer queue for passing
> >>>>>low-level
> >>>>>metadata to user space.
> >>>>
> >>>>Sure. I certainly see a need for such an interface. I wouldn't like to
> >>>>see it
> >>>>as the only option, however. One of the main reasons of introducing
> >>>>MPLANE
> >>>>API was to allow capture of meta-data. We are going to finally prepare
> >>>>some
> >>>>RFC regarding usage of a separate plane for meta-data capture. I'm not
> >>>>sure
> >>>>yet how it would look exactly in detail, we've just discussed this topic
> >>>>roughly with Andrzej.
> >>>
> >>>I'm fine that being not the only option; however it's unbeatable when it
> >>>comes to latencies. So perhaps we should allow using multi-plane buffers
> >>>for the same purpose as well.
> >>>
> >>>But how to choose between the two?
> >>
> >>I think we need some example implementation for metadata capture over
> >>multi-plane interface and with a separate video node. Without such
> >>implementation/API draft it is a bit difficult to discuss this further.
> >
> >Yes, that'd be quite nice.
> 
> I still haven't found time to look into that, got stuck with debugging some
> hardware related issues which took much longer than expected..

Any better luck now? :-) :-)

> >There are actually a number of things that I think would be needed to
> >support what's discussed above. Extended frame descriptors (I'm preparing
> >RFC v2 --- yes, really!) are one.
> 
> Sounds great, I'm really looking forward to improving this part and
> having it
> used in more drivers.
> 
> >Also creating video nodes based on how many different content streams there
> >are doesn't make much sense to me. A quick and dirty solution would be to
> >create a low level metadata queue type to avoid having to create more video
> >nodes. I think I'd prefer a more generic solution though.
> 
> Hmm, does it mean having multiple buffer queues on a video device node,
> similarly to, e.g. the M2M interface ? Not sure if it would have been a bad

Yes; the metadata and the images would arrive through the same video node
but on a different buffer queue. This way, creating new video nodes based on
whether metadata exists or not can be avoided.

But just creating a single separate metadata queue type is slightly hackish:
there can be multiple metadata regions in the frame, and the sensor can also
produce a JPEG image (albeit I'd like to consider such sensors rare; I've
never worked on one myself).
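
Reusing the hypothetical V4L2_BUF_TYPE_META_CAPTURE sketched earlier in the
thread, dequeuing from both queues of such a node and pairing the results
might look roughly like this (illustrative only; the metadata type does not
exist in the kernel):

	/* Assumes <string.h>, <sys/ioctl.h>, <linux/videodev2.h> and the
	 * hypothetical V4L2_BUF_TYPE_META_CAPTURE define from the earlier sketch. */
	static int dequeue_frame_and_meta(int fd)
	{
		struct v4l2_buffer img, meta;

		memset(&img, 0, sizeof(img));
		img.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		img.memory = V4L2_MEMORY_MMAP;

		memset(&meta, 0, sizeof(meta));
		meta.type = V4L2_BUF_TYPE_META_CAPTURE;	/* hypothetical */
		meta.memory = V4L2_MEMORY_MMAP;

		if (ioctl(fd, VIDIOC_DQBUF, &img) < 0 ||
		    ioctl(fd, VIDIOC_DQBUF, &meta) < 0)
			return -1;

		/* Pair image and metadata by frame sequence number (or timestamp). */
		return img.sequence == meta.sequence ? 0 : -1;
	}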

> idea at all. The number of video/subdev nodes can get ridiculously high in
> case of more complex devices. For example in case of the Samsung Exynos
> SoC imaging subsystem the total number of various device nodes is getting
> near *30*, and it is going to be at least that many for sure once all
> functionality is covered.
> 
> So one video node per a DMA engine is probably fair rule, but there might
> be reasons to avoid adding more device nodes for covering "logical" streams.

The number in my opinion isn't an issue, but it would be one if devices
appeared and disappeared dynamically based on, e.g., the sensor configuration.

-- 
Cheers,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

end of thread

Thread overview: 16+ messages
2013-04-12 15:36 [RFC] Motion Detection API Hans Verkuil
2013-04-21 12:04 ` Ismael Luceno
2013-04-22  7:55   ` Hans Verkuil
2013-04-29 20:52 ` Laurent Pinchart
2013-05-06 13:41   ` Hans Verkuil
2013-05-07 12:09     ` Laurent Pinchart
2013-05-07 12:35       ` Hans Verkuil
2013-05-07 14:04         ` Sylwester Nawrocki
2013-05-08 16:26           ` Sakari Ailus
2013-05-08 22:12             ` Sylwester Nawrocki
2013-05-21 17:30               ` Sakari Ailus
2013-05-22 21:41                 ` Sylwester Nawrocki
2013-06-03  1:25                   ` Sakari Ailus
2013-06-09 17:56                     ` Sylwester Nawrocki
2013-08-21 10:45                       ` Sakari Ailus
2013-05-07 20:59       ` Sakari Ailus
