linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ANN] Report of Media Summit: Codecs
@ 2019-11-02 12:50 Hans Verkuil
  2019-11-14 18:15 ` VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs) Nicolas Dufresne
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Hans Verkuil @ 2019-11-02 12:50 UTC (permalink / raw)
  To: Linux Media Mailing List
  Cc: Boris Brezillon, Alexandre Courbot, Nicolas Dufresne,
	Tomasz Figa, Ezequiel Garcia, Daniel Gomez, Peter Griffin,
	Dafna Hirschfeld, Paul Kocialkowski, Helen Koike, Jan Schmidt,
	Dave Stevenson, Michael Tretter, Stanimir Varbanov,
	Matthew Waters

Media Summit: Codecs
October 29, 2019 - Lyon, France

Many thanks to the Linux Foundation for hosting this meeting. Much appreciated!

Please reply to this report with any comments/corrections. Especially if I missed
any action items!


Original announcement:

https://lore.kernel.org/linux-media/c8380b43-2742-f1cb-0fb9-2c3c90e29a33@xs4all.nl/T/


Attendees:

Boris Brezillon <boris.brezillon@collabora.com>
Alexandre Courbot <acourbot@chromium.org>, Google Chrome OS
Nicolas Dufresne <nicolas@ndufresne.ca>
Tomasz Figa <tfiga@chromium.org>, Google Chrome OS
Ezequiel Garcia <ezequiel.garcia@collabora.com>
Daniel Gomez <daniel@qtec.com>
Peter Griffin <Peter.griffin@linaro.org>
Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Paul Kocialkowski <paul.kocialkowski@bootlin.com>, Bootlin
Helen Koike <helen.koike@collabora.com>
Jan Schmidt <jan@centricular.com>
Dave Stevenson <dave.stevenson@raspberrypi.org>
Michael Tretter <m.tretter@pengutronix.de>
Stanimir Varbanov <stanimir.varbanov@linaro.org>
Hans Verkuil <hverkuil@xs4all.nl>, Cisco Systems Norway
Matthew Waters <matthew@centricular.com>


Discussion of pending codec patches
-----------------------------------

A v3 series for v5.4 hantro fixes will be posted soon.


Requirements for moving drivers out of staging
----------------------------------------------

- More testing is needed for these two H.264 and HEVC features:
  Multiview (used by 3D video, hantro supports this)
  Stream with sublayers

- Improve standard references in the control metadata documentation.

- Look into the ability to add fields to a compound control in the
  future: requires some investigation into the control framework.
  Helps to make the API more future proof.

- Document that metadata controls must stick to the standard as much
  as possible. No hardware specific data is allowed.

- At least one stateless encoder should be present, ideally for each
  codec. This is needed to see if the existing metadata controls can
  be reused by encoders. Check the Intel encoder and VA API. Also
  check if there is any source code for the AMD encoder.

  For reference: this is the Rockchip H.264 stateless encoder metadata:

	struct rk3288_h264e_reg_params {
		u32 frame_coding_type;
		s32 pic_init_qp;
		s32 slice_alpha_offset;
		s32 slice_beta_offset;
		s32 chroma_qp_index_offset;
		s32 filter_disable;
		u16 idr_pic_id;
		s32 pps_id;
		s32 frame_num;
		s32 slice_size_mb_rows;
		s32 h264_inter4x4_disabled;
		s32 enable_cabac;
		s32 transform8x8_mode;
		s32 cabac_init_idc;
		/* rate control relevant */
		s32 qp;
		s32 mad_qp_delta;
		s32 mad_threshold;
		s32 qp_min;
		s32 qp_max;
		s32 cp_distance_mbs;
		s32 cp_target[10];
		s32 target_error[7];
		s32 delta_qp[7];
	};

  And this is the VP8 metadata:

	struct rk3399_vp8e_reg_params {
		u32 is_intra;
		u32 frm_hdr_size;
		u32 qp;
		s32 mv_prob[2][19];
		s32 intra_prob;
		u32 bool_enc_value;
		u32 bool_enc_value_bits;
		u32 bool_enc_range;
		u32 filterDisable;
		u32 filter_sharpness;
		u32 filter_level;
		s32 intra_frm_delta;
		s32 last_frm_delta;
		s32 golden_frm_delta;
		s32 altref_frm_delta;
		s32 bpred_mode_delta;
		s32 zero_mode_delta;
		s32 newmv_mode_delta;
		s32 splitmv_mode_delta;
	};

	Source:
https://chromium.googlesource.com/chromiumos/third_party/libv4lplugins/+/79286ece8624ab016575a5ad8965a61b334ab169/libv4l-rockchip_v2/libvepu/common/rk_venc.h

  And the metadata for cedrus:

	struct h264enc_params {
		unsigned int width;
		unsigned int height;
		unsigned int src_width;
		unsigned int src_height;
		enum color_format { H264_FMT_NV12 = 0, H264_FMT_NV16 = 1 } src_format;
		unsigned int profile_idc, level_idc;
		enum { H264_EC_CAVLC = 0, H264_EC_CABAC = 1 } entropy_coding_mode;
		unsigned int qp;
		unsigned int keyframe_interval;
	};

        Source: https://github.com/jemk/cedrus/blob/master/h264enc/h264enc.c

  Cisco also has a requirement that the bitrate can be controlled per-frame.

  Conclusion: stateless encoder support needs some research. However, the general
  suspicion is that the decoder metadata controls are unlikely to be reused for
  stateless encoders.


Finalize Stateful Encoder
-------------------------

Currently S_PARM/ENUM_FRAMEINTERVALS is used to set the framerate which is needed
by encoders together with the desired bitrate to determine the compression ratio.

After some discussion we realized that this should actually refer to the rate at
which the encoder produces compressed frames: this is needed when you want to
encoder multiple streams in parallel and you want to indicate how the encoder
hardware should balance these encoder processes. E.g. the Xilinx encoder can
reserve N encoding cores depending on the demand.

Setting the actual framerate (which is needed to determine the compression ratio)
is separate from this and should be done through a new control of type v4l2_fract
that indicates the framerate (not interval, the userspace people very much preferred
framerate over frameinterval).

This requires the introduction of V4L2_CTRL_WHICH_{MIN,MAX}_VAL to obtain the min
and max values of a compound control. But this tries in nicely with the work
Ricardo Ribalda Delgado is doing for the V4L2_CTRL_TYPE_AREA controls.

Hans will implement this in the control framework, Michael Tretter will add support
for this in his Xilinx driver.


The other outstanding issue with stateful encoders is what to do if the capture
buffer is too small. It turns out that it is next to impossible to precisely
predict the minimum size of a capture buffers, so some mechanism is needed to
handle this corner case.

We agreed that the best way is to mark the capture buffer that's too small with
V4L2_BUF_FLAG_ERROR and indicate that the reason is that it is too small with
a new buffer flag: V4L2_BUF_FLAG_TOO_SMALL (0x00080000). When userspace sees
this it should stop streaming on the capture side, reallocate and requeue
capture buffers, and restart streaming.

This will work fine if there are no B frames. If the encoder produces B frames
as well, then this approach can produce an invalid stream. The only way this can
be resolved is if the HW/FW can rollback its internal state to before the point
this error was detected. In the future we need a pixelformat flag to indicate
that the HW/FW can rollback.

If the HW can fragment the encoded frames over multiple capture buffers, then it
should do so. The driver should set V4L2_FMT_FLAG_CONTINUOUS_BYTESTREAM for
this. However, this feature should probably be expliticly requested. This can be
done through a new V4L2_PIX_FMT_FLAG_ flag. Some more discussion is needed for this.


Nicolas mentioned that some codec drivers used to return wrong values for
ENUM_FRAMEINTERVALS (swapped numerator/denominator): v4l2-compliance should check
if the returned values are sane.

Nicolas also mentioned that it is not clear how drivers should round S_FMT
resolutions for codecs: this is currently driver specific. He would like this
to be documented (and checked) as rounding up.


Touched upon but not really discussed in-depth: SVC (Scalable Video Coding)
support.


Action Items
------------

Hans Verkuil:

- Ask Cisco colleagues which bitrate-related parameters have to be per-frame for
  an encoder
- make stateful encoder infrastructure + documentation for the missing bits
- investigate using different sizes for metadata controls in the control framework:
  is this possible?

Michael Tretter:

- Support the new encoder stateful controls in the driver

Tomasz Figa:

- look up AMD encoder support

Boris Brezillon:

- send v3 of hantro g1 fixes

Nicolas Dufresne:

- look into multiview and sublayers support

Paul Kocialkowski:

- check metadata controls against the standards and update the docs if needed

Ezequiel Garcia and Boris Brezillon:

- add VP9 support


^ permalink raw reply	[flat|nested] 7+ messages in thread

* VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs)
  2019-11-02 12:50 [ANN] Report of Media Summit: Codecs Hans Verkuil
@ 2019-11-14 18:15 ` Nicolas Dufresne
  2019-11-15  6:12   ` Boris Brezillon
  2019-11-15 14:52   ` Recap of: " Nicolas Dufresne
  2019-11-19 14:18 ` [ANN] Report of Media Summit: Codecs Hans Verkuil
  2019-12-03  6:56 ` Tomasz Figa
  2 siblings, 2 replies; 7+ messages in thread
From: Nicolas Dufresne @ 2019-11-14 18:15 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List
  Cc: Boris Brezillon, Alexandre Courbot, Tomasz Figa, Ezequiel Garcia,
	Daniel Gomez, Peter Griffin, Dafna Hirschfeld, Paul Kocialkowski,
	Helen Koike, Jan Schmidt, Dave Stevenson, Michael Tretter,
	Stanimir Varbanov, Matthew Waters

Hi all,

I hope you all had a good return back after ELCE. I'm writing in order
to follow up and gather ideas around a blocker in our design regarding
stateless CODECs and the VP9 CODEC. This emails provides the context,
and I'd like, for who would like to participate, to have a chat on IRC
#v4l at 14:30 CET for an hour at most. Hans needs to leave at 16h. If
you can't attend, feel free to reply to this thread with ideas.

Context:
-------

VP9 has this concept that resolution can be changed at any frames,
including intra-frames. The consequence of this, is that references
frames maybe not all be of the same resolution. What happens in
practice is that the reference frames are scaled up, or down, to the
decoding target using a fully defined scaling algorithm.

In the context of Hantro (which I need to remind is likely the only VP9
HW decoder design in the world, considering you can get this design for
free), this scaling is done on the fly. The references frames are
passed in their original size.

Our current design for state-less decoder, is that all reference are
held and owned by the VB queue, and referred to with a timestamp (or
cookie). The problem is that as of today, the VB queue buffers are all
of the same formats (despite some looking forward attempt, like
CREATE_BUFS).

Boris has implemented a proof of concept with the current API
limitation, but we could like to find a way forward so that we can
support VP9 compliant implementation. The following are two ideas that
already come up, more could be discussed tomorrow of course.

1. CREATE_BUFS/DELETE_BUFS
--------------------------

I haven't checked how this is exposed in the VB2 framework, but
CREATE_BUFS was created with this idea that you could extend an
existing pool of buffers, with buffers of a different format.

In order to complete this story, we'd need a DELETE_BUF, which allow
asking VB2 to drop it's reference to a specific chunk of memory. For
VP9, a resolution change would looke like this (simplified):

- Userspace detect that next frame has different resolution
- Then DELETE_BUF any buffers that are no longer relevant
- Then TRY_FMT/CREATE_BUFS for the new resolution

As decoding continues, and references frame are no longer relevant,
userspace will do further DELETE_BUF calls. The STREAMON/OFF calls
are no needed anymore.

Pros:
 - It's simple to use
 - There is prior art in the API
Const:
 - QUERYBUF is now insufficient, as we need the format to be returned
 - G_FMT becomes ambiguous
 - It's unclear what to do with buffer index, are they shifted ?
 - Userspace and kernel need to keep managing buffer index, timestamp
   (aka cookie) which seems quite error prone
 - DELETE term might be off reality, maybe RELEASE ?


2. Use a control to pass references

That was an idea that came in previous discussion. We could introduce a
controls to set the 3 references in VP9. Along with each reference, we
could pass back the v4l2_format as it was when this reference frame was
decoded. This would fully by-pass the timestamp/cookie mechanism. But
would impose that VP9 only works with DMABuf, and that a
flush/streamoff/re-alloc/streamon operation happen. It would work if
the resolution changes are rare, e.g. not happening on consecutive
frames.

Pros:
  - Less invasive (uAPI/Spec whise)
Cons:
  - It's very expensive
  - The memory mapping cache is lost, and need to be re-implemented
    in the driver (or some helpers need to be exposed)
  - Is inconsistent with H264/HEVC


3. Split buffer allocation and queue

This is a bit of a crazy and unfinished idea. I'm writing it down just
to feed some ideas, and with hope somebody with the right knowledge (no
me) might make some sense out of it.

What we could consider is to dissociate completely the queues from
buffer allocation and their format. With this idea, the queues will
only serve as a queue of pending operations.

I believe such an allocation model would require a kernel object,
exposed to user-space as an FD, that can wrap an DMABuf and stored all
the relevant metadata, such as the video format for this "frame". For
those familiar with DRM, you'll see where this is inspired from. The
wrapper is also a good place for any caching needed when importing
buffers. So this is no longer cached in the queue.

This would require a whole new set of IOCTL to allocate, release (we
should start thinking in term of reference count rather then
create/delete).

As a side effect, these self contained frames allow serializing the
format changes inside a queue. In such model, the reference frames no
longer need to be in the queue, as they can be passed using their
wrapper. With this we basically get rid of the cookie/timestamp
mechanism which most of us dislike. The workflow is mostly identical
proposal 1, the difference is that reference lookup code can be
removed. The driver no longer need to strictly track the buffers that
has been allocated.

From the queue perspective, this would need to be a totally new type of
capture/output. The v4l2_buffer would point to a frame object rather
then memory pointer/dmabuf/dma-offset.

Pros:
  - Much more flexible model
  - Helps for buffer sharing
  - No more cookie/timestamp lookup all over kernel and userspace
  - A fully referenced count model
Cons:
  - This requires a lot of design, my idea is full of wholes
  - Can it really be implemented in parallel ?
  - Might have the same gruyere effect on the buffer index in queue
  - The io ops need to be re-factored into something else

see you tomorrow,
Nicolas


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs)
  2019-11-14 18:15 ` VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs) Nicolas Dufresne
@ 2019-11-15  6:12   ` Boris Brezillon
  2019-11-15 14:52   ` Recap of: " Nicolas Dufresne
  1 sibling, 0 replies; 7+ messages in thread
From: Boris Brezillon @ 2019-11-15  6:12 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Hans Verkuil, Linux Media Mailing List, Alexandre Courbot,
	Tomasz Figa, Ezequiel Garcia, Daniel Gomez, Peter Griffin,
	Dafna Hirschfeld, Paul Kocialkowski, Helen Koike, Jan Schmidt,
	Dave Stevenson, Michael Tretter, Stanimir Varbanov,
	Matthew Waters

Hi Nicolas,

On Thu, 14 Nov 2019 13:15:17 -0500
Nicolas Dufresne <nicolas@ndufresne.ca> wrote:

> Hi all,
> 
> I hope you all had a good return back after ELCE. I'm writing in order
> to follow up and gather ideas around a blocker in our design regarding
> stateless CODECs and the VP9 CODEC. This emails provides the context,
> and I'd like, for who would like to participate, to have a chat on IRC
> #v4l at 14:30 CET for an hour at most. Hans needs to leave at 16h. If
> you can't attend, feel free to reply to this thread with ideas.

Thanks for starting this discussion.

> 
> Context:
> -------
> 
> VP9 has this concept that resolution can be changed at any frames,
> including intra-frames. The consequence of this, is that references
> frames maybe not all be of the same resolution. What happens in
> practice is that the reference frames are scaled up, or down, to the
> decoding target using a fully defined scaling algorithm.
> 
> In the context of Hantro (which I need to remind is likely the only VP9
> HW decoder design in the world, considering you can get this design for
> free), this scaling is done on the fly. The references frames are
> passed in their original size.
> 
> Our current design for state-less decoder, is that all reference are
> held and owned by the VB queue, and referred to with a timestamp (or
> cookie). The problem is that as of today, the VB queue buffers are all
> of the same formats (despite some looking forward attempt, like
> CREATE_BUFS).
> 
> Boris has implemented a proof of concept with the current API

Correction: I am implementing this PoC, its not finished yet :).

> limitation, but we could like to find a way forward so that we can
> support VP9 compliant implementation. The following are two ideas that
> already come up, more could be discussed tomorrow of course.

Some extra context: here are the VP9 controls I'm using right now [1].
Those are based on the VP9 spec [2] and the existing chromium
implementation [3]. The existing implementation, which I'm using as a
base for my PoC, pass the ref frame size/subsampling/bit_depth info
through a control to bypass the limitations described by Nicolas. I
don't think the driver deals with buffer re-allocation, but it should
work as long as the initial allocation size was big enough to cope with
the largest resolution.

> 
> 1. CREATE_BUFS/DELETE_BUFS
> --------------------------
> 
> I haven't checked how this is exposed in the VB2 framework, but
> CREATE_BUFS was created with this idea that you could extend an
> existing pool of buffers, with buffers of a different format.
> 
> In order to complete this story, we'd need a DELETE_BUF, which allow
> asking VB2 to drop it's reference to a specific chunk of memory. For
> VP9, a resolution change would looke like this (simplified):
> 
> - Userspace detect that next frame has different resolution
> - Then DELETE_BUF any buffers that are no longer relevant
> - Then TRY_FMT/CREATE_BUFS for the new resolution
> 
> As decoding continues, and references frame are no longer relevant,
> userspace will do further DELETE_BUF calls. The STREAMON/OFF calls
> are no needed anymore.
> 
> Pros:
>  - It's simple to use
>  - There is prior art in the API
> Const:
>  - QUERYBUF is now insufficient, as we need the format to be returned
>  - G_FMT becomes ambiguous
>  - It's unclear what to do with buffer index, are they shifted ?
>  - Userspace and kernel need to keep managing buffer index, timestamp
>    (aka cookie) which seems quite error prone
>  - DELETE term might be off reality, maybe RELEASE ?
> 
> 
> 2. Use a control to pass references
> 
> That was an idea that came in previous discussion. We could introduce a
> controls to set the 3 references in VP9. Along with each reference, we
> could pass back the v4l2_format as it was when this reference frame was
> decoded. This would fully by-pass the timestamp/cookie mechanism. But
> would impose that VP9 only works with DMABuf, and that a
> flush/streamoff/re-alloc/streamon operation happen. It would work if
> the resolution changes are rare, e.g. not happening on consecutive
> frames.
> 
> Pros:
>   - Less invasive (uAPI/Spec whise)
> Cons:
>   - It's very expensive
>   - The memory mapping cache is lost, and need to be re-implemented
>     in the driver (or some helpers need to be exposed)

Ideally in some helpers, with most of the logic taken from the vb2 core.

>   - Is inconsistent with H264/HEVC
> 
> 
> 3. Split buffer allocation and queue
> 
> This is a bit of a crazy and unfinished idea. I'm writing it down just
> to feed some ideas, and with hope somebody with the right knowledge (no
> me) might make some sense out of it.
> 
> What we could consider is to dissociate completely the queues from
> buffer allocation and their format. With this idea, the queues will
> only serve as a queue of pending operations.
> 
> I believe such an allocation model would require a kernel object,
> exposed to user-space as an FD, that can wrap an DMABuf and stored all
> the relevant metadata, such as the video format for this "frame". For
> those familiar with DRM, you'll see where this is inspired from. The
> wrapper is also a good place for any caching needed when importing
> buffers. So this is no longer cached in the queue.
> 
> This would require a whole new set of IOCTL to allocate, release (we
> should start thinking in term of reference count rather then
> create/delete).
> 
> As a side effect, these self contained frames allow serializing the
> format changes inside a queue. In such model, the reference frames no
> longer need to be in the queue, as they can be passed using their
> wrapper. With this we basically get rid of the cookie/timestamp
> mechanism which most of us dislike. The workflow is mostly identical
> proposal 1, the difference is that reference lookup code can be
> removed. The driver no longer need to strictly track the buffers that
> has been allocated.
> 
> From the queue perspective, this would need to be a totally new type of
> capture/output. The v4l2_buffer would point to a frame object rather
> then memory pointer/dmabuf/dma-offset.

I really like this new idea.

> 
> Pros:
>   - Much more flexible model
>   - Helps for buffer sharing
>   - No more cookie/timestamp lookup all over kernel and userspace
>   - A fully referenced count model
> Cons:
>   - This requires a lot of design, my idea is full of wholes
>   - Can it really be implemented in parallel ?
>   - Might have the same gruyere effect on the buffer index in queue

Not entirely sure why we need buffers to be indexed in this new model.
Isn't userspace in charge of buffer management here. All the queue
would take care of would be queueing/dequeuing buffers, and the buffer
to queue/dequeue would be passed as a bufdesc handle (or FD) instead of
an index. 

>   - The io ops need to be re-factored into something else

> 
> see you tomorrow,
> Nicolas
> 

Thanks for this summary/braindump.

Boris


[1]https://github.com/bbrezillon/linux/blob/779c72b5ad926b57595bdadcf3bffcdb559402b1/include/media/vp9-ctrls.h
[2]https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf
[3]https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.4/include/uapi/linux/v4l2-controls.h#1240

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Recap of: VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs)
  2019-11-14 18:15 ` VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs) Nicolas Dufresne
  2019-11-15  6:12   ` Boris Brezillon
@ 2019-11-15 14:52   ` Nicolas Dufresne
  1 sibling, 0 replies; 7+ messages in thread
From: Nicolas Dufresne @ 2019-11-15 14:52 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List
  Cc: Boris Brezillon, Alexandre Courbot, Tomasz Figa, Ezequiel Garcia,
	Daniel Gomez, Peter Griffin, Dafna Hirschfeld, Paul Kocialkowski,
	Helen Koike, Jan Schmidt, Dave Stevenson, Michael Tretter,
	Stanimir Varbanov, Matthew Waters

Hi All,

So Hans, Boris and I participated to the chat. As consider that it's
important to get this CODEC stuff read in a realistic time frame, the
first and most realistic proposal was picked. So here's a summary of
that will be needed.

## Introduce a VIDIOC_DELETE_BUF

As discussed, this will create wholes in the indexes of the buffers in
the queue. VIDIOC_CREATE_BUFS has a start index, so it is left to
userspace to fill the wholes. We are not going to lift the 32 buffers
limit just yet, that's why it is important to make sure that wholes can
be filled. Typical strategy should be to first discard buffers before
creating new one.

## Workflow

Upon a resolution change, userspace can start allocating frames for the
new resolution at any time (that's the nature of CREATE_BUFS). It can
also discard no-longer referenced buffers using DELETE_BUFS. Assuming
userspace make use of DMABuf, that can happen independently. This could
benefit many other drivers, since it will reduce the number actions
that must be done during the STREAMOFF period.

In order to operate the resolution change, the process remains the
same. With the exception that REQBUFS is no longer needed. The
framework should already validate properly the buffer size (to be
verified please).

 - STREAMOFF(Capture)
 - S_FMT(Capture)
 - QBUF(Capture) *
 - STREAMON(Capture)

Stream ON/OFF operation must be made independent between CAPTURE and
OUTPUT queues if it's not already the case.

## Driver specific bits

Drivers that support mixed size references will likely extend the
structure that stores buffer in vb2, and save the format used at QBUF
time. This avoids the need for an extra control later. This also avoid
having a lot of per-driver validation.

regards,
Nicolas


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ANN] Report of Media Summit: Codecs
  2019-11-02 12:50 [ANN] Report of Media Summit: Codecs Hans Verkuil
  2019-11-14 18:15 ` VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs) Nicolas Dufresne
@ 2019-11-19 14:18 ` Hans Verkuil
  2019-12-03  7:08   ` Tomasz Figa
  2019-12-03  6:56 ` Tomasz Figa
  2 siblings, 1 reply; 7+ messages in thread
From: Hans Verkuil @ 2019-11-19 14:18 UTC (permalink / raw)
  To: Linux Media Mailing List
  Cc: Boris Brezillon, Alexandre Courbot, Nicolas Dufresne,
	Tomasz Figa, Ezequiel Garcia, Daniel Gomez, Peter Griffin,
	Dafna Hirschfeld, Paul Kocialkowski, Helen Koike, Jan Schmidt,
	Dave Stevenson, Michael Tretter, Stanimir Varbanov,
	Matthew Waters

On 11/2/19 1:50 PM, Hans Verkuil wrote:

<snip>

> Action Items
> ------------
> 
> Hans Verkuil:
> 
> - Ask Cisco colleagues which bitrate-related parameters have to be per-frame for
>   an encoder
> - make stateful encoder infrastructure + documentation for the missing bits
> - investigate using different sizes for metadata controls in the control framework:
>   is this possible?

The problem is with compound arrays. struct v4l2_ext_control just provides a size field
for the total size and so for an array there is no way to discover the actual element
size.

That said, to my knowledge there are currently no compound arrays defined in the mainline
kernel. So one option is to take the last reserved __u32 field and split it in a
__u16 elem_size and a __u16 reserved2 field, or just use the full __u32 for the elem size
and drop the reserved2 field.

Alternatively, we prohibit compound arrays for now and postpone making any changes to
struct v4l2_ext_control until we actually need this.

As long as it is not an array, then we can safely extend these compound control structs
later. It requires some work in the control framework, but it isn't too bad.

I'm in favor of implementing this, and for now prohibiting the use of compound arrays.

It is really helpful making these codec state controls at least somewhat future-proof.

Comments?

	Hans

> 
> Michael Tretter:
> 
> - Support the new encoder stateful controls in the driver
> 
> Tomasz Figa:
> 
> - look up AMD encoder support
> 
> Boris Brezillon:
> 
> - send v3 of hantro g1 fixes
> 
> Nicolas Dufresne:
> 
> - look into multiview and sublayers support
> 
> Paul Kocialkowski:
> 
> - check metadata controls against the standards and update the docs if needed
> 
> Ezequiel Garcia and Boris Brezillon:
> 
> - add VP9 support
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ANN] Report of Media Summit: Codecs
  2019-11-02 12:50 [ANN] Report of Media Summit: Codecs Hans Verkuil
  2019-11-14 18:15 ` VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs) Nicolas Dufresne
  2019-11-19 14:18 ` [ANN] Report of Media Summit: Codecs Hans Verkuil
@ 2019-12-03  6:56 ` Tomasz Figa
  2 siblings, 0 replies; 7+ messages in thread
From: Tomasz Figa @ 2019-12-03  6:56 UTC (permalink / raw)
  To: Hans Verkuil, Nicolas Dufresne
  Cc: Linux Media Mailing List, Boris Brezillon, Alexandre Courbot,
	Ezequiel Garcia, Daniel Gomez, Peter Griffin, Dafna Hirschfeld,
	Paul Kocialkowski, Helen Koike, Jan Schmidt, Dave Stevenson,
	Michael Tretter, Stanimir Varbanov, Matthew Waters

On Sat, Nov 2, 2019 at 9:50 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Media Summit: Codecs
> October 29, 2019 - Lyon, France
>
> Many thanks to the Linux Foundation for hosting this meeting. Much appreciated!
>
> Please reply to this report with any comments/corrections. Especially if I missed
> any action items!
>

Thanks Hans for the report. It looks good to me. Just replied to my
action item inline.

>
> Original announcement:
>
> https://lore.kernel.org/linux-media/c8380b43-2742-f1cb-0fb9-2c3c90e29a33@xs4all.nl/T/
>
>
> Attendees:
>
> Boris Brezillon <boris.brezillon@collabora.com>
> Alexandre Courbot <acourbot@chromium.org>, Google Chrome OS
> Nicolas Dufresne <nicolas@ndufresne.ca>
> Tomasz Figa <tfiga@chromium.org>, Google Chrome OS
> Ezequiel Garcia <ezequiel.garcia@collabora.com>
> Daniel Gomez <daniel@qtec.com>
> Peter Griffin <Peter.griffin@linaro.org>
> Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
> Paul Kocialkowski <paul.kocialkowski@bootlin.com>, Bootlin
> Helen Koike <helen.koike@collabora.com>
> Jan Schmidt <jan@centricular.com>
> Dave Stevenson <dave.stevenson@raspberrypi.org>
> Michael Tretter <m.tretter@pengutronix.de>
> Stanimir Varbanov <stanimir.varbanov@linaro.org>
> Hans Verkuil <hverkuil@xs4all.nl>, Cisco Systems Norway
> Matthew Waters <matthew@centricular.com>
>
>
> Discussion of pending codec patches
> -----------------------------------
>
> A v3 series for v5.4 hantro fixes will be posted soon.
>
>
> Requirements for moving drivers out of staging
> ----------------------------------------------
>
> - More testing is needed for these two H.264 and HEVC features:
>   Multiview (used by 3D video, hantro supports this)
>   Stream with sublayers
>
> - Improve standard references in the control metadata documentation.
>
> - Look into the ability to add fields to a compound control in the
>   future: requires some investigation into the control framework.
>   Helps to make the API more future proof.
>
> - Document that metadata controls must stick to the standard as much
>   as possible. No hardware specific data is allowed.
>
> - At least one stateless encoder should be present, ideally for each
>   codec. This is needed to see if the existing metadata controls can
>   be reused by encoders. Check the Intel encoder and VA API. Also
>   check if there is any source code for the AMD encoder.
>
>   For reference: this is the Rockchip H.264 stateless encoder metadata:
>
>         struct rk3288_h264e_reg_params {
>                 u32 frame_coding_type;
>                 s32 pic_init_qp;
>                 s32 slice_alpha_offset;
>                 s32 slice_beta_offset;
>                 s32 chroma_qp_index_offset;
>                 s32 filter_disable;
>                 u16 idr_pic_id;
>                 s32 pps_id;
>                 s32 frame_num;
>                 s32 slice_size_mb_rows;
>                 s32 h264_inter4x4_disabled;
>                 s32 enable_cabac;
>                 s32 transform8x8_mode;
>                 s32 cabac_init_idc;
>                 /* rate control relevant */
>                 s32 qp;
>                 s32 mad_qp_delta;
>                 s32 mad_threshold;
>                 s32 qp_min;
>                 s32 qp_max;
>                 s32 cp_distance_mbs;
>                 s32 cp_target[10];
>                 s32 target_error[7];
>                 s32 delta_qp[7];
>         };
>
>   And this is the VP8 metadata:
>
>         struct rk3399_vp8e_reg_params {
>                 u32 is_intra;
>                 u32 frm_hdr_size;
>                 u32 qp;
>                 s32 mv_prob[2][19];
>                 s32 intra_prob;
>                 u32 bool_enc_value;
>                 u32 bool_enc_value_bits;
>                 u32 bool_enc_range;
>                 u32 filterDisable;
>                 u32 filter_sharpness;
>                 u32 filter_level;
>                 s32 intra_frm_delta;
>                 s32 last_frm_delta;
>                 s32 golden_frm_delta;
>                 s32 altref_frm_delta;
>                 s32 bpred_mode_delta;
>                 s32 zero_mode_delta;
>                 s32 newmv_mode_delta;
>                 s32 splitmv_mode_delta;
>         };
>
>         Source:
> https://chromium.googlesource.com/chromiumos/third_party/libv4lplugins/+/79286ece8624ab016575a5ad8965a61b334ab169/libv4l-rockchip_v2/libvepu/common/rk_venc.h
>
>   And the metadata for cedrus:
>
>         struct h264enc_params {
>                 unsigned int width;
>                 unsigned int height;
>                 unsigned int src_width;
>                 unsigned int src_height;
>                 enum color_format { H264_FMT_NV12 = 0, H264_FMT_NV16 = 1 } src_format;
>                 unsigned int profile_idc, level_idc;
>                 enum { H264_EC_CAVLC = 0, H264_EC_CABAC = 1 } entropy_coding_mode;
>                 unsigned int qp;
>                 unsigned int keyframe_interval;
>         };
>
>         Source: https://github.com/jemk/cedrus/blob/master/h264enc/h264enc.c
>
>   Cisco also has a requirement that the bitrate can be controlled per-frame.
>
>   Conclusion: stateless encoder support needs some research. However, the general
>   suspicion is that the decoder metadata controls are unlikely to be reused for
>   stateless encoders.
>
>
> Finalize Stateful Encoder
> -------------------------
>
> Currently S_PARM/ENUM_FRAMEINTERVALS is used to set the framerate which is needed
> by encoders together with the desired bitrate to determine the compression ratio.
>
> After some discussion we realized that this should actually refer to the rate at
> which the encoder produces compressed frames: this is needed when you want to
> encoder multiple streams in parallel and you want to indicate how the encoder
> hardware should balance these encoder processes. E.g. the Xilinx encoder can
> reserve N encoding cores depending on the demand.
>
> Setting the actual framerate (which is needed to determine the compression ratio)
> is separate from this and should be done through a new control of type v4l2_fract
> that indicates the framerate (not interval, the userspace people very much preferred
> framerate over frameinterval).
>
> This requires the introduction of V4L2_CTRL_WHICH_{MIN,MAX}_VAL to obtain the min
> and max values of a compound control. But this tries in nicely with the work
> Ricardo Ribalda Delgado is doing for the V4L2_CTRL_TYPE_AREA controls.
>
> Hans will implement this in the control framework, Michael Tretter will add support
> for this in his Xilinx driver.
>
>
> The other outstanding issue with stateful encoders is what to do if the capture
> buffer is too small. It turns out that it is next to impossible to precisely
> predict the minimum size of a capture buffers, so some mechanism is needed to
> handle this corner case.
>
> We agreed that the best way is to mark the capture buffer that's too small with
> V4L2_BUF_FLAG_ERROR and indicate that the reason is that it is too small with
> a new buffer flag: V4L2_BUF_FLAG_TOO_SMALL (0x00080000). When userspace sees
> this it should stop streaming on the capture side, reallocate and requeue
> capture buffers, and restart streaming.
>
> This will work fine if there are no B frames. If the encoder produces B frames
> as well, then this approach can produce an invalid stream. The only way this can
> be resolved is if the HW/FW can rollback its internal state to before the point
> this error was detected. In the future we need a pixelformat flag to indicate
> that the HW/FW can rollback.
>
> If the HW can fragment the encoded frames over multiple capture buffers, then it
> should do so. The driver should set V4L2_FMT_FLAG_CONTINUOUS_BYTESTREAM for
> this. However, this feature should probably be expliticly requested. This can be
> done through a new V4L2_PIX_FMT_FLAG_ flag. Some more discussion is needed for this.
>
>
> Nicolas mentioned that some codec drivers used to return wrong values for
> ENUM_FRAMEINTERVALS (swapped numerator/denominator): v4l2-compliance should check
> if the returned values are sane.
>
> Nicolas also mentioned that it is not clear how drivers should round S_FMT
> resolutions for codecs: this is currently driver specific. He would like this
> to be documented (and checked) as rounding up.
>
>
> Touched upon but not really discussed in-depth: SVC (Scalable Video Coding)
> support.
>
>
> Action Items
> ------------
>
> Hans Verkuil:
>
> - Ask Cisco colleagues which bitrate-related parameters have to be per-frame for
>   an encoder
> - make stateful encoder infrastructure + documentation for the missing bits
> - investigate using different sizes for metadata controls in the control framework:
>   is this possible?
>
> Michael Tretter:
>
> - Support the new encoder stateful controls in the driver
>
> Tomasz Figa:
>
> - look up AMD encoder support
>

We haven't enabled it on our system yet, but the code is based on the
Mesa Gallium VA state tracker:
https://gitlab.freedesktop.org/mesa/mesa/tree/master/src/gallium/state_trackers/va/
https://gitlab.freedesktop.org/mesa/mesa/tree/master/src/gallium/drivers/radeon/

Best regards,
Tomasz

> Boris Brezillon:
>
> - send v3 of hantro g1 fixes
>
> Nicolas Dufresne:
>
> - look into multiview and sublayers support
>
> Paul Kocialkowski:
>
> - check metadata controls against the standards and update the docs if needed
>
> Ezequiel Garcia and Boris Brezillon:
>
> - add VP9 support
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ANN] Report of Media Summit: Codecs
  2019-11-19 14:18 ` [ANN] Report of Media Summit: Codecs Hans Verkuil
@ 2019-12-03  7:08   ` Tomasz Figa
  0 siblings, 0 replies; 7+ messages in thread
From: Tomasz Figa @ 2019-12-03  7:08 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Boris Brezillon, Alexandre Courbot,
	Nicolas Dufresne, Ezequiel Garcia, Daniel Gomez, Peter Griffin,
	Dafna Hirschfeld, Paul Kocialkowski, Helen Koike, Jan Schmidt,
	Dave Stevenson, Michael Tretter, Stanimir Varbanov,
	Matthew Waters

On Tue, Nov 19, 2019 at 11:18 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 11/2/19 1:50 PM, Hans Verkuil wrote:
>
> <snip>
>
> > Action Items
> > ------------
> >
> > Hans Verkuil:
> >
> > - Ask Cisco colleagues which bitrate-related parameters have to be per-frame for
> >   an encoder
> > - make stateful encoder infrastructure + documentation for the missing bits
> > - investigate using different sizes for metadata controls in the control framework:
> >   is this possible?
>
> The problem is with compound arrays. struct v4l2_ext_control just provides a size field
> for the total size and so for an array there is no way to discover the actual element
> size.
>
> That said, to my knowledge there are currently no compound arrays defined in the mainline
> kernel.

V4L2_CTRL_TYPE_H264_SLICE_PARAMS is a compound array, with one element
of the array for each slice in the bitstream buffer.

> So one option is to take the last reserved __u32 field and split it in a
> __u16 elem_size and a __u16 reserved2 field, or just use the full __u32 for the elem size
> and drop the reserved2 field.

__u16 would limit the element size to 64k. While I don't see any
controls bigger than that on the horizon (and with the current design
handling such big controls would add a lot of memcpy overhead anyway),
I wonder if that's a good assumption. I'd probably just go with __u32.

>
> Alternatively, we prohibit compound arrays for now and postpone making any changes to
> struct v4l2_ext_control until we actually need this.
>
> As long as it is not an array, then we can safely extend these compound control structs
> later. It requires some work in the control framework, but it isn't too bad.
>
> I'm in favor of implementing this, and for now prohibiting the use of compound arrays.
>
> It is really helpful making these codec state controls at least somewhat future-proof.
>
> Comments?
>
>         Hans
>
> >
> > Michael Tretter:
> >
> > - Support the new encoder stateful controls in the driver
> >
> > Tomasz Figa:
> >
> > - look up AMD encoder support
> >
> > Boris Brezillon:
> >
> > - send v3 of hantro g1 fixes
> >
> > Nicolas Dufresne:
> >
> > - look into multiview and sublayers support
> >
> > Paul Kocialkowski:
> >
> > - check metadata controls against the standards and update the docs if needed
> >
> > Ezequiel Garcia and Boris Brezillon:
> >
> > - add VP9 support
> >
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-12-03  7:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-02 12:50 [ANN] Report of Media Summit: Codecs Hans Verkuil
2019-11-14 18:15 ` VP9 Stateless Support (follow up of [ANN] Report of Media Summit: Codecs) Nicolas Dufresne
2019-11-15  6:12   ` Boris Brezillon
2019-11-15 14:52   ` Recap of: " Nicolas Dufresne
2019-11-19 14:18 ` [ANN] Report of Media Summit: Codecs Hans Verkuil
2019-12-03  7:08   ` Tomasz Figa
2019-12-03  6:56 ` Tomasz Figa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).