linux-media.vger.kernel.org archive mirror
* [PATCH v2 0/2] media: cedrus: Add H264 decoding support
@ 2018-11-15 14:56 Maxime Ripard
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Maxime Ripard @ 2018-11-15 14:56 UTC
  To: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart
  Cc: tfiga, posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	linux-sunxi, Thomas Petazzoni, Maxime Ripard


Hi,

Here is a new version of the H264 decoding support in the cedrus
driver.

As you might already know, the cedrus driver relies on the Request
API, and is a reverse engineered driver for the video decoding engine
found on the Allwinner SoCs.

This work has been made possible by the people behind libvdpau-sunxi,
found here:
https://github.com/linux-sunxi/libvdpau-sunxi/

It's based on v4.20-rc1, plus the tag patches sent this week by Hans
Verkuil.

I've been using the controls currently integrated into ChromeOS that
have a working version of this particular setup. However, these
controls have a number of shortcomings and inconsistencies with other
decoding APIs. I've only worked with libva so far, but I've already
noticed that:
  - The kernel UAPI expects to have the nal_ref_idc variable, while
    libva only exposes whether that frame is a reference frame or
    not. I've looked at the rockchip driver in the ChromeOS tree, and
    our own driver, and they both need only the information about
    whether the frame is a reference one or not, so maybe we should
    change this?
  - The H264 bitstream exposes the picture default reference list (for
    both list 0 and list 1), the slice reference list and an override
    flag. libva only passes the reference list to be used (so either
    the picture default's or the slice's), depending on the override
    flag. The kernel UAPI wants the picture default reference list and
    the slice reference list, but doesn't expose the override flag,
    which prevents us from configuring the hardware properly. Our
    video decoding engine needs all three pieces of information, but
    we can easily adapt to having only one. However, having two
    doesn't really work for us.

These are pretty much the only issues I've noticed so far, but we
should probably fix them already. There are probably others, so feel
free to step in.
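
To make the second point concrete, here is a minimal sketch of the
selection libva performs before handing a list to the driver. The
struct, its field names and the helper are all hypothetical; nothing
here is part of the proposed UAPI, which currently carries no override
flag at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Matches the length of the ref_pic_list*[32] arrays in the proposed UAPI. */
#define REF_LIST_LEN 32

/* Hypothetical container for the three pieces of information discussed
 * above. It is an illustration only, not a structure from the patch. */
struct ref_list_choice {
	const uint8_t *pic_default; /* picture default reference list */
	const uint8_t *slice;       /* slice reference list */
	int override;               /* hypothetical override flag */
};

/* Return the single list to program into the hardware: the slice list
 * when the override flag is set, the picture default list otherwise. */
static const uint8_t *select_ref_list(const struct ref_list_choice *c)
{
	return c->override ? c->slice : c->pic_default;
}
```

With only the resolved list, or with all three inputs, a driver can
make this choice; with the two lists and no flag it cannot.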

I've tested the various ABIs using this gdb script:
http://code.bulix.org/jl4se4-505620?raw

And this test script:
http://code.bulix.org/8zle4s-505623?raw

The compiled application is quite trivial:
http://code.bulix.org/e34zp8-505624?raw

The output is:
arm:	builds/arm-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
x86:	builds/x86-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
x64:	builds/x64-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
arm64:	builds/arm64-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318

Let me know if there's any flaw in that test setup, or if you have
any comments on the patches.
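
For readers who cannot reach the scripts, a different way to check the
same property is with compile-time assertions. This is only a sketch
and not the linked scripts: the struct below is a local copy of
struct v4l2_h264_dpb_entry from patch 1 with stdint types standing in
for the kernel's __u32/__u16/__s32/__u8, and the expected values
assume an ABI where 32-bit integers are 4-byte aligned, as on the four
targets listed above. It needs a C11 compiler for _Static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of struct v4l2_h264_dpb_entry (patch 1), for illustration. */
struct dpb_entry_layout {
	uint32_t tag;
	uint16_t frame_num;
	uint16_t pic_num;
	int32_t top_field_order_cnt;
	int32_t bottom_field_order_cnt;
	uint8_t flags;
};

/* The build fails on any target where the layout differs. */
_Static_assert(offsetof(struct dpb_entry_layout, tag) == 0, "tag");
_Static_assert(offsetof(struct dpb_entry_layout, frame_num) == 4, "frame_num");
_Static_assert(offsetof(struct dpb_entry_layout, pic_num) == 6, "pic_num");
_Static_assert(offsetof(struct dpb_entry_layout, top_field_order_cnt) == 8, "top");
_Static_assert(offsetof(struct dpb_entry_layout, bottom_field_order_cnt) == 12, "bottom");
_Static_assert(offsetof(struct dpb_entry_layout, flags) == 16, "flags");
/* One flags byte plus three bytes of tail padding. */
_Static_assert(sizeof(struct dpb_entry_layout) == 20, "size");

/* Runtime accessors, so the same numbers can be printed or compared. */
static size_t dpb_entry_size(void)
{
	return sizeof(struct dpb_entry_layout);
}

static size_t dpb_entry_flags_offset(void)
{
	return offsetof(struct dpb_entry_layout, flags);
}
```

Building this once per architecture gives a pass/fail answer without
running anything on the target.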

Maxime

Changes from v1:
  - Rebased on 4.20
  - Did the documentation for the userspace API
  - Used the tags instead of buffer IDs
  - Added a comment to explain why we still needed the swdec trigger
  - Reworked the MV col buffer in order to have one slot per frame
  - Removed the unused neighbor info buffer
  - Made sure to have the same structure offsets and alignment across
    32-bit and 64-bit architectures
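
As an illustration of the tag change, here is a rough sketch of how a
reference list entry could be resolved to a buffer tag. The struct is
a local copy of struct v4l2_h264_dpb_entry from patch 1, the helper
name is made up, and treating the VALID flag as "this slot holds a
usable reference" is my assumption, since the flag semantics are not
documented yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DPB_SIZE 16               /* dpb[16] in the decode parameters */
#define DPB_ENTRY_FLAG_VALID 0x01 /* mirrors V4L2_H264_DPB_ENTRY_FLAG_VALID */

/* Local copy of struct v4l2_h264_dpb_entry (patch 1), for illustration. */
struct dpb_entry {
	uint32_t tag;
	uint16_t frame_num;
	uint16_t pic_num;
	int32_t top_field_order_cnt;
	int32_t bottom_field_order_cnt;
	uint8_t flags;
};

/* Hypothetical helper: ref_pic_list entries are indices into dpb[], so
 * look the entry up and hand back the tag of its buffer.
 * Returns 0 on success, -1 if the index is out of range or the slot is
 * not marked valid (assumed meaning of the VALID flag). */
static int ref_index_to_tag(const struct dpb_entry *dpb, uint8_t index,
			    uint32_t *tag)
{
	if (index >= DPB_SIZE)
		return -1;
	if (!(dpb[index].flags & DPB_ENTRY_FLAG_VALID))
		return -1;
	*tag = dpb[index].tag;
	return 0;
}
```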

Maxime Ripard (1):
  media: cedrus: Add H264 decoding support

Pawel Osciak (1):
  media: uapi: Add H264 low-level decoder API compound controls.

 Documentation/media/uapi/v4l/biblio.rst       |   9 +
 .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++
 .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
 .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
 .../media/videodev2.h.rst.exceptions          |   5 +
 drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
 drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
 drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
 drivers/staging/media/sunxi/cedrus/cedrus.c   |  25 +
 drivers/staging/media/sunxi/cedrus/cedrus.h   |  35 +-
 .../staging/media/sunxi/cedrus/cedrus_dec.c   |  11 +
 .../staging/media/sunxi/cedrus/cedrus_h264.c  | 470 ++++++++++++++++++
 .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +
 .../staging/media/sunxi/cedrus/cedrus_regs.h  |  63 +++
 .../staging/media/sunxi/cedrus/cedrus_video.c |   9 +
 include/media/v4l2-ctrls.h                    |  10 +
 include/uapi/linux/v4l2-controls.h            | 166 +++++++
 include/uapi/linux/videodev2.h                |  11 +
 18 files changed, 1276 insertions(+), 2 deletions(-)
 create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c

-- 
2.19.1


* [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2018-11-15 14:56 [PATCH v2 0/2] media: cedrus: Add H264 decoding support Maxime Ripard
@ 2018-11-15 14:56 ` Maxime Ripard
  2018-11-27 17:23   ` [linux-sunxi] " Jernej Škrabec
                     ` (3 more replies)
  2018-11-15 14:56 ` [PATCH v2 2/2] media: cedrus: Add H264 decoding support Maxime Ripard
  2018-11-16  7:04 ` [PATCH v2 0/2] " Tomasz Figa
  2 siblings, 4 replies; 27+ messages in thread
From: Maxime Ripard @ 2018-11-15 14:56 UTC
  To: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart
  Cc: tfiga, posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	linux-sunxi, Thomas Petazzoni, Guenter Roeck, Maxime Ripard

From: Pawel Osciak <posciak@chromium.org>

Stateless video codecs will require both the H264 metadata and slices in
order to be able to decode frames.

This introduces the definitions for a new pixel format for H264 slices that
have been parsed, as well as the structures used to pass the metadata from
userspace to the kernel.

Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Pawel Osciak <posciak@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 Documentation/media/uapi/v4l/biblio.rst       |   9 +
 .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++++++
 .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
 .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
 .../media/videodev2.h.rst.exceptions          |   5 +
 drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
 drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
 include/media/v4l2-ctrls.h                    |  10 +
 include/uapi/linux/v4l2-controls.h            | 166 ++++++++
 include/uapi/linux/videodev2.h                |  11 +
 10 files changed, 658 insertions(+)
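
Note for reviewers, as an example of how a driver might consume the
SPS control: the coded frame size follows from three of its fields
using the standard H.264 derivation. The struct below only mirrors
those three fields and the helper names are made up for illustration;
cropping is ignored because the proposed structure does not carry it.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY from this patch. */
#define SPS_FLAG_FRAME_MBS_ONLY 0x10

/* Subset of struct v4l2_ctrl_h264_sps, copied locally for illustration. */
struct sps_dims {
	uint16_t pic_width_in_mbs_minus1;
	uint16_t pic_height_in_map_units_minus1;
	uint8_t flags;
};

/* Coded width in luma samples: macroblocks are 16 samples wide. */
static uint32_t sps_coded_width(const struct sps_dims *sps)
{
	return ((uint32_t)sps->pic_width_in_mbs_minus1 + 1) * 16;
}

/* Coded height in luma samples. When frame_mbs_only_flag is 0, a map
 * unit covers a macroblock pair, hence the factor of two. */
static uint32_t sps_coded_height(const struct sps_dims *sps)
{
	uint32_t frame_mbs_only = !!(sps->flags & SPS_FLAG_FRAME_MBS_ONLY);

	return (2 - frame_mbs_only) *
	       ((uint32_t)sps->pic_height_in_map_units_minus1 + 1) * 16;
}
```

A 1080p progressive stream typically has pic_width_in_mbs_minus1 = 119
and pic_height_in_map_units_minus1 = 67, which gives a coded size of
1920x1088.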

diff --git a/Documentation/media/uapi/v4l/biblio.rst b/Documentation/media/uapi/v4l/biblio.rst
index 386d6cf83e9c..73aeb7ce47d2 100644
--- a/Documentation/media/uapi/v4l/biblio.rst
+++ b/Documentation/media/uapi/v4l/biblio.rst
@@ -115,6 +115,15 @@ ITU BT.1119
 
 :author:    International Telecommunication Union (http://www.itu.ch)
 
+.. _h264:
+
+ITU H.264
+=========
+
+:title:     ITU-T Recommendation H.264 "Advanced Video Coding for Generic Audiovisual Services"
+
+:author:    International Telecommunication Union (http://www.itu.ch)
+
 .. _jfif:
 
 JFIF
diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
index 65a1d873196b..87c0d151577f 100644
--- a/Documentation/media/uapi/v4l/extended-controls.rst
+++ b/Documentation/media/uapi/v4l/extended-controls.rst
@@ -1674,6 +1674,370 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
 	non-intra-coded frames, in zigzag scanning order. Only relevant for
 	non-4:2:0 YUV formats.
 
+.. _v4l2-mpeg-h264:
+
+``V4L2_CID_MPEG_VIDEO_H264_SPS (struct)``
+    Specifies the sequence parameter set (as extracted from the
+    bitstream) for the associated H264 slice data. This includes the
+    necessary parameters for configuring a stateless hardware decoding
+    pipeline for H264.  The bitstream parameters are defined according
+    to :ref:`h264`. Unless there's a specific comment, refer to the
+    specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_h264_sps
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_h264_sps
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u8
+      - ``profile_idc``
+      -
+    * - __u8
+      - ``constraint_set_flags``
+      - TODO
+    * - __u8
+      - ``level_idc``
+      -
+    * - __u8
+      - ``seq_parameter_set_id``
+      -
+    * - __u8
+      - ``chroma_format_idc``
+      -
+    * - __u8
+      - ``bit_depth_luma_minus8``
+      -
+    * - __u8
+      - ``bit_depth_chroma_minus8``
+      -
+    * - __u8
+      - ``log2_max_frame_num_minus4``
+      -
+    * - __u8
+      - ``pic_order_cnt_type``
+      -
+    * - __u8
+      - ``log2_max_pic_order_cnt_lsb_minus4``
+      -
+    * - __u8
+      - ``max_num_ref_frames``
+      -
+    * - __u8
+      - ``num_ref_frames_in_pic_order_cnt_cycle``
+      -
+    * - __s32
+      - ``offset_for_ref_frame[255]``
+      -
+    * - __s32
+      - ``offset_for_non_ref_pic``
+      -
+    * - __s32
+      - ``offset_for_top_to_bottom_field``
+      -
+    * - __u16
+      - ``pic_width_in_mbs_minus1``
+      -
+    * - __u16
+      - ``pic_height_in_map_units_minus1``
+      -
+    * - __u8
+      - ``flags``
+      - TODO
+
+``V4L2_CID_MPEG_VIDEO_H264_PPS (struct)``
+    Specifies the picture parameter set (as extracted from the
+    bitstream) for the associated H264 slice data. This includes the
+    necessary parameters for configuring a stateless hardware decoding
+    pipeline for H264.  The bitstream parameters are defined according
+    to :ref:`h264`. Unless there's a specific comment, refer to the
+    specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_h264_pps
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_h264_pps
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u8
+      - ``pic_parameter_set_id``
+      -
+    * - __u8
+      - ``seq_parameter_set_id``
+      -
+    * - __u8
+      - ``num_slice_groups_minus1``
+      -
+    * - __u8
+      - ``num_ref_idx_l0_default_active_minus1``
+      -
+    * - __u8
+      - ``num_ref_idx_l1_default_active_minus1``
+      -
+    * - __u8
+      - ``weighted_bipred_idc``
+      -
+    * - __s8
+      - ``pic_init_qp_minus26``
+      -
+    * - __s8
+      - ``pic_init_qs_minus26``
+      -
+    * - __s8
+      - ``chroma_qp_index_offset``
+      -
+    * - __s8
+      - ``second_chroma_qp_index_offset``
+      -
+    * - __u8
+      - ``flags``
+      - TODO
+
+``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX (struct)``
+    Specifies the scaling matrix (as extracted from the bitstream) for
+    the associated H264 slice data. The bitstream parameters are
+    defined according to :ref:`h264`. Unless there's a specific
+    comment, refer to the specification for the documentation of these
+    fields.
+
+.. c:type:: v4l2_ctrl_h264_scaling_matrix
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_h264_scaling_matrix
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u8
+      - ``scaling_list_4x4[6][16]``
+      -
+    * - __u8
+      - ``scaling_list_8x8[6][64]``
+      -
+
+``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS (struct)``
+    Specifies the slice parameters (as extracted from the bitstream)
+    for the associated H264 slice data. This includes the necessary
+    parameters for configuring a stateless hardware decoding pipeline
+    for H264.  The bitstream parameters are defined according to
+    :ref:`h264`. Unless there's a specific comment, refer to the
+    specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_h264_slice_param
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_h264_slice_param
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u32
+      - ``size``
+      -
+    * - __u32
+      - ``header_bit_size``
+      -
+    * - __u16
+      - ``first_mb_in_slice``
+      -
+    * - __u8
+      - ``slice_type``
+      -
+    * - __u8
+      - ``pic_parameter_set_id``
+      -
+    * - __u8
+      - ``colour_plane_id``
+      -
+    * - __u16
+      - ``frame_num``
+      -
+    * - __u16
+      - ``idr_pic_id``
+      -
+    * - __u16
+      - ``pic_order_cnt_lsb``
+      -
+    * - __s32
+      - ``delta_pic_order_cnt_bottom``
+      -
+    * - __s32
+      - ``delta_pic_order_cnt0``
+      -
+    * - __s32
+      - ``delta_pic_order_cnt1``
+      -
+    * - __u8
+      - ``redundant_pic_cnt``
+      -
+    * - struct :c:type:`v4l2_h264_pred_weight_table`
+      - ``pred_weight_table``
+      -
+    * - __u32
+      - ``dec_ref_pic_marking_bit_size``
+      -
+    * - __u32
+      - ``pic_order_cnt_bit_size``
+      -
+    * - __u8
+      - ``cabac_init_idc``
+      -
+    * - __s8
+      - ``slice_qp_delta``
+      -
+    * - __s8
+      - ``slice_qs_delta``
+      -
+    * - __u8
+      - ``disable_deblocking_filter_idc``
+      -
+    * - __s8
+      - ``slice_alpha_c0_offset_div2``
+      -
+    * - __s8
+      - ``slice_beta_offset_div2``
+      -
+    * - __u32
+      - ``slice_group_change_cycle``
+      -
+    * - __u8
+      - ``num_ref_idx_l0_active_minus1``
+      -
+    * - __u8
+      - ``num_ref_idx_l1_active_minus1``
+      -
+    * - __u8
+      - ``ref_pic_list0[32]``
+      -
+    * - __u8
+      - ``ref_pic_list1[32]``
+      -
+    * - __u8
+      - ``flags``
+      - TODO
+
+.. c:type:: v4l2_h264_pred_weight_table
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_h264_pred_weight_table
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u8
+      - ``luma_log2_weight_denom``
+      -
+    * - __u8
+      - ``chroma_log2_weight_denom``
+      -
+    * - struct :c:type:`v4l2_h264_weight_factors`
+      - ``weight_factors[2]``
+      -
+
+.. c:type:: v4l2_h264_weight_factors
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_h264_weight_factors
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __s8
+      - ``luma_weight[32]``
+      -
+    * - __s8
+      - ``luma_offset[32]``
+      -
+    * - __s8
+      - ``chroma_weight[32][2]``
+      -
+    * - __s8
+      - ``chroma_offset[32][2]``
+      -
+
+``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS (struct)``
+    Specifies the decode parameters (as extracted from the bitstream)
+    for the associated H264 slice data. This includes the necessary
+    parameters for configuring a stateless hardware decoding pipeline
+    for H264.  The bitstream parameters are defined according to
+    :ref:`h264`. Unless there's a specific comment, refer to the
+    specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_h264_decode_param
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_h264_decode_param
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u32
+      - ``num_slices``
+      -
+    * - __u8
+      - ``idr_pic_flag``
+      -
+    * - __u8
+      - ``nal_ref_idc``
+      -
+    * - __s32
+      - ``top_field_order_cnt``
+      -
+    * - __s32
+      - ``bottom_field_order_cnt``
+      -
+    * - __u8
+      - ``ref_pic_list_p0[32]``
+      -
+    * - __u8
+      - ``ref_pic_list_b0[32]``
+      -
+    * - __u8
+      - ``ref_pic_list_b1[32]``
+      -
+    * - struct :c:type:`v4l2_h264_dpb_entry`
+      - ``dpb[16]``
+      -
+
+.. c:type:: v4l2_h264_dpb_entry
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_h264_dpb_entry
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       1 1 2
+
+    * - __u32
+      - ``tag``
+      - tag to identify the buffer containing the reference frame
+    * - __u16
+      - ``frame_num``
+      -
+    * - __u16
+      - ``pic_num``
+      -
+    * - __s32
+      - ``top_field_order_cnt``
+      -
+    * - __s32
+      - ``bottom_field_order_cnt``
+      -
+    * - __u8
+      - ``flags``
+      -
+
 MFC 5.1 MPEG Controls
 ---------------------
 
diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
index ba0f6c49d9bf..f15fc1c8d479 100644
--- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
+++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
@@ -45,6 +45,26 @@ Compressed Formats
       - ``V4L2_PIX_FMT_H264_MVC``
       - 'M264'
       - H264 MVC video elementary stream.
+    * .. _V4L2-PIX-FMT-H264-SLICE:
+
+      - ``V4L2_PIX_FMT_H264_SLICE``
+      - 'S264'
+      - H264 parsed slice data, as extracted from the H264 bitstream.
+	This format is adapted for stateless video decoders that
+	implement an H264 pipeline (using the :ref:`codec` and
+	:ref:`media-request-api`).  Metadata associated with the frame
+	to decode are required to be passed through the
+	``V4L2_CID_MPEG_VIDEO_H264_SPS``,
+	``V4L2_CID_MPEG_VIDEO_H264_PPS``,
+	``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS`` and
+	``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS`` controls and
+	scaling matrices can optionally be specified through the
+	``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX`` control.  See the
+	:ref:`associated Codec Control IDs <v4l2-mpeg-h264>`.
+	Exactly one output and one capture buffer must be provided for
+	use with this pixel format. The output buffer must contain the
+	appropriate number of macroblocks to decode a full
+	corresponding frame to the matching capture buffer.
     * .. _V4L2-PIX-FMT-H263:
 
       - ``V4L2_PIX_FMT_H263``
diff --git a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
index 258f5813f281..38a9c988124c 100644
--- a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
+++ b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
@@ -436,6 +436,36 @@ See also the examples in :ref:`control`.
       - n/a
       - A struct :c:type:`v4l2_ctrl_mpeg2_quantization`, containing MPEG-2
 	quantization matrices for stateless video decoders.
+    * - ``V4L2_CTRL_TYPE_H264_SPS``
+      - n/a
+      - n/a
+      - n/a
+      - A struct :c:type:`v4l2_ctrl_h264_sps`, containing H264
+	sequence parameters for stateless video decoders.
+    * - ``V4L2_CTRL_TYPE_H264_PPS``
+      - n/a
+      - n/a
+      - n/a
+      - A struct :c:type:`v4l2_ctrl_h264_pps`, containing H264
+	picture parameters for stateless video decoders.
+    * - ``V4L2_CTRL_TYPE_H264_SCALING_MATRIX``
+      - n/a
+      - n/a
+      - n/a
+      - A struct :c:type:`v4l2_ctrl_h264_scaling_matrix`, containing H264
+	scaling matrices for stateless video decoders.
+    * - ``V4L2_CTRL_TYPE_H264_SLICE_PARAMS``
+      - n/a
+      - n/a
+      - n/a
+      - A struct :c:type:`v4l2_ctrl_h264_slice_param`, containing H264
+	slice parameters for stateless video decoders.
+    * - ``V4L2_CTRL_TYPE_H264_DECODE_PARAMS``
+      - n/a
+      - n/a
+      - n/a
+      - A struct :c:type:`v4l2_ctrl_h264_decode_param`, containing H264
+	decode parameters for stateless video decoders.
 
 .. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.7cm}|
 
diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
index 1ec425a7c364..99f1bd2bc44c 100644
--- a/Documentation/media/videodev2.h.rst.exceptions
+++ b/Documentation/media/videodev2.h.rst.exceptions
@@ -133,6 +133,11 @@ replace symbol V4L2_CTRL_TYPE_U32 :c:type:`v4l2_ctrl_type`
 replace symbol V4L2_CTRL_TYPE_U8 :c:type:`v4l2_ctrl_type`
 replace symbol V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
 replace symbol V4L2_CTRL_TYPE_MPEG2_QUANTIZATION :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_H264_SPS :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_H264_PPS :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_H264_SCALING_MATRIX :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_H264_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_H264_DECODE_PARAMS :c:type:`v4l2_ctrl_type`
 
 # V4L2 capability defines
 replace define V4L2_CAP_VIDEO_CAPTURE device-capabilities
diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
index b854cceb19dc..e96c453208e8 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -825,6 +825,11 @@ const char *v4l2_ctrl_get_name(u32 id)
 	case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER:return "H264 Number of HC Layers";
 	case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP:
 								return "H264 Set QP Value for HC Layers";
+	case V4L2_CID_MPEG_VIDEO_H264_SPS:			return "H264 SPS";
+	case V4L2_CID_MPEG_VIDEO_H264_PPS:			return "H264 PPS";
+	case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:		return "H264 Scaling Matrix";
+	case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:		return "H264 Slice Parameters";
+	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:		return "H264 Decode Parameters";
 	case V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP:		return "MPEG4 I-Frame QP Value";
 	case V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP:		return "MPEG4 P-Frame QP Value";
 	case V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP:		return "MPEG4 B-Frame QP Value";
@@ -1300,6 +1305,21 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
 	case V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION:
 		*type = V4L2_CTRL_TYPE_MPEG2_QUANTIZATION;
 		break;
+	case V4L2_CID_MPEG_VIDEO_H264_SPS:
+		*type = V4L2_CTRL_TYPE_H264_SPS;
+		break;
+	case V4L2_CID_MPEG_VIDEO_H264_PPS:
+		*type = V4L2_CTRL_TYPE_H264_PPS;
+		break;
+	case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:
+		*type = V4L2_CTRL_TYPE_H264_SCALING_MATRIX;
+		break;
+	case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:
+		*type = V4L2_CTRL_TYPE_H264_SLICE_PARAMS;
+		break;
+	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
+		*type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
+		break;
 	default:
 		*type = V4L2_CTRL_TYPE_INTEGER;
 		break;
@@ -1665,6 +1685,13 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
 	case V4L2_CTRL_TYPE_MPEG2_QUANTIZATION:
 		return 0;
 
+	case V4L2_CTRL_TYPE_H264_SPS:
+	case V4L2_CTRL_TYPE_H264_PPS:
+	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
+	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
+	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
+		return 0;
+
 	default:
 		return -EINVAL;
 	}
@@ -2245,6 +2272,21 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
 	case V4L2_CTRL_TYPE_MPEG2_QUANTIZATION:
 		elem_size = sizeof(struct v4l2_ctrl_mpeg2_quantization);
 		break;
+	case V4L2_CTRL_TYPE_H264_SPS:
+		elem_size = sizeof(struct v4l2_ctrl_h264_sps);
+		break;
+	case V4L2_CTRL_TYPE_H264_PPS:
+		elem_size = sizeof(struct v4l2_ctrl_h264_pps);
+		break;
+	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
+		elem_size = sizeof(struct v4l2_ctrl_h264_scaling_matrix);
+		break;
+	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
+		elem_size = sizeof(struct v4l2_ctrl_h264_slice_param);
+		break;
+	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
+		elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
+		break;
 	default:
 		if (type < V4L2_CTRL_COMPOUND_TYPES)
 			elem_size = sizeof(s32);
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 49103787d19a..aa63f1794272 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1309,6 +1309,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
 		case V4L2_PIX_FMT_H264:		descr = "H.264"; break;
 		case V4L2_PIX_FMT_H264_NO_SC:	descr = "H.264 (No Start Codes)"; break;
 		case V4L2_PIX_FMT_H264_MVC:	descr = "H.264 MVC"; break;
+		case V4L2_PIX_FMT_H264_SLICE:	descr = "H.264 Parsed Slice"; break;
 		case V4L2_PIX_FMT_H263:		descr = "H.263"; break;
 		case V4L2_PIX_FMT_MPEG1:	descr = "MPEG-1 ES"; break;
 		case V4L2_PIX_FMT_MPEG2:	descr = "MPEG-2 ES"; break;
diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
index 83ce0593b275..b4ca95710d2d 100644
--- a/include/media/v4l2-ctrls.h
+++ b/include/media/v4l2-ctrls.h
@@ -43,6 +43,11 @@ struct poll_table_struct;
  * @p_char:			Pointer to a string.
  * @p_mpeg2_slice_params:	Pointer to a MPEG2 slice parameters structure.
  * @p_mpeg2_quantization:	Pointer to a MPEG2 quantization data structure.
+ * @p_h264_sps:			Pointer to a struct v4l2_ctrl_h264_sps.
+ * @p_h264_pps:			Pointer to a struct v4l2_ctrl_h264_pps.
+ * @p_h264_scal_mtrx:		Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
+ * @p_h264_slice_param:		Pointer to a struct v4l2_ctrl_h264_slice_param.
+ * @p_h264_decode_param:	Pointer to a struct v4l2_ctrl_h264_decode_param.
  * @p:				Pointer to a compound value.
  */
 union v4l2_ctrl_ptr {
@@ -54,6 +59,11 @@ union v4l2_ctrl_ptr {
 	char *p_char;
 	struct v4l2_ctrl_mpeg2_slice_params *p_mpeg2_slice_params;
 	struct v4l2_ctrl_mpeg2_quantization *p_mpeg2_quantization;
+	struct v4l2_ctrl_h264_sps *p_h264_sps;
+	struct v4l2_ctrl_h264_pps *p_h264_pps;
+	struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;
+	struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
+	struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
 	void *p;
 };
 
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 76f5322ec543..fb1469ec1b90 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -50,6 +50,8 @@
 #ifndef __LINUX_V4L2_CONTROLS_H
 #define __LINUX_V4L2_CONTROLS_H
 
+#include <linux/types.h>
+
 /* Control classes */
 #define V4L2_CTRL_CLASS_USER		0x00980000	/* Old-style 'user' controls */
 #define V4L2_CTRL_CLASS_MPEG		0x00990000	/* MPEG-compression controls */
@@ -534,6 +536,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type {
 };
 #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER	(V4L2_CID_MPEG_BASE+381)
 #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP	(V4L2_CID_MPEG_BASE+382)
+#define V4L2_CID_MPEG_VIDEO_H264_SPS		(V4L2_CID_MPEG_BASE+383)
+#define V4L2_CID_MPEG_VIDEO_H264_PPS		(V4L2_CID_MPEG_BASE+384)
+#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX	(V4L2_CID_MPEG_BASE+385)
+#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS	(V4L2_CID_MPEG_BASE+386)
+#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS	(V4L2_CID_MPEG_BASE+387)
+
 #define V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP	(V4L2_CID_MPEG_BASE+400)
 #define V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP	(V4L2_CID_MPEG_BASE+401)
 #define V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP	(V4L2_CID_MPEG_BASE+402)
@@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
 	__u8	chroma_non_intra_quantiser_matrix[64];
 };
 
+/* Compound controls */
+
+#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG			0x01
+#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG			0x02
+#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG			0x04
+#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG			0x08
+#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG			0x10
+#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG			0x20
+
+#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE		0x01
+#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS	0x02
+#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO		0x04
+#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED	0x08
+#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY			0x10
+#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD		0x20
+#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE			0x40
+
+struct v4l2_ctrl_h264_sps {
+	__u8 profile_idc;
+	__u8 constraint_set_flags;
+	__u8 level_idc;
+	__u8 seq_parameter_set_id;
+	__u8 chroma_format_idc;
+	__u8 bit_depth_luma_minus8;
+	__u8 bit_depth_chroma_minus8;
+	__u8 log2_max_frame_num_minus4;
+	__u8 pic_order_cnt_type;
+	__u8 log2_max_pic_order_cnt_lsb_minus4;
+	__u8 max_num_ref_frames;
+	__u8 num_ref_frames_in_pic_order_cnt_cycle;
+	__s32 offset_for_ref_frame[255];
+	__s32 offset_for_non_ref_pic;
+	__s32 offset_for_top_to_bottom_field;
+	__u16 pic_width_in_mbs_minus1;
+	__u16 pic_height_in_map_units_minus1;
+	__u8 flags;
+};
+
+#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE				0x0001
+#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT	0x0002
+#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED				0x0004
+#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT		0x0008
+#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED			0x0010
+#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT			0x0020
+#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE				0x0040
+#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT			0x0080
+
+struct v4l2_ctrl_h264_pps {
+	__u8 pic_parameter_set_id;
+	__u8 seq_parameter_set_id;
+	__u8 num_slice_groups_minus1;
+	__u8 num_ref_idx_l0_default_active_minus1;
+	__u8 num_ref_idx_l1_default_active_minus1;
+	__u8 weighted_bipred_idc;
+	__s8 pic_init_qp_minus26;
+	__s8 pic_init_qs_minus26;
+	__s8 chroma_qp_index_offset;
+	__s8 second_chroma_qp_index_offset;
+	__u8 flags;
+};
+
+struct v4l2_ctrl_h264_scaling_matrix {
+	__u8 scaling_list_4x4[6][16];
+	__u8 scaling_list_8x8[6][64];
+};
+
+struct v4l2_h264_weight_factors {
+	__s8 luma_weight[32];
+	__s8 luma_offset[32];
+	__s8 chroma_weight[32][2];
+	__s8 chroma_offset[32][2];
+};
+
+struct v4l2_h264_pred_weight_table {
+	__u8 luma_log2_weight_denom;
+	__u8 chroma_log2_weight_denom;
+	struct v4l2_h264_weight_factors weight_factors[2];
+};
+
+#define V4L2_H264_SLICE_TYPE_P				0
+#define V4L2_H264_SLICE_TYPE_B				1
+#define V4L2_H264_SLICE_TYPE_I				2
+#define V4L2_H264_SLICE_TYPE_SP				3
+#define V4L2_H264_SLICE_TYPE_SI				4
+
+#define V4L2_H264_SLICE_FLAG_FIELD_PIC			0x01
+#define V4L2_H264_SLICE_FLAG_BOTTOM_FIELD		0x02
+#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED	0x04
+#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH		0x08
+
+struct v4l2_ctrl_h264_slice_param {
+	/* Size in bytes, including header */
+	__u32 size;
+	/* Offset in bits to slice_data() from the beginning of this slice. */
+	__u32 header_bit_size;
+
+	__u16 first_mb_in_slice;
+	__u8 slice_type;
+	__u8 pic_parameter_set_id;
+	__u8 colour_plane_id;
+	__u16 frame_num;
+	__u16 idr_pic_id;
+	__u16 pic_order_cnt_lsb;
+	__s32 delta_pic_order_cnt_bottom;
+	__s32 delta_pic_order_cnt0;
+	__s32 delta_pic_order_cnt1;
+	__u8 redundant_pic_cnt;
+
+	struct v4l2_h264_pred_weight_table pred_weight_table;
+	/* Size in bits of dec_ref_pic_marking() syntax element. */
+	__u32 dec_ref_pic_marking_bit_size;
+	/* Size in bits of pic order count syntax. */
+	__u32 pic_order_cnt_bit_size;
+
+	__u8 cabac_init_idc;
+	__s8 slice_qp_delta;
+	__s8 slice_qs_delta;
+	__u8 disable_deblocking_filter_idc;
+	__s8 slice_alpha_c0_offset_div2;
+	__s8 slice_beta_offset_div2;
+	__u32 slice_group_change_cycle;
+
+	__u8 num_ref_idx_l0_active_minus1;
+	__u8 num_ref_idx_l1_active_minus1;
+	/*  Entries on each list are indices
+	 *  into v4l2_ctrl_h264_decode_param.dpb[]. */
+	__u8 ref_pic_list0[32];
+	__u8 ref_pic_list1[32];
+
+	__u8 flags;
+};
+
+#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
+#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
+#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
+
+struct v4l2_h264_dpb_entry {
+	__u32 tag;
+	__u16 frame_num;
+	__u16 pic_num;
+	/* Note that field is indicated by v4l2_buffer.field */
+	__s32 top_field_order_cnt;
+	__s32 bottom_field_order_cnt;
+	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
+};
+
+struct v4l2_ctrl_h264_decode_param {
+	__u32 num_slices;
+	__u8 idr_pic_flag;
+	__u8 nal_ref_idc;
+	__s32 top_field_order_cnt;
+	__s32 bottom_field_order_cnt;
+	__u8 ref_pic_list_p0[32];
+	__u8 ref_pic_list_b0[32];
+	__u8 ref_pic_list_b1[32];
+	struct v4l2_h264_dpb_entry dpb[16];
+};
+
 #endif
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 173a94d2cbef..dd028e0bf306 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -643,6 +643,7 @@ struct v4l2_pix_format {
 #define V4L2_PIX_FMT_H264     v4l2_fourcc('H', '2', '6', '4') /* H264 with start codes */
 #define V4L2_PIX_FMT_H264_NO_SC v4l2_fourcc('A', 'V', 'C', '1') /* H264 without start codes */
 #define V4L2_PIX_FMT_H264_MVC v4l2_fourcc('M', '2', '6', '4') /* H264 MVC */
+#define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
 #define V4L2_PIX_FMT_H263     v4l2_fourcc('H', '2', '6', '3') /* H263          */
 #define V4L2_PIX_FMT_MPEG1    v4l2_fourcc('M', 'P', 'G', '1') /* MPEG-1 ES     */
 #define V4L2_PIX_FMT_MPEG2    v4l2_fourcc('M', 'P', 'G', '2') /* MPEG-2 ES     */
@@ -1631,6 +1632,11 @@ struct v4l2_ext_control {
 		__u32 __user *p_u32;
 		struct v4l2_ctrl_mpeg2_slice_params __user *p_mpeg2_slice_params;
 		struct v4l2_ctrl_mpeg2_quantization __user *p_mpeg2_quantization;
+		struct v4l2_ctrl_h264_sps __user *p_h264_sps;
+		struct v4l2_ctrl_h264_pps __user *p_h264_pps;
+		struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
+		struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
+		struct v4l2_ctrl_h264_decode_param __user *p_h264_decode_param;
 		void __user *ptr;
 	};
 } __attribute__ ((packed));
@@ -1678,6 +1684,11 @@ enum v4l2_ctrl_type {
 	V4L2_CTRL_TYPE_U32	     = 0x0102,
 	V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS = 0x0103,
 	V4L2_CTRL_TYPE_MPEG2_QUANTIZATION = 0x0104,
+	V4L2_CTRL_TYPE_H264_SPS      = 0x0105,
+	V4L2_CTRL_TYPE_H264_PPS      = 0x0106,
+	V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
+	V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
+	V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
 };
 
 /*  Used in the VIDIOC_QUERYCTRL ioctl for querying controls */
-- 
2.19.1


* [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-15 14:56 [PATCH v2 0/2] media: cedrus: Add H264 decoding support Maxime Ripard
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
@ 2018-11-15 14:56 ` Maxime Ripard
  2018-11-24 20:43   ` [linux-sunxi] " Jernej Škrabec
                     ` (2 more replies)
  2018-11-16  7:04 ` [PATCH v2 0/2] " Tomasz Figa
  2 siblings, 3 replies; 27+ messages in thread
From: Maxime Ripard @ 2018-11-15 14:56 UTC (permalink / raw)
  To: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart
  Cc: tfiga, posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	linux-sunxi, Thomas Petazzoni, Maxime Ripard

Introduce some basic H264 decoding support in cedrus. So far, only the
baseline profile videos have been tested, and some more advanced features
used in higher profiles are not even implemented.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
 drivers/staging/media/sunxi/cedrus/cedrus.c   |  25 +
 drivers/staging/media/sunxi/cedrus/cedrus.h   |  35 +-
 .../staging/media/sunxi/cedrus/cedrus_dec.c   |  11 +
 .../staging/media/sunxi/cedrus/cedrus_h264.c  | 470 ++++++++++++++++++
 .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +
 .../staging/media/sunxi/cedrus/cedrus_regs.h  |  63 +++
 .../staging/media/sunxi/cedrus/cedrus_video.c |   9 +
 8 files changed, 618 insertions(+), 2 deletions(-)
 create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c

diff --git a/drivers/staging/media/sunxi/cedrus/Makefile b/drivers/staging/media/sunxi/cedrus/Makefile
index e9dc68b7bcb6..aaf141fc58b6 100644
--- a/drivers/staging/media/sunxi/cedrus/Makefile
+++ b/drivers/staging/media/sunxi/cedrus/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o
 
-sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o cedrus_mpeg2.o
+sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o \
+		 cedrus_mpeg2.o cedrus_h264.o
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c b/drivers/staging/media/sunxi/cedrus/cedrus.c
index 82558455384a..627a8c07eb21 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
@@ -40,6 +40,30 @@ static const struct cedrus_control cedrus_controls[] = {
 		.codec		= CEDRUS_CODEC_MPEG2,
 		.required	= false,
 	},
+	{
+		.id		= V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS,
+		.elem_size	= sizeof(struct v4l2_ctrl_h264_decode_param),
+		.codec		= CEDRUS_CODEC_H264,
+		.required	= true,
+	},
+	{
+		.id		= V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS,
+		.elem_size	= sizeof(struct v4l2_ctrl_h264_slice_param),
+		.codec		= CEDRUS_CODEC_H264,
+		.required	= true,
+	},
+	{
+		.id		= V4L2_CID_MPEG_VIDEO_H264_SPS,
+		.elem_size	= sizeof(struct v4l2_ctrl_h264_sps),
+		.codec		= CEDRUS_CODEC_H264,
+		.required	= true,
+	},
+	{
+		.id		= V4L2_CID_MPEG_VIDEO_H264_PPS,
+		.elem_size	= sizeof(struct v4l2_ctrl_h264_pps),
+		.codec		= CEDRUS_CODEC_H264,
+		.required	= true,
+	},
 };
 
 #define CEDRUS_CONTROLS_COUNT	ARRAY_SIZE(cedrus_controls)
@@ -277,6 +301,7 @@ static int cedrus_probe(struct platform_device *pdev)
 	}
 
 	dev->dec_ops[CEDRUS_CODEC_MPEG2] = &cedrus_dec_ops_mpeg2;
+	dev->dec_ops[CEDRUS_CODEC_H264] = &cedrus_dec_ops_h264;
 
 	mutex_init(&dev->dev_mutex);
 	spin_lock_init(&dev->irq_lock);
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
index 781676b55a1b..179c10dcf6a7 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
@@ -30,7 +30,7 @@
 
 enum cedrus_codec {
 	CEDRUS_CODEC_MPEG2,
-
+	CEDRUS_CODEC_H264,
 	CEDRUS_CODEC_LAST,
 };
 
@@ -40,6 +40,12 @@ enum cedrus_irq_status {
 	CEDRUS_IRQ_OK,
 };
 
+enum cedrus_h264_pic_type {
+	CEDRUS_H264_PIC_TYPE_FRAME	= 0,
+	CEDRUS_H264_PIC_TYPE_FIELD,
+	CEDRUS_H264_PIC_TYPE_MBAFF,
+};
+
 struct cedrus_control {
 	u32			id;
 	u32			elem_size;
@@ -47,6 +53,13 @@ struct cedrus_control {
 	unsigned char		required:1;
 };
 
+struct cedrus_h264_run {
+	const struct v4l2_ctrl_h264_decode_param	*decode_param;
+	const struct v4l2_ctrl_h264_pps			*pps;
+	const struct v4l2_ctrl_h264_slice_param		*slice_param;
+	const struct v4l2_ctrl_h264_sps			*sps;
+};
+
 struct cedrus_mpeg2_run {
 	const struct v4l2_ctrl_mpeg2_slice_params	*slice_params;
 	const struct v4l2_ctrl_mpeg2_quantization	*quantization;
@@ -57,12 +70,20 @@ struct cedrus_run {
 	struct vb2_v4l2_buffer	*dst;
 
 	union {
+		struct cedrus_h264_run	h264;
 		struct cedrus_mpeg2_run	mpeg2;
 	};
 };
 
 struct cedrus_buffer {
 	struct v4l2_m2m_buffer          m2m_buf;
+
+	union {
+		struct {
+			unsigned int			position;
+			enum cedrus_h264_pic_type	pic_type;
+		} h264;
+	} codec;
 };
 
 struct cedrus_ctx {
@@ -77,6 +98,17 @@ struct cedrus_ctx {
 	struct v4l2_ctrl		**ctrls;
 
 	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
+
+	union {
+		struct {
+			void		*mv_col_buf;
+			dma_addr_t	mv_col_buf_dma;
+			ssize_t		mv_col_buf_field_size;
+			ssize_t		mv_col_buf_size;
+			void		*pic_info_buf;
+			dma_addr_t	pic_info_buf_dma;
+		} h264;
+	} codec;
 };
 
 struct cedrus_dec_ops {
@@ -120,6 +152,7 @@ struct cedrus_dev {
 };
 
 extern struct cedrus_dec_ops cedrus_dec_ops_mpeg2;
+extern struct cedrus_dec_ops cedrus_dec_ops_h264;
 
 static inline void cedrus_write(struct cedrus_dev *dev, u32 reg, u32 val)
 {
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
index 0cfd6036d0cd..b606f07d94ab 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
@@ -49,6 +49,17 @@ void cedrus_device_run(void *priv)
 			V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION);
 		break;
 
+	case V4L2_PIX_FMT_H264_SLICE:
+		run.h264.decode_param = cedrus_find_control_data(ctx,
+			V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS);
+		run.h264.pps = cedrus_find_control_data(ctx,
+			V4L2_CID_MPEG_VIDEO_H264_PPS);
+		run.h264.slice_param = cedrus_find_control_data(ctx,
+			V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS);
+		run.h264.sps = cedrus_find_control_data(ctx,
+			V4L2_CID_MPEG_VIDEO_H264_SPS);
+		break;
+
 	default:
 		break;
 	}
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
new file mode 100644
index 000000000000..5459a936b4b9
--- /dev/null
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -0,0 +1,470 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2013 Jens Kuske <jenskuske@gmail.com>
+ * Copyright (c) 2018 Bootlin
+ */
+
+#include <linux/types.h>
+
+#include <media/videobuf2-dma-contig.h>
+
+#include "cedrus.h"
+#include "cedrus_hw.h"
+#include "cedrus_regs.h"
+
+enum cedrus_h264_sram_off {
+	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
+	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
+	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
+	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
+	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
+	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
+};
+
+struct cedrus_h264_sram_ref_pic {
+	__le32	top_field_order_cnt;
+	__le32	bottom_field_order_cnt;
+	__le32	frame_info;
+	__le32	luma_ptr;
+	__le32	chroma_ptr;
+	__le32	mv_col_top_ptr;
+	__le32	mv_col_bot_ptr;
+	__le32	reserved;
+} __packed;
+
+/* One for the output, 16 for the reference images */
+#define CEDRUS_H264_FRAME_NUM		17
+
+#define CEDRUS_PIC_INFO_BUF_SIZE	(128 * SZ_1K)
+
+static void cedrus_h264_write_sram(struct cedrus_dev *dev,
+				   enum cedrus_h264_sram_off off,
+				   const void *data, size_t len)
+{
+	const u32 *buffer = data;
+	size_t count = DIV_ROUND_UP(len, 4);
+
+	cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET, off << 2);
+
+	do {
+		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
+	} while (--count);
+}
+
+static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
+					      unsigned int position,
+					      unsigned int field)
+{
+	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma - PHYS_OFFSET;
+
+	/* Adjust for the position */
+	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
+
+	/* Adjust for the field */
+	addr += field * ctx->codec.h264.mv_col_buf_field_size;
+
+	return addr;
+}
+
+static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
+				struct cedrus_buffer *buf,
+				unsigned int top_field_order_cnt,
+				unsigned int bottom_field_order_cnt,
+				struct cedrus_h264_sram_ref_pic *pic)
+{
+	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
+	unsigned int position = buf->codec.h264.position;
+
+	pic->top_field_order_cnt = top_field_order_cnt;
+	pic->bottom_field_order_cnt = bottom_field_order_cnt;
+	pic->frame_info = buf->codec.h264.pic_type << 8;
+
+	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
+	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) - PHYS_OFFSET;
+	pic->mv_col_top_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 0);
+	pic->mv_col_bot_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 1);
+}
+
+static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
+				    struct cedrus_run *run)
+{
+	struct cedrus_h264_sram_ref_pic pic_list[CEDRUS_H264_FRAME_NUM];
+	const struct v4l2_ctrl_h264_decode_param *dec_param = run->h264.decode_param;
+	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
+	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
+	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
+	struct cedrus_buffer *output_buf;
+	struct cedrus_dev *dev = ctx->dev;
+	unsigned long used_dpbs = 0;
+	unsigned int position;
+	unsigned int output = 0;
+	unsigned int i;
+
+	memset(pic_list, 0, sizeof(pic_list));
+
+	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
+		const struct v4l2_h264_dpb_entry *dpb = &dec_param->dpb[i];
+		struct cedrus_buffer *cedrus_buf;
+		int buf_idx;
+
+		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
+			continue;
+
+		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
+		if (buf_idx < 0)
+			continue;
+
+		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
+		position = cedrus_buf->codec.h264.position;
+		used_dpbs |= BIT(position);
+
+		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
+			continue;
+
+		cedrus_fill_ref_pic(ctx, cedrus_buf,
+				    dpb->top_field_order_cnt,
+				    dpb->bottom_field_order_cnt,
+				    &pic_list[position]);
+
+		output = max(position, output);
+	}
+
+	position = find_next_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM,
+				      output);
+	if (position >= CEDRUS_H264_FRAME_NUM)
+		position = find_first_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM);
+
+	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
+	output_buf->codec.h264.position = position;
+
+	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
+		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
+	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
+		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
+	else
+		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FRAME;
+
+	cedrus_fill_ref_pic(ctx, output_buf,
+			    dec_param->top_field_order_cnt,
+			    dec_param->bottom_field_order_cnt,
+			    &pic_list[position]);
+
+	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_FRAMEBUFFER_LIST,
+			       pic_list, sizeof(pic_list));
+
+	cedrus_write(dev, VE_H264_OUTPUT_FRAME_IDX, position);
+}
+
+#define CEDRUS_MAX_REF_IDX	32
+
+static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
+				   struct cedrus_run *run,
+				   const u8 *ref_list, u8 num_ref,
+				   enum cedrus_h264_sram_off sram)
+{
+	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
+	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
+	struct cedrus_dev *dev = ctx->dev;
+	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
+	unsigned int size, i;
+
+	memset(sram_array, 0, sizeof(sram_array));
+
+	for (i = 0; i < num_ref; i += 4) {
+		unsigned int j;
+
+		for (j = 0; j < 4; j++) {
+			const struct v4l2_h264_dpb_entry *dpb;
+			const struct cedrus_buffer *cedrus_buf;
+			const struct vb2_v4l2_buffer *ref_buf;
+			unsigned int position;
+			int buf_idx;
+			u8 ref_idx = i + j;
+			u8 dpb_idx;
+
+			if (ref_idx >= num_ref)
+				break;
+
+			dpb_idx = ref_list[ref_idx];
+			dpb = &decode->dpb[dpb_idx];
+
+			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
+				continue;
+
+			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
+			if (buf_idx < 0)
+				continue;
+
+			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
+			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
+			position = cedrus_buf->codec.h264.position;
+
+			sram_array[i / 4] |= position << (j * 8 + 1);
+			if (ref_buf->field == V4L2_FIELD_BOTTOM)
+				sram_array[i / 4] |= BIT(j * 8);
+		}
+	}
+
+	size = min((unsigned int)ALIGN(num_ref, 4), sizeof(sram_array));
+	cedrus_h264_write_sram(dev, sram, &sram_array, size);
+}
+
+static void cedrus_write_ref_list0(struct cedrus_ctx *ctx,
+				   struct cedrus_run *run)
+{
+	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
+
+	_cedrus_write_ref_list(ctx, run,
+			       slice->ref_pic_list0,
+			       slice->num_ref_idx_l0_active_minus1 + 1,
+			       CEDRUS_SRAM_H264_REF_LIST_0);
+}
+
+static void cedrus_write_ref_list1(struct cedrus_ctx *ctx,
+				   struct cedrus_run *run)
+{
+	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
+
+	_cedrus_write_ref_list(ctx, run,
+			       slice->ref_pic_list1,
+			       slice->num_ref_idx_l1_active_minus1 + 1,
+			       CEDRUS_SRAM_H264_REF_LIST_1);
+}
+
+static void cedrus_set_params(struct cedrus_ctx *ctx,
+			      struct cedrus_run *run)
+{
+	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
+	const struct v4l2_ctrl_h264_pps *pps = run->h264.pps;
+	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
+	struct cedrus_dev *dev = ctx->dev;
+	dma_addr_t src_buf_addr;
+	u32 offset = slice->header_bit_size;
+	u32 len = (slice->size * 8) - offset;
+	u32 reg;
+
+	cedrus_write(dev, 0x220, 0x02000400);
+	cedrus_write(dev, VE_H264_VLD_LEN, len);
+	cedrus_write(dev, VE_H264_VLD_OFFSET, offset);
+
+	src_buf_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf, 0);
+	src_buf_addr -= PHYS_OFFSET;
+	cedrus_write(dev, VE_H264_VLD_END, src_buf_addr + VBV_SIZE - 1);
+	cedrus_write(dev, VE_H264_VLD_ADDR,
+		     VE_H264_VLD_ADDR_VAL(src_buf_addr) |
+		     VE_H264_VLD_ADDR_FIRST | VE_H264_VLD_ADDR_VALID |
+		     VE_H264_VLD_ADDR_LAST);
+
+	/*
+	 * FIXME: Since the bitstream parsing is done in software, and
+	 * in userspace, this shouldn't be needed anymore. But it
+	 * turns out that removing it breaks the decoding process,
+	 * without any clear indication why.
+	 */
+	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
+		     VE_H264_TRIGGER_TYPE_INIT_SWDEC);
+
+	if ((slice->slice_type == V4L2_H264_SLICE_TYPE_P) ||
+	    (slice->slice_type == V4L2_H264_SLICE_TYPE_SP) ||
+	    (slice->slice_type == V4L2_H264_SLICE_TYPE_B))
+		cedrus_write_ref_list0(ctx, run);
+
+	if (slice->slice_type == V4L2_H264_SLICE_TYPE_B)
+		cedrus_write_ref_list1(ctx, run);
+
+	// picture parameters
+	reg = 0;
+	/*
+	 * FIXME: the kernel headers are allowing the default value to
+	 * be passed, but the libva doesn't give us that.
+	 */
+	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 10;
+	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 5;
+	reg |= (pps->weighted_bipred_idc & 0x3) << 2;
+	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
+		reg |= BIT(15);
+	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
+		reg |= BIT(4);
+	if (pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED)
+		reg |= BIT(1);
+	if (pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE)
+		reg |= BIT(0);
+	cedrus_write(dev, VE_H264_PIC_HDR, reg);
+
+	// sequence parameters
+	reg = BIT(19);
+	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
+	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
+	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
+		reg |= BIT(18);
+	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
+		reg |= BIT(17);
+	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
+		reg |= BIT(16);
+	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
+
+	// slice parameters
+	reg = 0;
+	/*
+	 * FIXME: This bit marks all the frames as references. This
+	 * should probably be set based on nal_ref_idc, but the libva
+	 * doesn't pass that information along, so this is not always
+	 * available. We should find something else, maybe change the
+	 * kernel UAPI somehow?
+	 */
+	reg |= BIT(12);
+	reg |= (slice->slice_type & 0xf) << 8;
+	reg |= slice->cabac_init_idc & 0x3;
+	reg |= BIT(5);
+	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
+		reg |= BIT(4);
+	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+		reg |= BIT(3);
+	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
+		reg |= BIT(2);
+	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
+
+	reg = 0;
+	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 24;
+	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 16;
+	reg |= (slice->disable_deblocking_filter_idc & 0x3) << 8;
+	reg |= (slice->slice_alpha_c0_offset_div2 & 0xf) << 4;
+	reg |= slice->slice_beta_offset_div2 & 0xf;
+	cedrus_write(dev, VE_H264_SLICE_HDR2, reg);
+
+	reg = 0;
+	/*
+	 * FIXME: This bit tells the video engine to use the default
+	 * quantization matrices. This will obviously need to be
+	 * changed to support the profiles supporting custom
+	 * quantization matrices.
+	 */
+	reg |= BIT(24);
+	reg |= (pps->second_chroma_qp_index_offset & 0x3f) << 16;
+	reg |= (pps->chroma_qp_index_offset & 0x3f) << 8;
+	reg |= (pps->pic_init_qp_minus26 + 26 + slice->slice_qp_delta) & 0x3f;
+	cedrus_write(dev, VE_H264_QP_PARAM, reg);
+
+	// clear status flags
+	cedrus_write(dev, VE_H264_STATUS, cedrus_read(dev, VE_H264_STATUS));
+
+	// enable int
+	reg = cedrus_read(dev, VE_H264_CTRL) | 0x7;
+	cedrus_write(dev, VE_H264_CTRL, reg);
+}
+
+static enum cedrus_irq_status
+cedrus_h264_irq_status(struct cedrus_ctx *ctx)
+{
+	struct cedrus_dev *dev = ctx->dev;
+	u32 reg = cedrus_read(dev, VE_H264_STATUS) & 0x7;
+
+	if (!reg)
+		return CEDRUS_IRQ_NONE;
+
+	if (reg & (BIT(1) | BIT(2)))
+		return CEDRUS_IRQ_ERROR;
+
+	return CEDRUS_IRQ_OK;
+}
+
+static void cedrus_h264_irq_clear(struct cedrus_ctx *ctx)
+{
+	struct cedrus_dev *dev = ctx->dev;
+
+	cedrus_write(dev, VE_H264_STATUS, GENMASK(2, 0));
+}
+
+static void cedrus_h264_irq_disable(struct cedrus_ctx *ctx)
+{
+	struct cedrus_dev *dev = ctx->dev;
+	u32 reg = cedrus_read(dev, VE_H264_CTRL) & ~GENMASK(2, 0);
+
+	cedrus_write(dev, VE_H264_CTRL, reg);
+}
+
+static void cedrus_h264_setup(struct cedrus_ctx *ctx,
+			      struct cedrus_run *run)
+{
+	struct cedrus_dev *dev = ctx->dev;
+
+	cedrus_engine_enable(dev, CEDRUS_CODEC_H264);
+
+	cedrus_write(dev, VE_H264_SDROT_CTRL, 0);
+	cedrus_write(dev, VE_H264_EXTRA_BUFFER1,
+		     ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET);
+	cedrus_write(dev, VE_H264_EXTRA_BUFFER2,
+		     (ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET) + 0x48000);
+
+	cedrus_write_frame_list(ctx, run);
+
+	cedrus_set_params(ctx, run);
+}
+
+static int cedrus_h264_start(struct cedrus_ctx *ctx)
+{
+	struct cedrus_dev *dev = ctx->dev;
+	unsigned int field_size;
+	unsigned int mv_col_size;
+	int ret;
+
+	ctx->codec.h264.pic_info_buf =
+		dma_alloc_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
+				   &ctx->codec.h264.pic_info_buf_dma,
+				   GFP_KERNEL);
+	if (!ctx->codec.h264.pic_info_buf)
+		return -ENOMEM;
+
+	field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
+		DIV_ROUND_UP(ctx->src_fmt.height, 16) * 32;
+	ctx->codec.h264.mv_col_buf_field_size = field_size;
+
+	mv_col_size = field_size * 2 * CEDRUS_H264_FRAME_NUM;
+	ctx->codec.h264.mv_col_buf_size = mv_col_size;
+	ctx->codec.h264.mv_col_buf = dma_alloc_coherent(dev->dev,
+							ctx->codec.h264.mv_col_buf_size,
+							&ctx->codec.h264.mv_col_buf_dma,
+							GFP_KERNEL);
+	if (!ctx->codec.h264.mv_col_buf) {
+		ret = -ENOMEM;
+		goto err_pic_buf;
+	}
+
+	return 0;
+
+err_pic_buf:
+	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
+			  ctx->codec.h264.pic_info_buf,
+			  ctx->codec.h264.pic_info_buf_dma);
+	return ret;
+}
+
+static void cedrus_h264_stop(struct cedrus_ctx *ctx)
+{
+	struct cedrus_dev *dev = ctx->dev;
+
+	dma_free_coherent(dev->dev, ctx->codec.h264.mv_col_buf_size,
+			  ctx->codec.h264.mv_col_buf,
+			  ctx->codec.h264.mv_col_buf_dma);
+	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
+			  ctx->codec.h264.pic_info_buf,
+			  ctx->codec.h264.pic_info_buf_dma);
+}
+
+static void cedrus_h264_trigger(struct cedrus_ctx *ctx)
+{
+	struct cedrus_dev *dev = ctx->dev;
+
+	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
+		     VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE);
+}
+
+struct cedrus_dec_ops cedrus_dec_ops_h264 = {
+	.irq_clear	= cedrus_h264_irq_clear,
+	.irq_disable	= cedrus_h264_irq_disable,
+	.irq_status	= cedrus_h264_irq_status,
+	.setup		= cedrus_h264_setup,
+	.start		= cedrus_h264_start,
+	.stop		= cedrus_h264_stop,
+	.trigger	= cedrus_h264_trigger,
+};
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
index 32adbcbe6175..8e559454ca82 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
@@ -46,6 +46,10 @@ int cedrus_engine_enable(struct cedrus_dev *dev, enum cedrus_codec codec)
 		reg |= VE_MODE_DEC_MPEG;
 		break;
 
+	case CEDRUS_CODEC_H264:
+		reg |= VE_MODE_DEC_H264;
+		break;
+
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
index de2d6b6f64bf..6fe9896a506d 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
@@ -232,4 +232,67 @@
 #define VE_DEC_MPEG_ROT_LUMA			(VE_ENGINE_DEC_MPEG + 0xcc)
 #define VE_DEC_MPEG_ROT_CHROMA			(VE_ENGINE_DEC_MPEG + 0xd0)
 
+/*  FIXME: Legacy below. */
+
+#define VBV_SIZE                       (1024 * 1024)
+
+#define VE_H264_FRAME_SIZE		0x200
+#define VE_H264_PIC_HDR			0x204
+#define VE_H264_SLICE_HDR		0x208
+#define VE_H264_SLICE_HDR2		0x20c
+#define VE_H264_PRED_WEIGHT		0x210
+#define VE_H264_QP_PARAM		0x21c
+#define VE_H264_CTRL			0x220
+
+#define VE_H264_TRIGGER_TYPE		0x224
+#define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE	(8 << 0)
+#define VE_H264_TRIGGER_TYPE_INIT_SWDEC		(7 << 0)
+
+#define VE_H264_STATUS			0x228
+#define VE_H264_CUR_MB_NUM		0x22c
+
+#define VE_H264_VLD_ADDR		0x230
+#define VE_H264_VLD_ADDR_FIRST			BIT(30)
+#define VE_H264_VLD_ADDR_LAST			BIT(29)
+#define VE_H264_VLD_ADDR_VALID			BIT(28)
+#define VE_H264_VLD_ADDR_VAL(x)			(((x) & 0x0ffffff0) | ((x) >> 28))
+
+#define VE_H264_VLD_OFFSET		0x234
+#define VE_H264_VLD_LEN			0x238
+#define VE_H264_VLD_END			0x23c
+#define VE_H264_SDROT_CTRL		0x240
+#define VE_H264_OUTPUT_FRAME_IDX	0x24c
+#define VE_H264_EXTRA_BUFFER1		0x250
+#define VE_H264_EXTRA_BUFFER2		0x254
+#define VE_H264_BASIC_BITS		0x2dc
+#define VE_AVC_SRAM_PORT_OFFSET		0x2e0
+#define VE_AVC_SRAM_PORT_DATA		0x2e4
+
+#define VE_ISP_INPUT_SIZE		0xa00
+#define VE_ISP_INPUT_STRIDE		0xa04
+#define VE_ISP_CTRL			0xa08
+#define VE_ISP_INPUT_LUMA		0xa78
+#define VE_ISP_INPUT_CHROMA		0xa7c
+
+#define VE_AVC_PARAM			0xb04
+#define VE_AVC_QP			0xb08
+#define VE_AVC_MOTION_EST		0xb10
+#define VE_AVC_CTRL			0xb14
+#define VE_AVC_TRIGGER			0xb18
+#define VE_AVC_STATUS			0xb1c
+#define VE_AVC_BASIC_BITS		0xb20
+#define VE_AVC_UNK_BUF			0xb60
+#define VE_AVC_VLE_ADDR			0xb80
+#define VE_AVC_VLE_END			0xb84
+#define VE_AVC_VLE_OFFSET		0xb88
+#define VE_AVC_VLE_MAX			0xb8c
+#define VE_AVC_VLE_LENGTH		0xb90
+#define VE_AVC_REF_LUMA			0xba0
+#define VE_AVC_REF_CHROMA		0xba4
+#define VE_AVC_REC_LUMA			0xbb0
+#define VE_AVC_REC_CHROMA		0xbb4
+#define VE_AVC_REF_SLUMA		0xbb8
+#define VE_AVC_REC_SLUMA		0xbbc
+#define VE_AVC_MB_INFO			0xbc0
+
 #endif
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
index 293df48326cc..7be2caacddde 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
@@ -37,6 +37,10 @@ static struct cedrus_format cedrus_formats[] = {
 		.pixelformat	= V4L2_PIX_FMT_MPEG2_SLICE,
 		.directions	= CEDRUS_DECODE_SRC,
 	},
+	{
+		.pixelformat	= V4L2_PIX_FMT_H264_SLICE,
+		.directions	= CEDRUS_DECODE_SRC,
+	},
 	{
 		.pixelformat	= V4L2_PIX_FMT_SUNXI_TILED_NV12,
 		.directions	= CEDRUS_DECODE_DST,
@@ -100,6 +104,7 @@ static void cedrus_prepare_format(struct v4l2_pix_format *pix_fmt)
 
 	switch (pix_fmt->pixelformat) {
 	case V4L2_PIX_FMT_MPEG2_SLICE:
+	case V4L2_PIX_FMT_H264_SLICE:
 		/* Zero bytes per line for encoded source. */
 		bytesperline = 0;
 
@@ -451,6 +456,10 @@ static int cedrus_start_streaming(struct vb2_queue *vq, unsigned int count)
 		ctx->current_codec = CEDRUS_CODEC_MPEG2;
 		break;
 
+	case V4L2_PIX_FMT_H264_SLICE:
+		ctx->current_codec = CEDRUS_CODEC_H264;
+		break;
+
 	default:
 		return -EINVAL;
 	}
-- 
2.19.1


* Re: [PATCH v2 0/2] media: cedrus: Add H264 decoding support
  2018-11-15 14:56 [PATCH v2 0/2] media: cedrus: Add H264 decoding support Maxime Ripard
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
  2018-11-15 14:56 ` [PATCH v2 2/2] media: cedrus: Add H264 decoding support Maxime Ripard
@ 2018-11-16  7:04 ` Tomasz Figa
  2018-11-19 14:12   ` Maxime Ripard
  2 siblings, 1 reply; 27+ messages in thread
From: Tomasz Figa @ 2018-11-16  7:04 UTC (permalink / raw)
  To: Maxime Ripard, Pawel Osciak
  Cc: Hans Verkuil, Alexandre Courbot, Sakari Ailus, Laurent Pinchart,
	Paul Kocialkowski, Chen-Yu Tsai, Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Linux Media Mailing List, Nicolas Dufresne, jenskuske,
	linux-sunxi, thomas.petazzoni

Hi Maxime,

On Thu, Nov 15, 2018 at 11:56 PM Maxime Ripard
<maxime.ripard@bootlin.com> wrote:
>
>
> Hi,
>
> Here is a new version of the H264 decoding support in the cedrus
> driver.
>
> As you might already know, the cedrus driver relies on the Request
> API, and is a reverse engineered driver for the video decoding engine
> found on the Allwinner SoCs.
>
> This work has been possible thanks to the work done by the people
> behind libvdpau-sunxi found here:
> https://github.com/linux-sunxi/libvdpau-sunxi/
>
> It's based on v4.20-rc1, plus the tag patches sent this week by Hans
> Verkuil.

Thanks for looking into this. Please see my comments below.

>
> I've been using the controls currently integrated into ChromeOS that
> have a working version of this particular setup. However, these
> controls have a number of shortcomings and inconsistencies with other
> decoding API. I've worked with libva so far, but I've noticed already
> that:
>   - The kernel UAPI expects to have the nal_ref_idc variable, while
>     libva only exposes whether that frame is a reference frame or
>     not. I've looked at the rockchip driver in the ChromeOS tree, and
>     our own driver, and they both need only the information about
>     whether the frame is a reference one or not, so maybe we should
>     change this?

Since this is something that is actually present in the stream and the
problem is that libva doesn't convey the information properly, I
believe you can work around it in the libva backend using this API by
setting it to either 0 or some arbitrary non-zero value, in a binary
fashion.
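
To make the suggestion concrete, here is a minimal sketch of what the
libva backend could do. The helper name is hypothetical and not part of
libva or of the proposed UAPI; the only assumption is that the backend
knows whether the current picture is used as a reference.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for a libva backend (assumption, not an existing
 * libva or kernel function). libva only tells us whether the picture is
 * used as a reference, so collapse that into a binary nal_ref_idc:
 * 0 means "not a reference", any non-zero value means "reference".
 * The value 1 is an arbitrary non-zero choice.
 */
static uint8_t nal_ref_idc_from_reference_flag(bool is_reference)
{
	return is_reference ? 1 : 0;
}
```

The result would then be stored in v4l2_ctrl_h264_decode_param.nal_ref_idc
before the control is queued with the request.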

>   - The H264 bitstream exposes the picture default reference list (for
>     both list 0 and list 1), the slice reference list and an override
>     flag. The libva will only pass the reference list to be used (so
>     either the picture default's or the slice's) depending on the
>     override flag. The kernel UAPI wants the picture default reference
>     list and the slice reference list, but doesn't expose the override
>     flag, which prevents us from configuring properly the
>     hardware. Our video decoding engine needs the three information,
>     but we can easily adapt to having only one. However, having two
>     doesn't really work for us.
>

From what I can see in the H.264 slice header, there are three related
pieces of data:
 - num_ref_idx_active_override_flag - affects the number of reference
indices for the slice,
 - ref_list_l{0,1}_modifications - modifications for the reference lists,
 - ref_pic_list_modification_flag_l{0,1} - selects whether the
modifications are applied.

The reference lists inside the v4l2_ctrl_h264_slice_param are expected
to already take all the above into account and be the final reference
lists to be used for the slice. For reference, the H.264 specification
refers to those final reference lists as RefPicList0 and RefPicList1,
hence the names of the fields in the struct.
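
As a rough illustration of "final list", userspace could fill
ref_pic_list0 or ref_pic_list1 along these lines. This is only a sketch:
the helper and its parameters are hypothetical, it assumes the parser
has already produced both the initial list and the list with the
modifications applied (as DPB indices), and it assumes
num_ref_idx_active already reflects num_ref_idx_active_override_flag.

```c
#include <stdint.h>
#include <string.h>

#define REF_LIST_LEN 32	/* matches the ref_pic_list0[32] array size */

/*
 * Hypothetical userspace helper (assumption, not part of the UAPI).
 * Picks the list the slice really uses and truncates it to the number
 * of active reference indices. Unused entries are zeroed.
 */
static void fill_slice_ref_list(uint8_t dst[REF_LIST_LEN],
				const uint8_t *initial_list,
				const uint8_t *modified_list,
				int modification_flag,
				unsigned int num_ref_idx_active)
{
	const uint8_t *src = modification_flag ? modified_list : initial_list;

	if (num_ref_idx_active > REF_LIST_LEN)
		num_ref_idx_active = REF_LIST_LEN;

	memset(dst, 0, REF_LIST_LEN);
	memcpy(dst, src, num_ref_idx_active);
}
```

With this split, a driver for hardware that does not parse slice headers
only ever reads the already-resolved list from the slice control.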

There is some interesting background here, though. The Rockchip VPU
parses the slice headers itself and handles the above data on its own.
This means that it needs to be programmed with the unmodified
reference lists, as in v4l2_ctrl_h264_decode_param.

Given that, it sounds like we need to have both. Your driver would
always use the lists in v4l2_ctrl_h264_slice_param, while the Rockchip
VPU would ignore them, use the ones in v4l2_ctrl_h264_decode_param and
perform the per-slice modifications on its own.

Best regards,
Tomasz

> It's pretty much the only one I've noticed so far, but we should
> probably fix them already. And there's probably other, feel free to
> step in.
>
> I've tested the various ABI using this gdb script:
> http://code.bulix.org/jl4se4-505620?raw
>
> And this test script:
> http://code.bulix.org/8zle4s-505623?raw
>
> The application compiled is quite trivial:
> http://code.bulix.org/e34zp8-505624?raw
>
> The output is:
> arm:    builds/arm-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> x86:    builds/x86-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> x64:    builds/x64-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> arm64:  builds/arm64-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
>
> Let me know if there's any flaw using that test setup, or if you have
> any comments on the patches.
>
> Maxime
>
> Changes from v1:
>   - Rebased on 4.20
>   - Did the documentation for the userspace API
>   - Used the tags instead of buffer IDs
>   - Added a comment to explain why we still needed the swdec trigger
>   - Reworked the MV col buffer in order to have one slot per frame
>   - Removed the unused neighbor info buffer
>   - Made sure to have the same structure offset and alignments across
>     32 bits and 64 bits architecture
>
> Maxime Ripard (1):
>   media: cedrus: Add H264 decoding support
>
> Pawel Osciak (1):
>   media: uapi: Add H264 low-level decoder API compound controls.
>
>  Documentation/media/uapi/v4l/biblio.rst       |   9 +
>  .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
>  .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
>  .../media/videodev2.h.rst.exceptions          |   5 +
>  drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
>  drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
>  drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
>  drivers/staging/media/sunxi/cedrus/cedrus.c   |  25 +
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  35 +-
>  .../staging/media/sunxi/cedrus/cedrus_dec.c   |  11 +
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 470 ++++++++++++++++++
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +
>  .../staging/media/sunxi/cedrus/cedrus_regs.h  |  63 +++
>  .../staging/media/sunxi/cedrus/cedrus_video.c |   9 +
>  include/media/v4l2-ctrls.h                    |  10 +
>  include/uapi/linux/v4l2-controls.h            | 166 +++++++
>  include/uapi/linux/videodev2.h                |  11 +
>  18 files changed, 1276 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c
>
> --
> 2.19.1
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/2] media: cedrus: Add H264 decoding support
  2018-11-16  7:04 ` [PATCH v2 0/2] " Tomasz Figa
@ 2018-11-19 14:12   ` Maxime Ripard
  0 siblings, 0 replies; 27+ messages in thread
From: Maxime Ripard @ 2018-11-19 14:12 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Pawel Osciak, Hans Verkuil, Alexandre Courbot, Sakari Ailus,
	Laurent Pinchart, Paul Kocialkowski, Chen-Yu Tsai,
	Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Media Mailing List, Nicolas Dufresne, jenskuske,
	linux-sunxi, thomas.petazzoni

[-- Attachment #1: Type: text/plain, Size: 3200 bytes --]

Hi Tomasz,

On Fri, Nov 16, 2018 at 04:04:40PM +0900, Tomasz Figa wrote:
> > I've been using the controls currently integrated into ChromeOS that
> > have a working version of this particular setup. However, these
> > controls have a number of shortcomings and inconsistencies with other
> > decoding API. I've worked with libva so far, but I've noticed already
> > that:
> >   - The kernel UAPI expects to have the nal_ref_idc variable, while
> >     libva only exposes whether that frame is a reference frame or
> >     not. I've looked at the rockchip driver in the ChromeOS tree, and
> >     our own driver, and they both need only the information about
> >     whether the frame is a reference one or not, so maybe we should
> >     change this?
> 
> Since this is something that is actually present in the stream and the
> problem is that libva doesn't convey the information properly, I
> believe you can workaround it in the libva backend using this API by
> just setting it to 0 and some arbitrary non-zero value in a binary
> fashion.

That could work yes, thanks for the suggestion!
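
A minimal sketch of that workaround on the libva backend side, assuming the
backend only knows whether the picture is a reference. The helper and macro
names are hypothetical; the non-zero value is arbitrary, as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary non-zero value; any value in 1..3 would do for this purpose. */
#define FAKE_NAL_REF_IDC	1

/* Hypothetical backend helper: map the libva reference flag to nal_ref_idc. */
static inline uint8_t fake_nal_ref_idc(bool is_reference)
{
	uint8_t idc = is_reference ? FAKE_NAL_REF_IDC : 0;

	/* nal_ref_idc is a 2-bit syntax element in the NAL unit header. */
	assert(idc <= 3);
	return idc;
}
```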

> >   - The H264 bitstream exposes the picture default reference list (for
> >     both list 0 and list 1), the slice reference list and an override
> >     flag. The libva will only pass the reference list to be used (so
> >     either the picture default's or the slice's) depending on the
> >     override flag. The kernel UAPI wants the picture default reference
> >     list and the slice reference list, but doesn't expose the override
> >     flag, which prevents us from configuring properly the
> >     hardware. Our video decoding engine needs the three information,
> >     but we can easily adapt to having only one. However, having two
> >     doesn't really work for us.
> >
> 
> From what I can see in the H.264 Slice header, there are 3 related data:
>  - num_ref_idx_active_override_flag - affects the number of reference
> indices for the slice,
>  - ref_list_l{0,1}_modifications - modifications for the reference lists,
>  - ref_pic_list_modification_flag_l{0,1} - selects whether the
> modifications are applied.
> 
> The reference lists inside the v4l2_ctrl_h264_slice_param are expected
> to already take all the above into account and be the final reference
> lists to be used for the slice. For reference, the H.264 specification
> refers to those final reference lists as RefPicList0 and RefPicList1
> and so the names of the fields in the struct.
> 
> There is some interesting background here, though. The Rockchip VPU
> parses the slice headers itself and handles the above data on its own.
> This means that it needs to be programmed with the unmodified
> reference lists, as in v4l2_ctrl_h264_decode_param.
> 
> Given that, it sounds like we need to have both. Your driver would
> always use the lists in v4l2_ctrl_h264_slice_param, while the Rockchip
> VPU would ignore them, use the ones in v4l2_ctrl_h264_decode_param and
> perform the per-slice modifications on its own.

I guess that would work, yep

Thanks!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-15 14:56 ` [PATCH v2 2/2] media: cedrus: Add H264 decoding support Maxime Ripard
@ 2018-11-24 20:43   ` Jernej Škrabec
  2018-11-27 15:50     ` Maxime Ripard
  2018-11-30 12:37   ` Paul Kocialkowski
  2018-12-05 22:27   ` [linux-sunxi] " Jernej Škrabec
  2 siblings, 1 reply; 27+ messages in thread
From: Jernej Škrabec @ 2018-11-24 20:43 UTC (permalink / raw)
  To: linux-sunxi, maxime.ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart, tfiga,
	posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	Thomas Petazzoni

Hi,

First, thank you for working on this! I also spent some time working on H264
and I have some comments below.

Dne četrtek, 15. november 2018 ob 15:56:50 CET je Maxime Ripard napisal(a):
> Introduce some basic H264 decoding support in cedrus. So far, only the
> baseline profile videos have been tested, and some more advanced features
> used in higher profiles are not even implemented.
> 
> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> ---
>  drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
>  drivers/staging/media/sunxi/cedrus/cedrus.c   |  25 +
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  35 +-
>  .../staging/media/sunxi/cedrus/cedrus_dec.c   |  11 +
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 470 ++++++++++++++++++
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +
>  .../staging/media/sunxi/cedrus/cedrus_regs.h  |  63 +++
>  .../staging/media/sunxi/cedrus/cedrus_video.c |   9 +
>  8 files changed, 618 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/Makefile b/drivers/staging/media/sunxi/cedrus/Makefile
> index e9dc68b7bcb6..aaf141fc58b6 100644
> --- a/drivers/staging/media/sunxi/cedrus/Makefile
> +++ b/drivers/staging/media/sunxi/cedrus/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o
> 
> -sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o cedrus_mpeg2.o
> +sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o \
> +		 cedrus_mpeg2.o cedrus_h264.o
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c b/drivers/staging/media/sunxi/cedrus/cedrus.c
> index 82558455384a..627a8c07eb21 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
> @@ -40,6 +40,30 @@ static const struct cedrus_control cedrus_controls[] = {
>  		.codec		= CEDRUS_CODEC_MPEG2,
>  		.required	= false,
>  	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_decode_param),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_slice_param),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_SPS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_sps),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_PPS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_pps),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
>  };
> 
>  #define CEDRUS_CONTROLS_COUNT	ARRAY_SIZE(cedrus_controls)
> @@ -277,6 +301,7 @@ static int cedrus_probe(struct platform_device *pdev)
>  	}
> 
>  	dev->dec_ops[CEDRUS_CODEC_MPEG2] = &cedrus_dec_ops_mpeg2;
> +	dev->dec_ops[CEDRUS_CODEC_H264] = &cedrus_dec_ops_h264;
> 
>  	mutex_init(&dev->dev_mutex);
>  	spin_lock_init(&dev->irq_lock);
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 781676b55a1b..179c10dcf6a7 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -30,7 +30,7 @@
> 
>  enum cedrus_codec {
>  	CEDRUS_CODEC_MPEG2,
> -
> +	CEDRUS_CODEC_H264,
>  	CEDRUS_CODEC_LAST,
>  };
> 
> @@ -40,6 +40,12 @@ enum cedrus_irq_status {
>  	CEDRUS_IRQ_OK,
>  };
> 
> +enum cedrus_h264_pic_type {
> +	CEDRUS_H264_PIC_TYPE_FRAME	= 0,
> +	CEDRUS_H264_PIC_TYPE_FIELD,
> +	CEDRUS_H264_PIC_TYPE_MBAFF,
> +};
> +
>  struct cedrus_control {
>  	u32			id;
>  	u32			elem_size;
> @@ -47,6 +53,13 @@ struct cedrus_control {
>  	unsigned char		required:1;
>  };
> 
> +struct cedrus_h264_run {
> +	const struct v4l2_ctrl_h264_decode_param	*decode_param;
> +	const struct v4l2_ctrl_h264_pps			*pps;
> +	const struct v4l2_ctrl_h264_slice_param		*slice_param;
> +	const struct v4l2_ctrl_h264_sps			*sps;
> +};
> +
>  struct cedrus_mpeg2_run {
>  	const struct v4l2_ctrl_mpeg2_slice_params	*slice_params;
>  	const struct v4l2_ctrl_mpeg2_quantization	*quantization;
> @@ -57,12 +70,20 @@ struct cedrus_run {
>  	struct vb2_v4l2_buffer	*dst;
> 
>  	union {
> +		struct cedrus_h264_run	h264;
>  		struct cedrus_mpeg2_run	mpeg2;
>  	};
>  };
> 
>  struct cedrus_buffer {
>  	struct v4l2_m2m_buffer          m2m_buf;
> +
> +	union {
> +		struct {
> +			unsigned int			position;
> +			enum cedrus_h264_pic_type	pic_type;
> +		} h264;
> +	} codec;
>  };
> 
>  struct cedrus_ctx {
> @@ -77,6 +98,17 @@ struct cedrus_ctx {
>  	struct v4l2_ctrl		**ctrls;
> 
>  	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
> +
> +	union {
> +		struct {
> +			void		*mv_col_buf;
> +			dma_addr_t	mv_col_buf_dma;
> +			ssize_t		mv_col_buf_field_size;
> +			ssize_t		mv_col_buf_size;
> +			void		*pic_info_buf;
> +			dma_addr_t	pic_info_buf_dma;
> +		} h264;
> +	} codec;
>  };
> 
>  struct cedrus_dec_ops {
> @@ -120,6 +152,7 @@ struct cedrus_dev {
>  };
> 
>  extern struct cedrus_dec_ops cedrus_dec_ops_mpeg2;
> +extern struct cedrus_dec_ops cedrus_dec_ops_h264;
> 
>  static inline void cedrus_write(struct cedrus_dev *dev, u32 reg, u32 val)
>  {
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> index 0cfd6036d0cd..b606f07d94ab 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> @@ -49,6 +49,17 @@ void cedrus_device_run(void *priv)
>  			V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION);
>  		break;
> 
> +	case V4L2_PIX_FMT_H264_SLICE:
> +		run.h264.decode_param = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS);
> +		run.h264.pps = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_PPS);
> +		run.h264.slice_param = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS);
> +		run.h264.sps = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_SPS);
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> new file mode 100644
> index 000000000000..5459a936b4b9
> --- /dev/null
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -0,0 +1,470 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (c) 2013 Jens Kuske <jenskuske@gmail.com>
> + * Copyright (c) 2018 Bootlin
> + */
> +
> +#include <linux/types.h>
> +
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "cedrus.h"
> +#include "cedrus_hw.h"
> +#include "cedrus_regs.h"
> +
> +enum cedrus_h264_sram_off {
> +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,

I triple-checked the above address and it should be 0x220. For easier
implementation later, you might want to add a second scaling list address for
8x8 at 0x210. Then you can do something like:

cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
			       scaling->scaling_list_8x8[0],
			       sizeof(scaling->scaling_list_8x8[0]));
cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
			       scaling->scaling_list_8x8[3],
			       sizeof(scaling->scaling_list_8x8[0]));
cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
			       scaling->scaling_list_4x4,
			       sizeof(scaling->scaling_list_4x4));

I know that it's not implemented here, just FYI.
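
The suggested addresses are at least consistent with simple arithmetic. The
sketch below assumes that SRAM offsets count 32-bit words (which the
"off << 2" in cedrus_h264_write_sram() suggests) and that each 8x8 scaling
list is 64 bytes. The helper name is hypothetical, and the hardware layout
itself is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one 8x8 scaling list, in bytes (64 one-byte coefficients). */
#define SCALING_LIST_8X8_BYTES	64

/*
 * Assumption: SRAM offsets are expressed in 32-bit words, so a block of
 * "bytes" bytes advances the offset by bytes / 4.
 */
static inline uint32_t sram_next_off(uint32_t off, size_t bytes)
{
	assert(bytes % 4 == 0);
	return off + (uint32_t)(bytes / 4);
}
```

Starting from 0x200, one 8x8 list ends at 0x210 and a second one ends at
0x220, which is where the 4x4 lists would then begin.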

> +};
> +
> +struct cedrus_h264_sram_ref_pic {
> +	__le32	top_field_order_cnt;
> +	__le32	bottom_field_order_cnt;
> +	__le32	frame_info;
> +	__le32	luma_ptr;
> +	__le32	chroma_ptr;
> +	__le32	mv_col_top_ptr;
> +	__le32	mv_col_bot_ptr;
> +	__le32	reserved;
> +} __packed;
> +
> +/* One for the output, 16 for the reference images */
> +#define CEDRUS_H264_FRAME_NUM		17
> +
> +#define CEDRUS_PIC_INFO_BUF_SIZE	(128 * SZ_1K)
> +
> +static void cedrus_h264_write_sram(struct cedrus_dev *dev,
> +				   enum cedrus_h264_sram_off off,
> +				   const void *data, size_t len)
> +{
> +	const u32 *buffer = data;
> +	size_t count = DIV_ROUND_UP(len, 4);
> +
> +	cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET, off << 2);
> +
> +	do {
> +		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> +	} while (--count);
> +}
> +
> +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> +					      unsigned int position,
> +					      unsigned int field)
> +{
> +	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma - PHYS_OFFSET;
> +
> +	/* Adjust for the position */
> +	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> +
> +	/* Adjust for the field */
> +	addr += field * ctx->codec.h264.mv_col_buf_field_size;
> +
> +	return addr;
> +}
> +
> +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> +				struct cedrus_buffer *buf,
> +				unsigned int top_field_order_cnt,
> +				unsigned int bottom_field_order_cnt,
> +				struct cedrus_h264_sram_ref_pic *pic)
> +{
> +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> +	unsigned int position = buf->codec.h264.position;
> +
> +	pic->top_field_order_cnt = top_field_order_cnt;
> +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> +	pic->frame_info = buf->codec.h264.pic_type << 8;
> +
> +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
> +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) - PHYS_OFFSET;

I think subtracting PHYS_OFFSET breaks the driver on H3 boards with 2 GiB of
RAM. Isn't the subtraction unnecessary anyway, due to

dev->dev->dma_pfn_offset = PHYS_PFN_OFFSET;

in cedrus_hw.c?

This comment applies to every PHYS_OFFSET subtraction in this patch.
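
A simplified model of the concern, assuming that the DMA mapping already
removes the offset once dma_pfn_offset is set. The function names, the page
shift and the DRAM base of 0x40000000 are assumptions made for illustration;
whether this is the exact failure seen on the 2 GiB boards is not established
here.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED	12
/* Assumption: DRAM starts at 0x40000000. */
#define PHYS_OFFSET_ASSUMED	UINT32_C(0x40000000)

/* Simplified model of a DMA mapping that honours a PFN offset. */
static inline uint32_t model_phys_to_dma(uint32_t phys, uint32_t dma_pfn_offset)
{
	uint32_t offset = dma_pfn_offset << PAGE_SHIFT_ASSUMED;

	assert(phys >= offset);
	return phys - offset;
}

/* What gets programmed if the offset is then subtracted a second time. */
static inline uint32_t model_double_subtract(uint32_t dma)
{
	return dma - PHYS_OFFSET_ASSUMED;	/* wraps for dma < 0x40000000 */
}
```

In this model a buffer at physical 0x50000000 maps to DMA address
0x10000000, and the second subtraction wraps it around to 0xd0000000.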

> +	pic->mv_col_top_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 0);
> +	pic->mv_col_bot_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 1);
> +}
> +
> +static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> +				    struct cedrus_run *run)
> +{
> +	struct cedrus_h264_sram_ref_pic pic_list[CEDRUS_H264_FRAME_NUM];
> +	const struct v4l2_ctrl_h264_decode_param *dec_param = run->h264.decode_param;
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> +	struct cedrus_buffer *output_buf;
> +	struct cedrus_dev *dev = ctx->dev;
> +	unsigned long used_dpbs = 0;
> +	unsigned int position;
> +	unsigned int output = 0;
> +	unsigned int i;
> +
> +	memset(pic_list, 0, sizeof(pic_list));
> +
> +	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
> +		const struct v4l2_h264_dpb_entry *dpb = &dec_param->dpb[i];
> +		struct cedrus_buffer *cedrus_buf;
> +		int buf_idx;
> +
> +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
> +			continue;
> +
> +		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> +		if (buf_idx < 0)
> +			continue;
> +
> +		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
> +		position = cedrus_buf->codec.h264.position;
> +		used_dpbs |= BIT(position);
> +
> +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +			continue;
> +
> +		cedrus_fill_ref_pic(ctx, cedrus_buf,
> +				    dpb->top_field_order_cnt,
> +				    dpb->bottom_field_order_cnt,
> +				    &pic_list[position]);
> +
> +		output = max(position, output);
> +	}
> +
> +	position = find_next_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM,
> +				      output);
> +	if (position >= CEDRUS_H264_FRAME_NUM)
> +		position = find_first_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM);
> +
> +	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> +	output_buf->codec.h264.position = position;
> +
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
> +	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
> +	else
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FRAME;
> +
> +	cedrus_fill_ref_pic(ctx, output_buf,
> +			    dec_param->top_field_order_cnt,
> +			    dec_param->bottom_field_order_cnt,
> +			    &pic_list[position]);
> +
> +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_FRAMEBUFFER_LIST,
> +			       pic_list, sizeof(pic_list));
> +
> +	cedrus_write(dev, VE_H264_OUTPUT_FRAME_IDX, position);
> +}
> +
> +#define CEDRUS_MAX_REF_IDX	32
> +
> +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run,
> +				   const u8 *ref_list, u8 num_ref,
> +				   enum cedrus_h264_sram_off sram)
> +{
> +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> +	unsigned int size, i;
> +
> +	memset(sram_array, 0, sizeof(sram_array));
> +
> +	for (i = 0; i < num_ref; i += 4) {
> +		unsigned int j;
> +
> +		for (j = 0; j < 4; j++) {

I don't think you need to complicate this with two loops.
cedrus_h264_write_sram() takes a void * and rounds the length up to a multiple
of 4 anyway. So as long as the input buffer size is a multiple of 4
(u8[CEDRUS_MAX_REF_IDX] qualifies), you can use a single for loop with
"u8 sram_array[CEDRUS_MAX_REF_IDX]". This should make the code much more
readable.
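
A sketch of the single-loop variant, written as standalone C so it can be
tried outside the kernel. The struct and function names are hypothetical and
the DPB lookup is assumed to have been done already; the byte layout (bottom
field flag in bit 0, position from bit 1) follows the shifts in the patch. It
matches the u32 packing only on a little-endian CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REF_IDX	32	/* mirrors CEDRUS_MAX_REF_IDX in the patch */

/* Hypothetical, already-resolved reference: DPB slot plus field parity. */
struct ref_entry {
	uint8_t	position;
	bool	bottom_field;
};

/*
 * Pack one byte per reference and return the number of bytes to write,
 * rounded up to a multiple of 4.
 */
static size_t pack_ref_list(const struct ref_entry *refs, size_t num_ref,
			    uint8_t sram_array[MAX_REF_IDX])
{
	size_t i;

	assert(num_ref <= MAX_REF_IDX);
	memset(sram_array, 0, MAX_REF_IDX);

	for (i = 0; i < num_ref; i++) {
		sram_array[i] = (uint8_t)(refs[i].position << 1);
		if (refs[i].bottom_field)
			sram_array[i] |= 1;
	}

	return (num_ref + 3) & ~(size_t)3;
}
```

With a byte array there is no separate word index to keep in sync with the
reference index, which is most of what the nested loops were doing.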

> +			const struct v4l2_h264_dpb_entry *dpb;
> +			const struct cedrus_buffer *cedrus_buf;
> +			const struct vb2_v4l2_buffer *ref_buf;
> +			unsigned int position;
> +			int buf_idx;
> +			u8 ref_idx = i + j;
> +			u8 dpb_idx;
> +
> +			if (ref_idx >= num_ref)
> +				break;
> +
> +			dpb_idx = ref_list[ref_idx];
> +			dpb = &decode->dpb[dpb_idx];
> +
> +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +				continue;
> +
> +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> +			if (buf_idx < 0)
> +				continue;
> +
> +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> +			position = cedrus_buf->codec.h264.position;
> +
> +			sram_array[i] |= position << (j * 8 + 1);
> +			if (ref_buf->field == V4L2_FIELD_BOTTOM)

You never set the above flag on the buffer, so this will always be false.

> +				sram_array[i] |= BIT(j * 8);
> +		}
> +	}
> +
> +	size = min((unsigned int)ALIGN(num_ref, 4), sizeof(sram_array));
> +	cedrus_h264_write_sram(dev, sram, &sram_array, size);
> +}
> +
> +static void cedrus_write_ref_list0(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +
> +	_cedrus_write_ref_list(ctx, run,
> +			       slice->ref_pic_list0,
> +			       slice->num_ref_idx_l0_active_minus1 + 1,
> +			       CEDRUS_SRAM_H264_REF_LIST_0);
> +}
> +
> +static void cedrus_write_ref_list1(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +
> +	_cedrus_write_ref_list(ctx, run,
> +			       slice->ref_pic_list1,
> +			       slice->num_ref_idx_l1_active_minus1 + 1,
> +			       CEDRUS_SRAM_H264_REF_LIST_1);
> +}
> +
> +static void cedrus_set_params(struct cedrus_ctx *ctx,
> +			      struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +	const struct v4l2_ctrl_h264_pps *pps = run->h264.pps;
> +	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +	struct cedrus_dev *dev = ctx->dev;
> +	dma_addr_t src_buf_addr;
> +	u32 offset = slice->header_bit_size;
> +	u32 len = (slice->size * 8) - offset;
> +	u32 reg;
> +
> +	cedrus_write(dev, 0x220, 0x02000400);
> +	cedrus_write(dev, VE_H264_VLD_LEN, len);
> +	cedrus_write(dev, VE_H264_VLD_OFFSET, offset);
> +
> +	src_buf_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf, 0);
> +	src_buf_addr -= PHYS_OFFSET;
> +	cedrus_write(dev, VE_H264_VLD_END, src_buf_addr + VBV_SIZE - 1);
> +	cedrus_write(dev, VE_H264_VLD_ADDR,
> +		     VE_H264_VLD_ADDR_VAL(src_buf_addr) |
> +		     VE_H264_VLD_ADDR_FIRST | VE_H264_VLD_ADDR_VALID |
> +		     VE_H264_VLD_ADDR_LAST);
> +
> +	/*
> +	 * FIXME: Since the bitstream parsing is done in software, and
> +	 * in userspace, this shouldn't be needed anymore. But it
> +	 * turns out that removing it breaks the decoding process,
> +	 * without any clear indication why.
> +	 */
> +	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
> +		     VE_H264_TRIGGER_TYPE_INIT_SWDEC);
> +
> +	if ((slice->slice_type == V4L2_H264_SLICE_TYPE_P) ||
> +	    (slice->slice_type == V4L2_H264_SLICE_TYPE_SP) ||
> +	    (slice->slice_type == V4L2_H264_SLICE_TYPE_B))
> +		cedrus_write_ref_list0(ctx, run);
> +
> +	if (slice->slice_type == V4L2_H264_SLICE_TYPE_B)
> +		cedrus_write_ref_list1(ctx, run);
> +
> +	// picture parameters
> +	reg = 0;
> +	/*
> +	 * FIXME: the kernel headers are allowing the default value to
> +	 * be passed, but the libva doesn't give us that.
> +	 */
> +	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 10;
> +	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 5;
> +	reg |= (pps->weighted_bipred_idc & 0x3) << 2;
> +	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
> +		reg |= BIT(15);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
> +		reg |= BIT(4);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED)
> +		reg |= BIT(1);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE)
> +		reg |= BIT(0);
> +	cedrus_write(dev, VE_H264_PIC_HDR, reg);
> +
> +	// sequence parameters
> +	reg = BIT(19);

This one can be inferred from sps->chroma_format_idc.
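
One way to read this, as a sketch only: assuming the chroma format field
starts at bit 19 of the register (an assumption, the field position and width
are not confirmed here), the hardcoded BIT(19) is what 4:2:0 content
(chroma_format_idc == 1) would produce. The macro and helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the chroma format field starts at bit 19 of the register. */
#define FRAME_SIZE_CHROMA_FORMAT_SHIFT	19

static inline uint32_t frame_size_chroma_bits(uint8_t chroma_format_idc)
{
	/* chroma_format_idc ranges from 0 (monochrome) to 3 (4:4:4). */
	assert(chroma_format_idc <= 3);
	return (uint32_t)chroma_format_idc << FRAME_SIZE_CHROMA_FORMAT_SHIFT;
}
```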

> +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> +		reg |= BIT(18);
> +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> +		reg |= BIT(17);
> +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> +		reg |= BIT(16);
> +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> +
> +	// slice parameters
> +	reg = 0;
> +	/*
> +	 * FIXME: This bit marks all the frames as references. This
> +	 * should probably be set based on nal_ref_idc, but the libva
> +	 * doesn't pass that information along, so this is not always
> +	 * available. We should find something else, maybe change the
> +	 * kernel UAPI somehow?
> +	 */
> +	reg |= BIT(12);

I really think you should use nal_ref_idc here, as in the specification. You
can still fake the data from the libva backend. I don't think any driver
needs it for anything other than checking whether it is 0 or not.
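
On the driver side this could look roughly like the sketch below. The macro
and helper names are hypothetical; the bit position is the one the patch
currently sets unconditionally.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 12 of the slice header register, as set by the patch. */
#define SLICE_HDR_IS_REFERENCE	(UINT32_C(1) << 12)

/* Only the zero / non-zero distinction of nal_ref_idc matters here. */
static inline uint32_t slice_hdr_ref_bit(uint8_t nal_ref_idc)
{
	/* nal_ref_idc is a 2-bit syntax element. */
	assert(nal_ref_idc <= 3);
	return nal_ref_idc ? SLICE_HDR_IS_REFERENCE : 0;
}
```

The "reg |= BIT(12);" line would then become
"reg |= slice_hdr_ref_bit(...);" with whichever control ends up carrying
nal_ref_idc.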

> +	reg |= (slice->slice_type & 0xf) << 8;
> +	reg |= slice->cabac_init_idc & 0x3;
> +	reg |= BIT(5);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> +		reg |= BIT(4);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> +		reg |= BIT(3);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> +		reg |= BIT(2);
> +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> +
> +	reg = 0;

You might want to set bit 12 here, which enables active reference picture 
override. However, I'm not completely sure about that.

Best regards,
Jernej

> +	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 24;
> +	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 16;
> +	reg |= (slice->disable_deblocking_filter_idc & 0x3) << 8;
> +	reg |= (slice->slice_alpha_c0_offset_div2 & 0xf) << 4;
> +	reg |= slice->slice_beta_offset_div2 & 0xf;
> +	cedrus_write(dev, VE_H264_SLICE_HDR2, reg);
> +
> +	reg = 0;
> +	/*
> +	 * FIXME: This bit tells the video engine to use the default
> +	 * quantization matrices. This will obviously need to be
> +	 * changed to support the profiles supporting custom
> +	 * quantization matrices.
> +	 */
> +	reg |= BIT(24);
> +	reg |= (pps->second_chroma_qp_index_offset & 0x3f) << 16;
> +	reg |= (pps->chroma_qp_index_offset & 0x3f) << 8;
> +	reg |= (pps->pic_init_qp_minus26 + 26 + slice->slice_qp_delta) & 0x3f;
> +	cedrus_write(dev, VE_H264_QP_PARAM, reg);
> +
> +	// clear status flags
> +	cedrus_write(dev, VE_H264_STATUS, cedrus_read(dev, VE_H264_STATUS));
> +
> +	// enable int
> +	reg = cedrus_read(dev, VE_H264_CTRL) | 0x7;
> +	cedrus_write(dev, VE_H264_CTRL, reg);
> +}
> +
> +static enum cedrus_irq_status
> +cedrus_h264_irq_status(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 reg = cedrus_read(dev, VE_H264_STATUS) & 0x7;
> +
> +	if (!reg)
> +		return CEDRUS_IRQ_NONE;
> +
> +	if (reg & (BIT(1) | BIT(2)))
> +		return CEDRUS_IRQ_ERROR;
> +
> +	return CEDRUS_IRQ_OK;
> +}
> +
> +static void cedrus_h264_irq_clear(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_write(dev, VE_H264_STATUS, GENMASK(2, 0));
> +}
> +
> +static void cedrus_h264_irq_disable(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 reg = cedrus_read(dev, VE_H264_CTRL) & ~GENMASK(2, 0);
> +
> +	cedrus_write(dev, VE_H264_CTRL, reg);
> +}
> +
> +static void cedrus_h264_setup(struct cedrus_ctx *ctx,
> +			      struct cedrus_run *run)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_engine_enable(dev, CEDRUS_CODEC_H264);
> +
> +	cedrus_write(dev, VE_H264_SDROT_CTRL, 0);
> +	cedrus_write(dev, VE_H264_EXTRA_BUFFER1,
> +		     ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET);
> +	cedrus_write(dev, VE_H264_EXTRA_BUFFER2,
> +		     (ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET) + 0x48000);
> +
> +	cedrus_write_frame_list(ctx, run);
> +
> +	cedrus_set_params(ctx, run);
> +}
> +
> +static int cedrus_h264_start(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	unsigned int field_size;
> +	unsigned int mv_col_size;
> +	int ret;
> +
> +	ctx->codec.h264.pic_info_buf =
> +		dma_alloc_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +				   &ctx->codec.h264.pic_info_buf_dma,
> +				   GFP_KERNEL);
> +	if (!ctx->codec.h264.pic_info_buf)
> +		return -ENOMEM;
> +
> +	field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> +		DIV_ROUND_UP(ctx->src_fmt.height, 16) * 32;
> +	ctx->codec.h264.mv_col_buf_field_size = field_size;
> +
> +	mv_col_size = field_size * 2 * CEDRUS_H264_FRAME_NUM;
> +	ctx->codec.h264.mv_col_buf_size = mv_col_size;
> +	ctx->codec.h264.mv_col_buf = dma_alloc_coherent(dev->dev,
> +							ctx->codec.h264.mv_col_buf_size,
> +							&ctx->codec.h264.mv_col_buf_dma,
> +							GFP_KERNEL);
> +	if (!ctx->codec.h264.mv_col_buf) {
> +		ret = -ENOMEM;
> +		goto err_pic_buf;
> +	}
> +
> +	return 0;
> +
> +err_pic_buf:
> +	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +			  ctx->codec.h264.pic_info_buf,
> +			  ctx->codec.h264.pic_info_buf_dma);
> +	return ret;
> +}
> +
> +static void cedrus_h264_stop(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	dma_free_coherent(dev->dev, ctx->codec.h264.mv_col_buf_size,
> +			  ctx->codec.h264.mv_col_buf,
> +			  ctx->codec.h264.mv_col_buf_dma);
> +	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +			  ctx->codec.h264.pic_info_buf,
> +			  ctx->codec.h264.pic_info_buf_dma);
> +}
> +
> +static void cedrus_h264_trigger(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
> +		     VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE);
> +}
> +
> +struct cedrus_dec_ops cedrus_dec_ops_h264 = {
> +	.irq_clear	= cedrus_h264_irq_clear,
> +	.irq_disable	= cedrus_h264_irq_disable,
> +	.irq_status	= cedrus_h264_irq_status,
> +	.setup		= cedrus_h264_setup,
> +	.start		= cedrus_h264_start,
> +	.stop		= cedrus_h264_stop,
> +	.trigger	= cedrus_h264_trigger,
> +};
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> index 32adbcbe6175..8e559454ca82 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> @@ -46,6 +46,10 @@ int cedrus_engine_enable(struct cedrus_dev *dev, enum cedrus_codec codec)
>  		reg |= VE_MODE_DEC_MPEG;
>  		break;
> 
> +	case CEDRUS_CODEC_H264:
> +		reg |= VE_MODE_DEC_H264;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> index de2d6b6f64bf..6fe9896a506d 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> @@ -232,4 +232,67 @@
>  #define VE_DEC_MPEG_ROT_LUMA			(VE_ENGINE_DEC_MPEG + 0xcc)
>  #define VE_DEC_MPEG_ROT_CHROMA			(VE_ENGINE_DEC_MPEG + 0xd0)
> 
> +/*  FIXME: Legacy below. */
> +
> +#define VBV_SIZE                       (1024 * 1024)
> +
> +#define VE_H264_FRAME_SIZE		0x200
> +#define VE_H264_PIC_HDR			0x204
> +#define VE_H264_SLICE_HDR		0x208
> +#define VE_H264_SLICE_HDR2		0x20c
> +#define VE_H264_PRED_WEIGHT		0x210
> +#define VE_H264_QP_PARAM		0x21c
> +#define VE_H264_CTRL			0x220
> +
> +#define VE_H264_TRIGGER_TYPE		0x224
> +#define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE	(8 << 0)
> +#define VE_H264_TRIGGER_TYPE_INIT_SWDEC		(7 << 0)
> +
> +#define VE_H264_STATUS			0x228
> +#define VE_H264_CUR_MB_NUM		0x22c
> +
> +#define VE_H264_VLD_ADDR		0x230
> +#define VE_H264_VLD_ADDR_FIRST			BIT(30)
> +#define VE_H264_VLD_ADDR_LAST			BIT(29)
> +#define VE_H264_VLD_ADDR_VALID			BIT(28)
> +#define VE_H264_VLD_ADDR_VAL(x)			(((x) & 0x0ffffff0) | ((x) >> 28))
> +
> +#define VE_H264_VLD_OFFSET		0x234
> +#define VE_H264_VLD_LEN			0x238
> +#define VE_H264_VLD_END			0x23c
> +#define VE_H264_SDROT_CTRL		0x240
> +#define VE_H264_OUTPUT_FRAME_IDX	0x24c
> +#define VE_H264_EXTRA_BUFFER1		0x250
> +#define VE_H264_EXTRA_BUFFER2		0x254
> +#define VE_H264_BASIC_BITS		0x2dc
> +#define VE_AVC_SRAM_PORT_OFFSET		0x2e0
> +#define VE_AVC_SRAM_PORT_DATA		0x2e4
> +
> +#define VE_ISP_INPUT_SIZE		0xa00
> +#define VE_ISP_INPUT_STRIDE		0xa04
> +#define VE_ISP_CTRL			0xa08
> +#define VE_ISP_INPUT_LUMA		0xa78
> +#define VE_ISP_INPUT_CHROMA		0xa7c
> +
> +#define VE_AVC_PARAM			0xb04
> +#define VE_AVC_QP			0xb08
> +#define VE_AVC_MOTION_EST		0xb10
> +#define VE_AVC_CTRL			0xb14
> +#define VE_AVC_TRIGGER			0xb18
> +#define VE_AVC_STATUS			0xb1c
> +#define VE_AVC_BASIC_BITS		0xb20
> +#define VE_AVC_UNK_BUF			0xb60
> +#define VE_AVC_VLE_ADDR			0xb80
> +#define VE_AVC_VLE_END			0xb84
> +#define VE_AVC_VLE_OFFSET		0xb88
> +#define VE_AVC_VLE_MAX			0xb8c
> +#define VE_AVC_VLE_LENGTH		0xb90
> +#define VE_AVC_REF_LUMA			0xba0
> +#define VE_AVC_REF_CHROMA		0xba4
> +#define VE_AVC_REC_LUMA			0xbb0
> +#define VE_AVC_REC_CHROMA		0xbb4
> +#define VE_AVC_REF_SLUMA		0xbb8
> +#define VE_AVC_REC_SLUMA		0xbbc
> +#define VE_AVC_MB_INFO			0xbc0
> +
>  #endif
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> index 293df48326cc..7be2caacddde 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> @@ -37,6 +37,10 @@ static struct cedrus_format cedrus_formats[] = {
>  		.pixelformat	= V4L2_PIX_FMT_MPEG2_SLICE,
>  		.directions	= CEDRUS_DECODE_SRC,
>  	},
> +	{
> +		.pixelformat	= V4L2_PIX_FMT_H264_SLICE,
> +		.directions	= CEDRUS_DECODE_SRC,
> +	},
>  	{
>  		.pixelformat	= V4L2_PIX_FMT_SUNXI_TILED_NV12,
>  		.directions	= CEDRUS_DECODE_DST,
> @@ -100,6 +104,7 @@ static void cedrus_prepare_format(struct v4l2_pix_format *pix_fmt)
> 
>  	switch (pix_fmt->pixelformat) {
>  	case V4L2_PIX_FMT_MPEG2_SLICE:
> +	case V4L2_PIX_FMT_H264_SLICE:
>  		/* Zero bytes per line for encoded source. */
>  		bytesperline = 0;
> 
> @@ -451,6 +456,10 @@ static int cedrus_start_streaming(struct vb2_queue *vq, unsigned int count)
>  		ctx->current_codec = CEDRUS_CODEC_MPEG2;
>  		break;
> 
> +	case V4L2_PIX_FMT_H264_SLICE:
> +		ctx->current_codec = CEDRUS_CODEC_H264;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}


* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-24 20:43   ` [linux-sunxi] " Jernej Škrabec
@ 2018-11-27 15:50     ` Maxime Ripard
  2018-11-27 16:30       ` Jernej Škrabec
  0 siblings, 1 reply; 27+ messages in thread
From: Maxime Ripard @ 2018-11-27 15:50 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: linux-sunxi, hans.verkuil, acourbot, sakari.ailus,
	Laurent Pinchart, tfiga, posciak, Paul Kocialkowski,
	Chen-Yu Tsai, linux-kernel, linux-arm-kernel, linux-media,
	nicolas.dufresne, jenskuske, Thomas Petazzoni

Hi Jernej,

Thanks for your review!

On Sat, Nov 24, 2018 at 09:43:43PM +0100, Jernej Škrabec wrote:
> > +enum cedrus_h264_sram_off {
> > +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> > +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> > +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> > +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> > +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> > +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
> 
> I triple-checked the address above and it should be 0x220. For easier
> implementation later, you might want to add a second scaling list address
> for 8x8 at 0x210. Then you can do something like:
> 
> cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
> 			       scaling->scaling_list_8x8[0],
> 			       sizeof(scaling->scaling_list_8x8[0]));
> cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
> 			       scaling->scaling_list_8x8[3],
> 			       sizeof(scaling->scaling_list_8x8[0]));
> cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
> 			       scaling->scaling_list_4x4,
> 			       sizeof(scaling->scaling_list_4x4));
> 
> I know that it's not implemented here, just FYI.

Ack. I guess I can just leave it out entirely for now, since it's not
implemented.
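
For reference, a rough sketch of the layout suggested in the review, should it be added later. It is not part of the posted patch. It assumes that the SRAM offsets count 32-bit words rather than bytes, which this thread does not state; under that assumption one 64-byte 8x8 list spans 0x10 offsets, and 0x218 would fall in the middle of the second 8x8 slot. All SKETCH_ and sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offsets quoted in the review. The 8x8_0 / 8x8_1 split follows the
 * reviewer's snippet; the SKETCH_ prefix marks these as illustrative.
 */
enum sketch_h264_sram_off {
	SKETCH_SRAM_H264_SCALING_LIST_8x8_0	= 0x200,
	SKETCH_SRAM_H264_SCALING_LIST_8x8_1	= 0x210,
	SKETCH_SRAM_H264_SCALING_LIST_4x4	= 0x220,
};

/* Assumption: one SRAM offset unit is one 32-bit word. */
#define SKETCH_SRAM_WORD_SIZE	4u

/* Distance between two SRAM offsets in bytes, under that assumption. */
static inline uint32_t sketch_sram_span_bytes(uint32_t from, uint32_t to)
{
	return (to - from) * SKETCH_SRAM_WORD_SIZE;
}

/* One 8x8 scaling list is 64 one-byte coefficients. */
static_assert((SKETCH_SRAM_H264_SCALING_LIST_8x8_1 -
	       SKETCH_SRAM_H264_SCALING_LIST_8x8_0) * SKETCH_SRAM_WORD_SIZE == 64,
	      "first 8x8 list exactly fills its slot");
static_assert((SKETCH_SRAM_H264_SCALING_LIST_4x4 -
	       SKETCH_SRAM_H264_SCALING_LIST_8x8_1) * SKETCH_SRAM_WORD_SIZE == 64,
	      "second 8x8 list exactly fills its slot");
```

If the offsets turn out to be byte-based instead, the spacing argument above does not hold and only the reviewer's measured addresses remain.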

> > +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > +				struct cedrus_buffer *buf,
> > +				unsigned int top_field_order_cnt,
> > +				unsigned int bottom_field_order_cnt,
> > +				struct cedrus_h264_sram_ref_pic *pic)
> > +{
> > +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> > +	unsigned int position = buf->codec.h264.position;
> > +
> > +	pic->top_field_order_cnt = top_field_order_cnt;
> > +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> > +	pic->frame_info = buf->codec.h264.pic_type << 8;
> > +
> > +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
> > +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) - PHYS_OFFSET;
> 
> I think subtracting PHYS_OFFSET breaks driver on H3 boards with 2 GiB of RAM. 
> Isn't that unnecessary anyway due to
> 
> dev->dev->dma_pfn_offset = PHYS_PFN_OFFSET;
> 
> in cedrus_hw.c?
> 
> This comment is meant for all PHYS_OFFSET subtracting in this patch.

PHYS_OFFSET was needed on some older SoCs, and the dma_pfn_offset
trick wasn't working, so I hacked it in and forgot about it. I'll try
to figure it out for the next version.
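
To make the concern concrete, here is a small arithmetic sketch of what happens if the offset ends up applied twice. It is only an illustration of the double subtraction, not a diagnosis of the H3 failure. The DRAM base of 0x40000000 and all sketch_ names are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DRAM starts at 0x40000000 on the SoCs discussed here. */
#define SKETCH_PHYS_OFFSET	0x40000000u

/* What a dma_pfn_offset-style translation does: CPU physical to bus address. */
static inline uint32_t sketch_phys_to_bus(uint32_t phys)
{
	assert(phys >= SKETCH_PHYS_OFFSET);
	return phys - SKETCH_PHYS_OFFSET;
}

/*
 * The extra "- PHYS_OFFSET" from the patch, applied to an address that
 * has already been translated. Unsigned arithmetic wraps modulo 2^32
 * whenever the bus address is below the offset.
 */
static inline uint32_t sketch_subtract_again(uint32_t bus)
{
	return bus - SKETCH_PHYS_OFFSET;
}
```

With these numbers, a buffer at physical 0x50000000 translates to bus address 0x10000000, and the second subtraction wraps it to 0xd0000000. A buffer at physical 0x90000000 would come out as 0x10000000, which looks valid but points at the wrong memory.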

> > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > +				   struct cedrus_run *run,
> > +				   const u8 *ref_list, u8 num_ref,
> > +				   enum cedrus_h264_sram_off sram)
> > +{
> > +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> > +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > +	struct cedrus_dev *dev = ctx->dev;
> > +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> > +	unsigned int size, i;
> > +
> > +	memset(sram_array, 0, sizeof(sram_array));
> > +
> > +	for (i = 0; i < num_ref; i += 4) {
> > +		unsigned int j;
> > +
> > +		for (j = 0; j < 4; j++) {
> 
> I don't think you need to complicate things with two loops here.
> cedrus_h264_write_sram() takes a void * and aligns to 4 anyway. So as long
> as the input buffer size is a multiple of 4 (u8[CEDRUS_MAX_REF_IDX]
> qualifies), you can use a single for loop with
> "u8 sram_array[CEDRUS_MAX_REF_IDX]". This should make the code much more
> readable.

This wasn't really about the alignment, but in order to get the
offsets in the u32 and the array more easily.

Breaking out the loop will make that computation less easy on the eye,
so I guess it's very subjective.

> > +			const struct v4l2_h264_dpb_entry *dpb;
> > +			const struct cedrus_buffer *cedrus_buf;
> > +			const struct vb2_v4l2_buffer *ref_buf;
> > +			unsigned int position;
> > +			int buf_idx;
> > +			u8 ref_idx = i + j;
> > +			u8 dpb_idx;
> > +
> > +			if (ref_idx >= num_ref)
> > +				break;
> > +
> > +			dpb_idx = ref_list[ref_idx];
> > +			dpb = &decode->dpb[dpb_idx];
> > +
> > +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > +				continue;
> > +
> > +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > +			if (buf_idx < 0)
> > +				continue;
> > +
> > +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > +			position = cedrus_buf->codec.h264.position;
> > +
> > +			sram_array[i] |= position << (j * 8 + 1);
> > +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> 
> You never set the above flag on the buffer, so this will always be false.

As far as I know, the field is supposed to be set by the userspace.

> > +	// sequence parameters
> > +	reg = BIT(19);
> 
> This one can be inferred from sps->chroma_format_idc.

I'll look into this

> > +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> > +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> > +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> > +		reg |= BIT(18);
> > +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> > +		reg |= BIT(17);
> > +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> > +		reg |= BIT(16);
> > +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> > +
> > +	// slice parameters
> > +	reg = 0;
> > +	/*
> > +	 * FIXME: This bit marks all the frames as references. This
> > +	 * should probably be set based on nal_ref_idc, but the libva
> > +	 * doesn't pass that information along, so this is not always
> > +	 * available. We should find something else, maybe change the
> > +	 * kernel UAPI somehow?
> > +	 */
> > +	reg |= BIT(12);
> 
> I really think you should use nal_ref_idc here, as in the specification.
> You can still fake the data from the libva backend. I don't think any
> driver needs this for anything other than checking whether it is 0 or not.

Yeah, Tomasz suggested the same thing as a reply to the cover letter,
I'll change that in the next version.
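
A minimal sketch of what that change could look like, assuming nal_ref_idc reaches the driver through one of the controls (which control carries it is not settled in this thread, so it is passed in as a plain argument here). The sketch_ and SKETCH_ names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_BIT(n)	(1u << (n))

/*
 * Bit 12 of VE_H264_SLICE_HDR marks the picture as a reference, per the
 * FIXME in the patch. Only the zero / non-zero distinction of
 * nal_ref_idc matters, as noted in the review.
 */
static inline uint32_t sketch_slice_hdr_ref_bit(uint8_t nal_ref_idc)
{
	return nal_ref_idc ? SKETCH_BIT(12) : 0;
}
```

The result would be OR-ed into reg in place of the unconditional "reg |= BIT(12);".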

> > +	reg |= (slice->slice_type & 0xf) << 8;
> > +	reg |= slice->cabac_init_idc & 0x3;
> > +	reg |= BIT(5);
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > +		reg |= BIT(4);
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> > +		reg |= BIT(3);
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> > +		reg |= BIT(2);
> > +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> > +
> > +	reg = 0;
> 
> You might want to set bit 12 here, which enables active reference picture 
> override. However, I'm not completely sure about that.

Did you find some videos that were broken because of this?

Thanks!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-27 15:50     ` Maxime Ripard
@ 2018-11-27 16:30       ` Jernej Škrabec
  2018-11-27 20:19         ` Jernej Škrabec
  2018-11-30  7:30         ` Maxime Ripard
  0 siblings, 2 replies; 27+ messages in thread
From: Jernej Škrabec @ 2018-11-27 16:30 UTC (permalink / raw)
  To: linux-sunxi, maxime.ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart, tfiga,
	posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	Thomas Petazzoni

On Tuesday, 27 November 2018 at 16:50:28 CET, Maxime Ripard wrote:
> Hi Jernej,
> 
> Thanks for your review!
> 
> On Sat, Nov 24, 2018 at 09:43:43PM +0100, Jernej Škrabec wrote:
> > > +enum cedrus_h264_sram_off {
> > > +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> > > +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> > > +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> > > +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> > > +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> > > +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
> > 
> > > I triple-checked the address above and it should be 0x220. For easier
> > > implementation later, you might want to add a second scaling list
> > > address for 8x8 at 0x210. Then you can do something like:
> > > 
> > > cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
> > > 			       scaling->scaling_list_8x8[0],
> > > 			       sizeof(scaling->scaling_list_8x8[0]));
> > > cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
> > > 			       scaling->scaling_list_8x8[3],
> > > 			       sizeof(scaling->scaling_list_8x8[0]));
> > > cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
> > > 			       scaling->scaling_list_4x4,
> > > 			       sizeof(scaling->scaling_list_4x4));
> > 
> > I know that it's not implemented here, just FYI.
> 
> Ack. I guess I can just leave it out entirely for now, since it's not
> implemented.
> 
> > > +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > > +				struct cedrus_buffer *buf,
> > > +				unsigned int top_field_order_cnt,
> > > +				unsigned int bottom_field_order_cnt,
> > > +				struct cedrus_h264_sram_ref_pic *pic)
> > > +{
> > > +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> > > +	unsigned int position = buf->codec.h264.position;
> > > +
> > > +	pic->top_field_order_cnt = top_field_order_cnt;
> > > +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> > > +	pic->frame_info = buf->codec.h264.pic_type << 8;
> > > +
> > > +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
> > > +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) -
> > > PHYS_OFFSET;
> > 
> > I think subtracting PHYS_OFFSET breaks driver on H3 boards with 2 GiB of
> > RAM. Isn't that unnecessary anyway due to
> > 
> > dev->dev->dma_pfn_offset = PHYS_PFN_OFFSET;
> > 
> > in cedrus_hw.c?
> > 
> > This comment is meant for all PHYS_OFFSET subtracting in this patch.
> 
> PHYS_OFFSET was needed on some older SoCs, and the dma_pfn_offset
> trick wasn't working, I hacked it and forgot about it. I'll try to
> figure it out for the next version.
> 
> > > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > > +				   struct cedrus_run *run,
> > > +				   const u8 *ref_list, u8 num_ref,
> > > +				   enum cedrus_h264_sram_off sram)
> > > +{
> > > +	const struct v4l2_ctrl_h264_decode_param *decode =
> > > run->h264.decode_param; +	struct vb2_queue *cap_q =
> > > &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > > +	struct cedrus_dev *dev = ctx->dev;
> > > +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> > > +	unsigned int size, i;
> > > +
> > > +	memset(sram_array, 0, sizeof(sram_array));
> > > +
> > > +	for (i = 0; i < num_ref; i += 4) {
> > > +		unsigned int j;
> > > +
> > > +		for (j = 0; j < 4; j++) {
> > 
> > > I don't think you need to complicate things with two loops here.
> > > cedrus_h264_write_sram() takes a void * and aligns to 4 anyway. So as
> > > long as the input buffer size is a multiple of 4
> > > (u8[CEDRUS_MAX_REF_IDX] qualifies), you can use a single for loop with
> > > "u8 sram_array[CEDRUS_MAX_REF_IDX]". This should make the code much
> > > more readable.
> 
> This wasn't really about the alignment, but in order to get the
> offsets in the u32 and the array more easily.
> 
> Breaking out the loop will make that computation less easy on the eye,
> so I guess it's very subjective.
> 

For some strange reason, the code below fixes a decoding issue with one of my
test samples. This is what I actually meant by the single-loop approach:

static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
				   struct cedrus_run *run,
				   const u8 *ref_list, u8 num_ref,
				   enum cedrus_h264_sram_off sram)
{
	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
	struct cedrus_dev *dev = ctx->dev;
	u8 sram_array[CEDRUS_MAX_REF_IDX];
	unsigned int i;

	memset(sram_array, 0, sizeof(sram_array));
	num_ref = min(num_ref, (u8)CEDRUS_MAX_REF_IDX);

	for (i = 0; i < num_ref; i++) {
		const struct v4l2_h264_dpb_entry *dpb;
		const struct cedrus_buffer *cedrus_buf;
		const struct vb2_v4l2_buffer *ref_buf;
		unsigned int position;
		int buf_idx;
		u8 dpb_idx;

		dpb_idx = ref_list[i];
		dpb = &decode->dpb[dpb_idx];

		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
			continue;

		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
		if (buf_idx < 0)
			continue;

		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
		position = cedrus_buf->codec.h264.position;

		sram_array[i] |= position << 1;
		if (ref_buf->field == V4L2_FIELD_BOTTOM)
			sram_array[i] |= BIT(0);
	}

	cedrus_h264_write_sram(dev, sram, &sram_array, num_ref);
}

IMO this code is easier to read.
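
One possible explanation for the difference, offered as an observation on the quoted code rather than anything confirmed in this thread: the two-loop version indexes the u32 array with i, which is a reference index stepping by 4, not a word index. Entries 4..7 therefore land in word 4 instead of word 1. The stand-alone sketch below compares the byte streams. It assumes CEDRUS_MAX_REF_IDX is 32, leaves out the bottom-field bit, and uses hypothetical sketch_ names; the word array is oversized so the "as posted" variant never writes out of bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_REF_IDX	32	/* assumed value of CEDRUS_MAX_REF_IDX */

/* Two-loop packing, word index taken from i as in the quoted patch. */
static void sketch_pack_words_as_posted(const uint8_t *pos, unsigned int n,
					uint32_t *words)
{
	unsigned int i, j;

	assert(n <= SKETCH_MAX_REF_IDX);
	memset(words, 0, SKETCH_MAX_REF_IDX * sizeof(*words));
	for (i = 0; i < n; i += 4)
		for (j = 0; j < 4 && i + j < n; j++)
			words[i] |= (uint32_t)pos[i + j] << (j * 8 + 1);
}

/* Same loops, but the word array is indexed with i / 4. */
static void sketch_pack_words_by_word(const uint8_t *pos, unsigned int n,
				      uint32_t *words)
{
	unsigned int i, j;

	assert(n <= SKETCH_MAX_REF_IDX);
	memset(words, 0, SKETCH_MAX_REF_IDX * sizeof(*words));
	for (i = 0; i < n; i += 4)
		for (j = 0; j < 4 && i + j < n; j++)
			words[i / 4] |= (uint32_t)pos[i + j] << (j * 8 + 1);
}

/* Single-loop byte packing, as in the replacement function above. */
static void sketch_pack_bytes(const uint8_t *pos, unsigned int n,
			      uint8_t *bytes)
{
	unsigned int i;

	assert(n <= SKETCH_MAX_REF_IDX);
	memset(bytes, 0, SKETCH_MAX_REF_IDX);
	for (i = 0; i < n; i++)
		bytes[i] = (uint8_t)(pos[i] << 1);
}

/* Byte k of a stream of 32-bit words, least significant byte first. */
static uint8_t sketch_word_stream_byte(const uint32_t *words, unsigned int k)
{
	return (uint8_t)(words[k / 4] >> ((k % 4) * 8));
}
```

For six references, the i / 4 variant and the byte variant produce the same stream, while the variant indexed by i leaves byte 4 at zero. With the array size from the patch (CEDRUS_MAX_REF_IDX / sizeof(u32) elements), indexing by i could also run past the end of the array for larger lists.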

> > > +			const struct v4l2_h264_dpb_entry *dpb;
> > > +			const struct cedrus_buffer *cedrus_buf;
> > > +			const struct vb2_v4l2_buffer *ref_buf;
> > > +			unsigned int position;
> > > +			int buf_idx;
> > > +			u8 ref_idx = i + j;
> > > +			u8 dpb_idx;
> > > +
> > > +			if (ref_idx >= num_ref)
> > > +				break;
> > > +
> > > +			dpb_idx = ref_list[ref_idx];
> > > +			dpb = &decode->dpb[dpb_idx];
> > > +
> > > +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > +				continue;
> > > +
> > > +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > > +			if (buf_idx < 0)
> > > +				continue;
> > > +
> > > +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > > +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > > +			position = cedrus_buf->codec.h264.position;
> > > +
> > > +			sram_array[i] |= position << (j * 8 + 1);
> > > +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> > 
> > > You never set the above flag on the buffer, so this will always be false.
> 
> As far as I know, the field is supposed to be set by the userspace.

How? I thought that only flags could be set when queueing buffers, and there
is no bottom/top flag.

> 
> > > +	// sequence parameters
> > > +	reg = BIT(19);
> > 
> > This one can be inferred from sps->chroma_format_idc.
> 
> I'll look into this
> 

I'm using this:
reg |= (sps->chroma_format_idc & 0x7) << 19;

Although I can't say whether I tested any value other than 1 there (which
gives the same result as before).
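
As a quick check of that one-liner, the sketch below isolates the field and shows that chroma_format_idc == 1 (4:2:0) gives exactly the BIT(19) that the patch hardcodes. The three-bit width comes from the 0x7 mask above; whether the hardware handles any value other than 1 is untested, as noted. The sketch_ and SKETCH_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BIT(n)	(1u << (n))

/* chroma_format_idc placed in three bits starting at bit 19. */
static inline uint32_t sketch_frame_size_chroma(uint8_t chroma_format_idc)
{
	/* H.264 defines values 0 (monochrome) to 3 (4:4:4). */
	assert(chroma_format_idc <= 3);
	return ((uint32_t)chroma_format_idc & 0x7) << 19;
}
```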

> > > +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> > > +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> > > +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> > > +		reg |= BIT(18);
> > > +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> > > +		reg |= BIT(17);
> > > +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> > > +		reg |= BIT(16);
> > > +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> > > +
> > > +	// slice parameters
> > > +	reg = 0;
> > > +	/*
> > > +	 * FIXME: This bit marks all the frames as references. This
> > > +	 * should probably be set based on nal_ref_idc, but the libva
> > > +	 * doesn't pass that information along, so this is not always
> > > +	 * available. We should find something else, maybe change the
> > > +	 * kernel UAPI somehow?
> > > +	 */
> > > +	reg |= BIT(12);
> > 
> > > I really think you should use nal_ref_idc here, as in the specification.
> > > You can still fake the data from the libva backend. I don't think any
> > > driver needs this for anything other than checking whether it is 0 or not.
> 
> Yeah, Tomasz suggested the same thing as a reply to the cover letter,
> I'll change that in the next version.
> 
> > > +	reg |= (slice->slice_type & 0xf) << 8;
> > > +	reg |= slice->cabac_init_idc & 0x3;
> > > +	reg |= BIT(5);
> > > +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > > +		reg |= BIT(4);
> > > +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> > > +		reg |= BIT(3);
> > > +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> > > +		reg |= BIT(2);
> > > +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> > > +
> > > +	reg = 0;
> > 
> > You might want to set bit 12 here, which enables active reference picture
> > override. However, I'm not completely sure about that.
> 
> Did you find some videos that were broken because of this?

No, not really. That's why I don't really know if it is needed or not.

Best regards,
Jernej


* Re: [linux-sunxi] [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
@ 2018-11-27 17:23   ` Jernej Škrabec
  2018-11-28 15:52     ` Maxime Ripard
  2018-12-05 12:56   ` Hans Verkuil
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Jernej Škrabec @ 2018-11-27 17:23 UTC (permalink / raw)
  To: linux-sunxi, maxime.ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart, tfiga,
	posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	Thomas Petazzoni, Guenter Roeck

Hi!

On Thursday, 15 November 2018 at 15:56:49 CET, Maxime Ripard wrote:
> From: Pawel Osciak <posciak@chromium.org>
> 
> Stateless video codecs will require both the H264 metadata and slices in
> order to be able to decode frames.
> 
> This introduces the definitions for a new pixel format for H264 slices that
> have been parsed, as well as the structures used to pass the metadata from
> the userspace to the kernel.
> 
> Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
> Signed-off-by: Pawel Osciak <posciak@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> ---
>  Documentation/media/uapi/v4l/biblio.rst       |   9 +
>  .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++++++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
>  .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
>  .../media/videodev2.h.rst.exceptions          |   5 +
>  drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
>  drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
>  include/media/v4l2-ctrls.h                    |  10 +
>  include/uapi/linux/v4l2-controls.h            | 166 ++++++++
>  include/uapi/linux/videodev2.h                |  11 +
>  10 files changed, 658 insertions(+)
> 

<snip>

> @@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
>  	__u8	chroma_non_intra_quantiser_matrix[64];
>  };
> 
> +/* Compounds controls */
> +
> +#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG			0x01
> +#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG			0x02
> +#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG			0x04
> +#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG			0x08
> +#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG			0x10
> +#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG			0x20

How are these constraint flags meant to be used?
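
One plausible reading, not confirmed anywhere in this message: each define maps to the SPS syntax element constraint_setN_flag, and userspace ORs the ones that are set in the bitstream into constraint_set_flags. A sketch of that packing, with the flag values copied from the patch and a hypothetical sketch_ helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the quoted patch. */
#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG	0x01
#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG	0x02
#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG	0x04
#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG	0x08
#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG	0x10
#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG	0x20

/*
 * Pack the six parsed constraint_setN_flag bits (each 0 or 1) into the
 * single constraint_set_flags byte of struct v4l2_ctrl_h264_sps.
 */
static inline uint8_t sketch_pack_constraint_flags(const uint8_t *set_flag)
{
	static const uint8_t bits[6] = {
		V4L2_H264_SPS_CONSTRAINT_SET0_FLAG,
		V4L2_H264_SPS_CONSTRAINT_SET1_FLAG,
		V4L2_H264_SPS_CONSTRAINT_SET2_FLAG,
		V4L2_H264_SPS_CONSTRAINT_SET3_FLAG,
		V4L2_H264_SPS_CONSTRAINT_SET4_FLAG,
		V4L2_H264_SPS_CONSTRAINT_SET5_FLAG,
	};
	uint8_t out = 0;
	size_t i;

	assert(set_flag != NULL);
	for (i = 0; i < 6; i++)
		if (set_flag[i])
			out |= bits[i];
	return out;
}
```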

> +
> +#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE		0x01
> +#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS	0x02
> +#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO		0x04
> +#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED	0x08
> +#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY			0x10
> +#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD		0x20
> +#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE			0x40
> +
> +struct v4l2_ctrl_h264_sps {
> +	__u8 profile_idc;
> +	__u8 constraint_set_flags;
> +	__u8 level_idc;
> +	__u8 seq_parameter_set_id;
> +	__u8 chroma_format_idc;
> +	__u8 bit_depth_luma_minus8;
> +	__u8 bit_depth_chroma_minus8;
> +	__u8 log2_max_frame_num_minus4;
> +	__u8 pic_order_cnt_type;
> +	__u8 log2_max_pic_order_cnt_lsb_minus4;
> +	__u8 max_num_ref_frames;
> +	__u8 num_ref_frames_in_pic_order_cnt_cycle;
> +	__s32 offset_for_ref_frame[255];
> +	__s32 offset_for_non_ref_pic;
> +	__s32 offset_for_top_to_bottom_field;
> +	__u16 pic_width_in_mbs_minus1;
> +	__u16 pic_height_in_map_units_minus1;
> +	__u8 flags;
> +};
> +
> +#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE				0x0001
> +#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT	0x0002
> +#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED				0x0004
> +#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT		0x0008
> +#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED			0x0010
> +#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT			0x0020
> +#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE				0x0040
> +#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT			0x0080
> +
> +struct v4l2_ctrl_h264_pps {
> +	__u8 pic_parameter_set_id;
> +	__u8 seq_parameter_set_id;
> +	__u8 num_slice_groups_minus1;
> +	__u8 num_ref_idx_l0_default_active_minus1;
> +	__u8 num_ref_idx_l1_default_active_minus1;
> +	__u8 weighted_bipred_idc;
> +	__s8 pic_init_qp_minus26;
> +	__s8 pic_init_qs_minus26;
> +	__s8 chroma_qp_index_offset;
> +	__s8 second_chroma_qp_index_offset;
> +	__u8 flags;
> +};
> +
> +struct v4l2_ctrl_h264_scaling_matrix {
> +	__u8 scaling_list_4x4[6][16];
> +	__u8 scaling_list_8x8[6][64];
> +};
> +
> +struct v4l2_h264_weight_factors {
> +	__s8 luma_weight[32];
> +	__s8 luma_offset[32];
> +	__s8 chroma_weight[32][2];
> +	__s8 chroma_offset[32][2];
> +};

Regarding the weight type __s8: isn't it just a bit too small?

ITU-T Rec. H.264 (05/2003) says that this field has a value between -128 and
127 if the weight flag is set. That fits perfectly. However, when the weight
flag is 0, the default value is 2^luma_log2_weight_denom (taking luma as the
example). luma_log2_weight_denom can take values between 0 and 7, which means
that the weight will take values from 1 to 128. That is just slightly over the
maximum value for __s8.

__s8 is fine for offsets, though.
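
The arithmetic can be checked with a few lines of C. This is only a sketch of the range argument above; the sketch_ helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default (inferred) weight when the weight flag is 0: 2^denom. */
static inline int sketch_default_weight(uint8_t log2_weight_denom)
{
	assert(log2_weight_denom <= 7);
	return 1 << log2_weight_denom;
}

/* Whether a value is representable in the __s8 fields of the struct. */
static inline int sketch_fits_s8(int value)
{
	return value >= INT8_MIN && value <= INT8_MAX;
}
```

Only the denominator 7 case overflows: 2^7 = 128, one past INT8_MAX.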

Best regards,
Jernej

> +
> +struct v4l2_h264_pred_weight_table {
> +	__u8 luma_log2_weight_denom;
> +	__u8 chroma_log2_weight_denom;
> +	struct v4l2_h264_weight_factors weight_factors[2];
> +};


* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-27 16:30       ` Jernej Škrabec
@ 2018-11-27 20:19         ` Jernej Škrabec
  2018-11-30  7:30         ` Maxime Ripard
  1 sibling, 0 replies; 27+ messages in thread
From: Jernej Škrabec @ 2018-11-27 20:19 UTC (permalink / raw)
  To: linux-sunxi
  Cc: maxime.ripard, hans.verkuil, acourbot, sakari.ailus,
	Laurent Pinchart, tfiga, posciak, Paul Kocialkowski,
	Chen-Yu Tsai, linux-kernel, linux-arm-kernel, linux-media,
	nicolas.dufresne, jenskuske, Thomas Petazzoni

On Tuesday, 27 November 2018 at 17:30:00 CET, Jernej Škrabec wrote:
> On Tuesday, 27 November 2018 at 16:50:28 CET, Maxime Ripard wrote:
> > Hi Jernej,
> > 
> > Thanks for your review!
> > 
> > On Sat, Nov 24, 2018 at 09:43:43PM +0100, Jernej Škrabec wrote:
> > > > +enum cedrus_h264_sram_off {
> > > > +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> > > > +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> > > > +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> > > > +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> > > > +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> > > > +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
> > > 
> > > > I triple-checked the address above and it should be 0x220. For easier
> > > > implementation later, you might want to add a second scaling list
> > > > address for 8x8 at 0x210. Then you can do something like:
> > > > 
> > > > cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
> > > > 			       scaling->scaling_list_8x8[0],
> > > > 			       sizeof(scaling->scaling_list_8x8[0]));
> > > > cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
> > > > 			       scaling->scaling_list_8x8[3],
> > > > 			       sizeof(scaling->scaling_list_8x8[0]));
> > > > cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
> > > > 			       scaling->scaling_list_4x4,
> > > > 			       sizeof(scaling->scaling_list_4x4));
> > > > 
> > > I know that it's not implemented here, just FYI.
> > 
> > Ack. I guess I can just leave it out entirely for now, since it's not
> > implemented.
> > 
> > > > +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > > > +				struct cedrus_buffer *buf,
> > > > +				unsigned int top_field_order_cnt,
> > > > +				unsigned int bottom_field_order_cnt,
> > > > +				struct cedrus_h264_sram_ref_pic *pic)
> > > > +{
> > > > +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> > > > +	unsigned int position = buf->codec.h264.position;
> > > > +
> > > > +	pic->top_field_order_cnt = top_field_order_cnt;
> > > > +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> > > > +	pic->frame_info = buf->codec.h264.pic_type << 8;
> > > > +
> > > > +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) -
> > > > PHYS_OFFSET;
> > > > +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) -
> > > > PHYS_OFFSET;
> > > 
> > > I think subtracting PHYS_OFFSET breaks driver on H3 boards with 2 GiB of
> > > RAM. Isn't that unnecessary anyway due to
> > > 
> > > dev->dev->dma_pfn_offset = PHYS_PFN_OFFSET;
> > > 
> > > in cedrus_hw.c?
> > > 
> > > This comment is meant for all PHYS_OFFSET subtracting in this patch.
> > 
> > PHYS_OFFSET was needed on some older SoCs, and the dma_pfn_offset
> > trick wasn't working, I hacked it and forgot about it. I'll try to
> > figure it out for the next version.
> > 
> > > > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > > > +				   struct cedrus_run *run,
> > > > +				   const u8 *ref_list, u8 num_ref,
> > > > +				   enum cedrus_h264_sram_off sram)
> > > > +{
> > > > +	const struct v4l2_ctrl_h264_decode_param *decode =
> > > > run->h264.decode_param; +	struct vb2_queue *cap_q =
> > > > &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > > > +	struct cedrus_dev *dev = ctx->dev;
> > > > +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> > > > +	unsigned int size, i;
> > > > +
> > > > +	memset(sram_array, 0, sizeof(sram_array));
> > > > +
> > > > +	for (i = 0; i < num_ref; i += 4) {
> > > > +		unsigned int j;
> > > > +
> > > > +		for (j = 0; j < 4; j++) {
> > > 
> > > > I don't think you need to complicate things with two loops here.
> > > > cedrus_h264_write_sram() takes a void * and aligns to 4 anyway. So as
> > > > long as the input buffer size is a multiple of 4
> > > > (u8[CEDRUS_MAX_REF_IDX] qualifies), you can use a single for loop with
> > > > "u8 sram_array[CEDRUS_MAX_REF_IDX]". This should make the code much
> > > > more readable.
> > 
> > This wasn't really about the alignment, but in order to get the
> > offsets in the u32 and the array more easily.
> > 
> > Breaking out the loop will make that computation less easy on the eye,
> > so I guess it's very subjective.
> 
> For some strange reason, the code below fixes a decoding issue with one of
> my test samples. This is what I actually meant by the single-loop approach:
> 
> static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> 				   struct cedrus_run *run,
> 				   const u8 *ref_list, u8 num_ref,
> 				   enum cedrus_h264_sram_off sram)
> {
> 	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> 	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> 	struct cedrus_dev *dev = ctx->dev;
> 	u8 sram_array[CEDRUS_MAX_REF_IDX];
> 	unsigned int i;
> 
> 	memset(sram_array, 0, sizeof(sram_array));
> 	num_ref = min(num_ref, (u8)CEDRUS_MAX_REF_IDX);
> 
> 	for (i = 0; i < num_ref; i++) {
> 		const struct v4l2_h264_dpb_entry *dpb;
> 		const struct cedrus_buffer *cedrus_buf;
> 		const struct vb2_v4l2_buffer *ref_buf;
> 		unsigned int position;
> 		int buf_idx;
> 		u8 dpb_idx;
> 
> 		dpb_idx = ref_list[i];
> 		dpb = &decode->dpb[dpb_idx];
> 
> 		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> 			continue;
> 
> 		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> 		if (buf_idx < 0)
> 			continue;
> 
> 		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> 		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> 		position = cedrus_buf->codec.h264.position;
> 
> 		sram_array[i] |= position << 1;
> 		if (ref_buf->field == V4L2_FIELD_BOTTOM)
> 			sram_array[i] |= BIT(0);
> 	}
> 
> 	cedrus_h264_write_sram(dev, sram, &sram_array, num_ref);
> }
> 
> IMO this code is easier to read.
> 
> > > > +			const struct v4l2_h264_dpb_entry *dpb;
> > > > +			const struct cedrus_buffer *cedrus_buf;
> > > > +			const struct vb2_v4l2_buffer *ref_buf;
> > > > +			unsigned int position;
> > > > +			int buf_idx;
> > > > +			u8 ref_idx = i + j;
> > > > +			u8 dpb_idx;
> > > > +
> > > > +			if (ref_idx >= num_ref)
> > > > +				break;
> > > > +
> > > > +			dpb_idx = ref_list[ref_idx];
> > > > +			dpb = &decode->dpb[dpb_idx];
> > > > +
> > > > +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > > +				continue;
> > > > +
> > > > +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > > > +			if (buf_idx < 0)
> > > > +				continue;
> > > > +
> > > > +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > > > +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > > > +			position = cedrus_buf->codec.h264.position;
> > > > +
> > > > +			sram_array[i] |= position << (j * 8 + 1);
> > > > +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> > > 
> > > > You never set the above flag on the buffer, so this will always be false.
> > 
> > As far as I know, the field is supposed to be set by the userspace.
> 
> How? I thought that only flags could be set when queueing buffers, and
> there is no bottom/top flag.
> 
> > > > +	// sequence parameters
> > > > +	reg = BIT(19);
> > > 
> > > This one can be inferred from sps->chroma_format_idc.
> > 
> > I'll look into this
> 
> I'm using this:
> reg |= (sps->chroma_format_idc & 0x7) << 19;
> 
> Although I can't tell whether I tested anything other than 1 there (same as it
> was before).
> 
> > > > +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> > > > +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> > > > +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> > > > +		reg |= BIT(18);
> > > > +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> > > > +		reg |= BIT(17);
> > > > +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> > > > +		reg |= BIT(16);
> > > > +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> > > > +
> > > > +	// slice parameters
> > > > +	reg = 0;
> > > > +	/*
> > > > +	 * FIXME: This bit marks all the frames as references. This
> > > > +	 * should probably be set based on nal_ref_idc, but the libva
> > > > +	 * doesn't pass that information along, so this is not always
> > > > +	 * available. We should find something else, maybe change the
> > > > +	 * kernel UAPI somehow?
> > > > +	 */
> > > > +	reg |= BIT(12);
> > > 
> > > I really think you should use nal_ref_idc here as it is in
> > > specification.
> > > You can still fake the data from libva backend. I don't think that any
> > > driver needs this for anything else than check if it is 0 or not.
> > 
> > Yeah, Tomasz suggested the same thing as a reply to the cover letter,
> > I'll change that in the next version.
> > 
> > > > +	reg |= (slice->slice_type & 0xf) << 8;
> > > > +	reg |= slice->cabac_init_idc & 0x3;
> > > > +	reg |= BIT(5);
> > > > +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > > > +		reg |= BIT(4);
> > > > +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> > > > +		reg |= BIT(3);
> > > > +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> > > > +		reg |= BIT(2);
> > > > +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> > > > +
> > > > +	reg = 0;
> > > 
> > > You might want to set bit 12 here, which enables active reference
> > > picture
> > > override. However, I'm not completely sure about that.
> > 
> > Did you find some videos that were broken because of this?
> 
> No, not really. That's why I don't really know if it is needed or not.

I found the relevant flag in the spec: num_ref_idx_active_override_flag.
I guess VAAPI always gives the correct value, so this doesn't need to be set.
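
For reference, the way this flag selects the active reference count can be
sketched as below. This is only an illustration of the slice header semantics
in the H.264 spec, not cedrus code; the helper and its parameter names are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the driver): pick the value of
 * num_ref_idx_lX_active_minus1 that applies to a slice. When the
 * override flag is clear, the PPS default is used instead of the
 * value carried in the slice header.
 */
static uint8_t active_num_ref_idx_minus1(bool override_flag,
					 uint8_t slice_value,
					 uint8_t pps_default)
{
	uint8_t val = override_flag ? slice_value : pps_default;

	/* The syntax element is limited to the range 0..31 */
	assert(val <= 31);
	return val;
}
```

If userspace already resolves this choice before filling the slice
parameters, the driver only ever sees the final value.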

Best regards,
Jernej

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [linux-sunxi] [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2018-11-27 17:23   ` [linux-sunxi] " Jernej Škrabec
@ 2018-11-28 15:52     ` Maxime Ripard
  0 siblings, 0 replies; 27+ messages in thread
From: Maxime Ripard @ 2018-11-28 15:52 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: linux-sunxi, hans.verkuil, acourbot, sakari.ailus,
	Laurent Pinchart, tfiga, posciak, Paul Kocialkowski,
	Chen-Yu Tsai, linux-kernel, linux-arm-kernel, linux-media,
	nicolas.dufresne, jenskuske, Thomas Petazzoni, Guenter Roeck

[-- Attachment #1: Type: text/plain, Size: 3934 bytes --]

Hi Jernej,

On Tue, Nov 27, 2018 at 06:23:10PM +0100, Jernej Škrabec wrote:
> > @@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
> >  	__u8	chroma_non_intra_quantiser_matrix[64];
> >  };
> > 
> > +/* Compounds controls */
> > +
> > +#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG			0x01
> > +#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG			0x02
> > +#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG			0x04
> > +#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG			0x08
> > +#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG			0x10
> > +#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG			0x20
> 
> How are these constraint flags meant to be used?

They are supposed to be used as bit fields in the constraint_set_flags
member of the v4l2_ctrl_h264_sps structure.
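
A minimal sketch of that usage, assuming userspace has already parsed the six
constraint_setN_flag bits from the SPS. The defines are copied from the patch;
pack_constraint_set_flags() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch under review */
#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG	0x01
#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG	0x02
#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG	0x04
#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG	0x08
#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG	0x10
#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG	0x20

/*
 * Hypothetical helper: fold the six parsed constraint_setN_flag bits
 * into the single byte stored in constraint_set_flags. Flag N lives
 * in bit N, which matches the defines above.
 */
static uint8_t pack_constraint_set_flags(const bool set[6])
{
	uint8_t flags = 0;
	unsigned int n;

	assert(V4L2_H264_SPS_CONSTRAINT_SET5_FLAG == (1u << 5));

	for (n = 0; n < 6; n++)
		if (set[n])
			flags |= (uint8_t)(1u << n);

	return flags;
}
```

A consumer would then test individual bits, for example
(flags & V4L2_H264_SPS_CONSTRAINT_SET3_FLAG).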

> > +
> > +#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE		0x01
> > +#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS	0x02
> > +#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO		0x04
> > +#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED	0x08
> > +#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY			0x10
> > +#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD		0x20
> > +#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE			0x40
> > +
> > +struct v4l2_ctrl_h264_sps {
> > +	__u8 profile_idc;
> > +	__u8 constraint_set_flags;
> > +	__u8 level_idc;
> > +	__u8 seq_parameter_set_id;
> > +	__u8 chroma_format_idc;
> > +	__u8 bit_depth_luma_minus8;
> > +	__u8 bit_depth_chroma_minus8;
> > +	__u8 log2_max_frame_num_minus4;
> > +	__u8 pic_order_cnt_type;
> > +	__u8 log2_max_pic_order_cnt_lsb_minus4;
> > +	__u8 max_num_ref_frames;
> > +	__u8 num_ref_frames_in_pic_order_cnt_cycle;
> > +	__s32 offset_for_ref_frame[255];
> > +	__s32 offset_for_non_ref_pic;
> > +	__s32 offset_for_top_to_bottom_field;
> > +	__u16 pic_width_in_mbs_minus1;
> > +	__u16 pic_height_in_map_units_minus1;
> > +	__u8 flags;
> > +};
> > +
> > +#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE				0x0001
> > +#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT	0x0002
> > +#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED				0x0004
> > +#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT		0x0008
> > +#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED			0x0010
> > +#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT			0x0020
> > +#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE				0x0040
> > +#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT			0x0080
> > +
> > +struct v4l2_ctrl_h264_pps {
> > +	__u8 pic_parameter_set_id;
> > +	__u8 seq_parameter_set_id;
> > +	__u8 num_slice_groups_minus1;
> > +	__u8 num_ref_idx_l0_default_active_minus1;
> > +	__u8 num_ref_idx_l1_default_active_minus1;
> > +	__u8 weighted_bipred_idc;
> > +	__s8 pic_init_qp_minus26;
> > +	__s8 pic_init_qs_minus26;
> > +	__s8 chroma_qp_index_offset;
> > +	__s8 second_chroma_qp_index_offset;
> > +	__u8 flags;
> > +};
> > +
> > +struct v4l2_ctrl_h264_scaling_matrix {
> > +	__u8 scaling_list_4x4[6][16];
> > +	__u8 scaling_list_8x8[6][64];
> > +};
> > +
> > +struct v4l2_h264_weight_factors {
> > +	__s8 luma_weight[32];
> > +	__s8 luma_offset[32];
> > +	__s8 chroma_weight[32][2];
> > +	__s8 chroma_offset[32][2];
> > +};
> 
> Regarding the __s8 weight type: isn't it just a bit too small?
> 
> ITU-T Rec. H264 (05/2003) says that this field has value between -128 to 127 if 
> weight flag is set. That fits perfectly. However, when weight flag is 0, default 
> value is 2^luma_log2_weight_denom (for example). luma_log2_weight_denom can 
> have values between 0 and 7, which means that weight will have values from 1 
> to 128. That is just slightly over the max value for __s8.

luma_log2_weight_denom is in the v4l2_h264_pred_weight_table
structure, so you wouldn't use the weights if the weight flag isn't
set.
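
To make the range question above concrete, here is a small sketch of the
arithmetic being discussed. It only models the inferred default of
2^log2_weight_denom described in the quoted text; both helper names are
hypothetical and nothing here is taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Inferred default weight when the per-entry weight flag is 0,
 * as described in the quoted text: 2^log2_weight_denom, with the
 * denominator limited to 0..7.
 */
static int default_weight(unsigned int log2_weight_denom)
{
	assert(log2_weight_denom <= 7);
	return 1 << log2_weight_denom;
}

/* True if the value can be stored in an __s8 field */
static bool fits_in_s8(int value)
{
	return value >= INT8_MIN && value <= INT8_MAX;
}
```

Only the denominator of 7 gives a default (128) that does not fit, and that
default never has to be stored if the consumer derives it from the
denominator whenever the weight flag is clear.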

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-27 16:30       ` Jernej Škrabec
  2018-11-27 20:19         ` Jernej Škrabec
@ 2018-11-30  7:30         ` Maxime Ripard
  2018-11-30 17:56           ` Jernej Škrabec
  1 sibling, 1 reply; 27+ messages in thread
From: Maxime Ripard @ 2018-11-30  7:30 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: linux-sunxi, hans.verkuil, acourbot, sakari.ailus,
	Laurent Pinchart, tfiga, posciak, Paul Kocialkowski,
	Chen-Yu Tsai, linux-kernel, linux-arm-kernel, linux-media,
	nicolas.dufresne, jenskuske, Thomas Petazzoni

[-- Attachment #1: Type: text/plain, Size: 4850 bytes --]

On Tue, Nov 27, 2018 at 05:30:00PM +0100, Jernej Škrabec wrote:
> > > > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > > > +				   struct cedrus_run *run,
> > > > +				   const u8 *ref_list, u8 num_ref,
> > > > +				   enum cedrus_h264_sram_off sram)
> > > > +{
> > > > +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> > > > +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > > > +	struct cedrus_dev *dev = ctx->dev;
> > > > +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> > > > +	unsigned int size, i;
> > > > +
> > > > +	memset(sram_array, 0, sizeof(sram_array));
> > > > +
> > > > +	for (i = 0; i < num_ref; i += 4) {
> > > > +		unsigned int j;
> > > > +
> > > > +		for (j = 0; j < 4; j++) {
> > > 
> > > > I don't think you need to complicate this with two loops.
> > > > cedrus_h264_write_sram() takes a void * and aligns to 4 anyway. So as long
> > > > as the input buffer size is a multiple of 4 (u8[CEDRUS_MAX_REF_IDX]
> > > > qualifies), you can use a single for loop with
> > > > "u8 sram_array[CEDRUS_MAX_REF_IDX]".
> > > > This should make the code much more readable.
> > 
> > This wasn't really about the alignment, but in order to get the
> > offsets in the u32 and the array more easily.
> > 
> > Breaking out the loop will make that computation less easy on the eye,
> > so I guess it's very subjective.
> > 
> 
> For some strange reason, the code below fixes a decoding issue with one of my
> test samples. This is what I actually meant by the one-loop approach:

Do you have that test sample somewhere accessible?

> static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> 				   struct cedrus_run *run,
> 				   const u8 *ref_list, u8 num_ref,
> 				   enum cedrus_h264_sram_off sram)
> {
> 	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> 	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> 	struct cedrus_dev *dev = ctx->dev;
> 	u8 sram_array[CEDRUS_MAX_REF_IDX];
> 	unsigned int i;
> 
> 	memset(sram_array, 0, sizeof(sram_array));
> 	num_ref = min(num_ref, (u8)CEDRUS_MAX_REF_IDX);
> 
> 	for (i = 0; i < num_ref; i++) {
> 		const struct v4l2_h264_dpb_entry *dpb;
> 		const struct cedrus_buffer *cedrus_buf;
> 		const struct vb2_v4l2_buffer *ref_buf;
> 		unsigned int position;
> 		int buf_idx;
> 		u8 dpb_idx;
> 
> 		dpb_idx = ref_list[i];
> 		dpb = &decode->dpb[dpb_idx];
> 
> 		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> 			continue;
> 
> 		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> 		if (buf_idx < 0)
> 			continue;
> 
> 		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> 		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> 		position = cedrus_buf->codec.h264.position;
> 
> 		sram_array[i] |= position << 1;
> 		if (ref_buf->field == V4L2_FIELD_BOTTOM)
> 			sram_array[i] |= BIT(0);
> 	}
> 
> 	cedrus_h264_write_sram(dev, sram, &sram_array, num_ref);
> }
> 
> IMO this code is easier to read.

Indeed, thanks!
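
As a sanity check of the two layouts, the sketch below packs the same entries
once as bytes and once as 32-bit words, assuming each entry is
(position << 1) | bottom_field as in the code above. All names are
hypothetical. Note that the word variant here indexes with i / 4, whereas the
two-loop version in the patch indexes sram_array[i] while i advances in steps
of 4; whether that accounts for the decoding difference is only a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REF_IDX	32	/* same value as CEDRUS_MAX_REF_IDX in the patch */

/* One entry: frame position in bits 7:1, bottom-field flag in bit 0 */
static uint8_t ref_entry(unsigned int position, int bottom)
{
	assert(position < 128);
	return (uint8_t)((position << 1) | (bottom ? 1u : 0u));
}

/* Single-loop variant: one byte per reference */
static void pack_bytes(const uint8_t *entries, size_t n,
		       uint8_t out[MAX_REF_IDX])
{
	size_t i;

	memset(out, 0, MAX_REF_IDX);
	for (i = 0; i < n && i < MAX_REF_IDX; i++)
		out[i] = entries[i];
}

/* Word variant: four references per word, entry i in byte lane (i % 4) */
static void pack_words(const uint8_t *entries, size_t n,
		       uint32_t out[MAX_REF_IDX / 4])
{
	size_t i;

	memset(out, 0, (MAX_REF_IDX / 4) * sizeof(uint32_t));
	for (i = 0; i < n && i < MAX_REF_IDX; i++)
		out[i / 4] |= (uint32_t)entries[i] << ((i % 4) * 8);
}
```

Byte lane (i % 4) of word i / 4 then holds the same value as byte i of the
u8 array, which is what a little-endian CPU stores in memory for both.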

> > > > +			const struct v4l2_h264_dpb_entry *dpb;
> > > > +			const struct cedrus_buffer *cedrus_buf;
> > > > +			const struct vb2_v4l2_buffer *ref_buf;
> > > > +			unsigned int position;
> > > > +			int buf_idx;
> > > > +			u8 ref_idx = i + j;
> > > > +			u8 dpb_idx;
> > > > +
> > > > +			if (ref_idx >= num_ref)
> > > > +				break;
> > > > +
> > > > +			dpb_idx = ref_list[ref_idx];
> > > > +			dpb = &decode->dpb[dpb_idx];
> > > > +
> > > > +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > > +				continue;
> > > > +
> > > > +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > > > +			if (buf_idx < 0)
> > > > +				continue;
> > > > +
> > > > +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > > > +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > > > +			position = cedrus_buf->codec.h264.position;
> > > > +
> > > > +			sram_array[i] |= position << (j * 8 + 1);
> > > > +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> > > 
> > > > You never set the above flag on the buffer, so this will always be false.
> > 
> > As far as I know, the field is supposed to be set by the userspace.
> 
> How? I thought that only flags can be set when queueing buffers, and there is
> no bottom/top flag.

https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/buffer.html#c.v4l2_buffer

"Indicates the field order of the image in the buffer, see
v4l2_field. This field is not used when the buffer contains VBI
data. Drivers must set it when type refers to a capture stream,
applications when it refers to an output stream."

My understanding is that the application should set it, since we'll
use the output stream's buffer here. But I might very well be wrong
about it :/

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-15 14:56 ` [PATCH v2 2/2] media: cedrus: Add H264 decoding support Maxime Ripard
  2018-11-24 20:43   ` [linux-sunxi] " Jernej Škrabec
@ 2018-11-30 12:37   ` Paul Kocialkowski
  2018-12-05 22:27   ` [linux-sunxi] " Jernej Škrabec
  2 siblings, 0 replies; 27+ messages in thread
From: Paul Kocialkowski @ 2018-11-30 12:37 UTC (permalink / raw)
  To: Maxime Ripard, hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart
  Cc: tfiga, posciak, Chen-Yu Tsai, linux-kernel, linux-arm-kernel,
	linux-media, nicolas.dufresne, jenskuske, linux-sunxi,
	Thomas Petazzoni

[-- Attachment #1: Type: text/plain, Size: 27689 bytes --]

Hi,

On Thu, 2018-11-15 at 15:56 +0100, Maxime Ripard wrote:
> Introduce some basic H264 decoding support in cedrus. So far, only the
> baseline profile videos have been tested, and some more advanced features
> used in higher profiles are not even implemented.

Regarding the preparation of addresses, it seems that subtracting
PHYS_OFFSET does not work with more than 1 GiB of RAM available.

While platforms before the A33 can only map the first 256 MiB of RAM,
newer ones (starting with the A33) do not have this limitation and the
reserved memory can be set anywhere in RAM.

As an attempt to explain the issue, it could be that with 2 GiB
available, the VPU maps 0x0-0x40000000 to the second GiB of RAM so that
0x40000000-0x80000000 matches the first GiB RAM (like it's exposed to
the CPU). With only 1 GiB (or anything else that divides 1 GiB), that
same GiB is mapped from 0x0-0x40000000, 0x40000000-0x80000000 and so on,
so the issue does not occur.
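
The guess above can be written down as a toy model. This is purely an
illustration of the hypothesis, not verified hardware behaviour: it assumes
the VPU decodes bus addresses the same way the CPU does and that DRAM aliases
every ram_size bytes. The PHYS_OFFSET value and the function name are
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define GIB		(UINT64_C(1) << 30)
#define PHYS_OFFSET	UINT64_C(0x40000000)	/* assumed DRAM base seen by the CPU */

/*
 * Hypothetical model: return the DRAM offset the VPU would hit for a
 * given bus address. ram_size must be a power of two so that the
 * unsigned wraparound below stays consistent with the modulo.
 */
static uint64_t vpu_dram_offset(uint64_t bus_addr, uint64_t ram_size)
{
	assert(ram_size != 0 && (ram_size & (ram_size - 1)) == 0);

	/* Unsigned wraparound covers bus_addr < PHYS_OFFSET */
	return (bus_addr - PHYS_OFFSET) % ram_size;
}
```

In this model, a buffer at DRAM offset 0x1000 is reached with or without the
PHYS_OFFSET subtraction when 1 GiB is fitted, but with 2 GiB the subtracted
address lands 1 GiB away from the intended location.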

I've discovered the issue while testing the H.265 series where I had
added the PHYS_OFFSET subtraction and found that the issue applied to
H.264 as well.

Cheers,

Paul

> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> ---
>  drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
>  drivers/staging/media/sunxi/cedrus/cedrus.c   |  25 +
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  35 +-
>  .../staging/media/sunxi/cedrus/cedrus_dec.c   |  11 +
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 470 ++++++++++++++++++
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +
>  .../staging/media/sunxi/cedrus/cedrus_regs.h  |  63 +++
>  .../staging/media/sunxi/cedrus/cedrus_video.c |   9 +
>  8 files changed, 618 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/Makefile b/drivers/staging/media/sunxi/cedrus/Makefile
> index e9dc68b7bcb6..aaf141fc58b6 100644
> --- a/drivers/staging/media/sunxi/cedrus/Makefile
> +++ b/drivers/staging/media/sunxi/cedrus/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o
>  
> -sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o cedrus_mpeg2.o
> +sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o \
> +		 cedrus_mpeg2.o cedrus_h264.o
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c b/drivers/staging/media/sunxi/cedrus/cedrus.c
> index 82558455384a..627a8c07eb21 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
> @@ -40,6 +40,30 @@ static const struct cedrus_control cedrus_controls[] = {
>  		.codec		= CEDRUS_CODEC_MPEG2,
>  		.required	= false,
>  	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_decode_param),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_slice_param),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_SPS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_sps),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_PPS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_pps),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
>  };
>  
>  #define CEDRUS_CONTROLS_COUNT	ARRAY_SIZE(cedrus_controls)
> @@ -277,6 +301,7 @@ static int cedrus_probe(struct platform_device *pdev)
>  	}
>  
>  	dev->dec_ops[CEDRUS_CODEC_MPEG2] = &cedrus_dec_ops_mpeg2;
> +	dev->dec_ops[CEDRUS_CODEC_H264] = &cedrus_dec_ops_h264;
>  
>  	mutex_init(&dev->dev_mutex);
>  	spin_lock_init(&dev->irq_lock);
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 781676b55a1b..179c10dcf6a7 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -30,7 +30,7 @@
>  
>  enum cedrus_codec {
>  	CEDRUS_CODEC_MPEG2,
> -
> +	CEDRUS_CODEC_H264,
>  	CEDRUS_CODEC_LAST,
>  };
>  
> @@ -40,6 +40,12 @@ enum cedrus_irq_status {
>  	CEDRUS_IRQ_OK,
>  };
>  
> +enum cedrus_h264_pic_type {
> +	CEDRUS_H264_PIC_TYPE_FRAME	= 0,
> +	CEDRUS_H264_PIC_TYPE_FIELD,
> +	CEDRUS_H264_PIC_TYPE_MBAFF,
> +};
> +
>  struct cedrus_control {
>  	u32			id;
>  	u32			elem_size;
> @@ -47,6 +53,13 @@ struct cedrus_control {
>  	unsigned char		required:1;
>  };
>  
> +struct cedrus_h264_run {
> +	const struct v4l2_ctrl_h264_decode_param	*decode_param;
> +	const struct v4l2_ctrl_h264_pps			*pps;
> +	const struct v4l2_ctrl_h264_slice_param		*slice_param;
> +	const struct v4l2_ctrl_h264_sps			*sps;
> +};
> +
>  struct cedrus_mpeg2_run {
>  	const struct v4l2_ctrl_mpeg2_slice_params	*slice_params;
>  	const struct v4l2_ctrl_mpeg2_quantization	*quantization;
> @@ -57,12 +70,20 @@ struct cedrus_run {
>  	struct vb2_v4l2_buffer	*dst;
>  
>  	union {
> +		struct cedrus_h264_run	h264;
>  		struct cedrus_mpeg2_run	mpeg2;
>  	};
>  };
>  
>  struct cedrus_buffer {
>  	struct v4l2_m2m_buffer          m2m_buf;
> +
> +	union {
> +		struct {
> +			unsigned int			position;
> +			enum cedrus_h264_pic_type	pic_type;
> +		} h264;
> +	} codec;
>  };
>  
>  struct cedrus_ctx {
> @@ -77,6 +98,17 @@ struct cedrus_ctx {
>  	struct v4l2_ctrl		**ctrls;
>  
>  	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
> +
> +	union {
> +		struct {
> +			void		*mv_col_buf;
> +			dma_addr_t	mv_col_buf_dma;
> +			ssize_t		mv_col_buf_field_size;
> +			ssize_t		mv_col_buf_size;
> +			void		*pic_info_buf;
> +			dma_addr_t	pic_info_buf_dma;
> +		} h264;
> +	} codec;
>  };
>  
>  struct cedrus_dec_ops {
> @@ -120,6 +152,7 @@ struct cedrus_dev {
>  };
>  
>  extern struct cedrus_dec_ops cedrus_dec_ops_mpeg2;
> +extern struct cedrus_dec_ops cedrus_dec_ops_h264;
>  
>  static inline void cedrus_write(struct cedrus_dev *dev, u32 reg, u32 val)
>  {
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> index 0cfd6036d0cd..b606f07d94ab 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> @@ -49,6 +49,17 @@ void cedrus_device_run(void *priv)
>  			V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION);
>  		break;
>  
> +	case V4L2_PIX_FMT_H264_SLICE:
> +		run.h264.decode_param = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS);
> +		run.h264.pps = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_PPS);
> +		run.h264.slice_param = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS);
> +		run.h264.sps = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_SPS);
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> new file mode 100644
> index 000000000000..5459a936b4b9
> --- /dev/null
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -0,0 +1,470 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (c) 2013 Jens Kuske <jenskuske@gmail.com>
> + * Copyright (c) 2018 Bootlin
> + */
> +
> +#include <linux/types.h>
> +
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "cedrus.h"
> +#include "cedrus_hw.h"
> +#include "cedrus_regs.h"
> +
> +enum cedrus_h264_sram_off {
> +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
> +};
> +
> +struct cedrus_h264_sram_ref_pic {
> +	__le32	top_field_order_cnt;
> +	__le32	bottom_field_order_cnt;
> +	__le32	frame_info;
> +	__le32	luma_ptr;
> +	__le32	chroma_ptr;
> +	__le32	mv_col_top_ptr;
> +	__le32	mv_col_bot_ptr;
> +	__le32	reserved;
> +} __packed;
> +
> +/* One for the output, 16 for the reference images */
> +#define CEDRUS_H264_FRAME_NUM		17
> +
> +#define CEDRUS_PIC_INFO_BUF_SIZE	(128 * SZ_1K)
> +
> +static void cedrus_h264_write_sram(struct cedrus_dev *dev,
> +				   enum cedrus_h264_sram_off off,
> +				   const void *data, size_t len)
> +{
> +	const u32 *buffer = data;
> +	size_t count = DIV_ROUND_UP(len, 4);
> +
> +	cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET, off << 2);
> +
> +	do {
> +		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> +	} while (--count);
> +}
> +
> +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> +					      unsigned int position,
> +					      unsigned int field)
> +{
> +	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma - PHYS_OFFSET;
> +
> +	/* Adjust for the position */
> +	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> +
> +	/* Adjust for the field */
> +	addr += field * ctx->codec.h264.mv_col_buf_field_size;
> +
> +	return addr;
> +}
> +
> +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> +				struct cedrus_buffer *buf,
> +				unsigned int top_field_order_cnt,
> +				unsigned int bottom_field_order_cnt,
> +				struct cedrus_h264_sram_ref_pic *pic)
> +{
> +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> +	unsigned int position = buf->codec.h264.position;
> +
> +	pic->top_field_order_cnt = top_field_order_cnt;
> +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> +	pic->frame_info = buf->codec.h264.pic_type << 8;
> +
> +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
> +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) - PHYS_OFFSET;
> +	pic->mv_col_top_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 0);
> +	pic->mv_col_bot_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 1);
> +}
> +
> +static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> +				    struct cedrus_run *run)
> +{
> +	struct cedrus_h264_sram_ref_pic pic_list[CEDRUS_H264_FRAME_NUM];
> +	const struct v4l2_ctrl_h264_decode_param *dec_param = run->h264.decode_param;
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> +	struct cedrus_buffer *output_buf;
> +	struct cedrus_dev *dev = ctx->dev;
> +	unsigned long used_dpbs = 0;
> +	unsigned int position;
> +	unsigned int output = 0;
> +	unsigned int i;
> +
> +	memset(pic_list, 0, sizeof(pic_list));
> +
> +	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
> +		const struct v4l2_h264_dpb_entry *dpb = &dec_param->dpb[i];
> +		struct cedrus_buffer *cedrus_buf;
> +		int buf_idx;
> +
> +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
> +			continue;
> +
> +		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> +		if (buf_idx < 0)
> +			continue;
> +
> +		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
> +		position = cedrus_buf->codec.h264.position;
> +		used_dpbs |= BIT(position);
> +
> +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +			continue;
> +
> +		cedrus_fill_ref_pic(ctx, cedrus_buf,
> +				    dpb->top_field_order_cnt,
> +				    dpb->bottom_field_order_cnt,
> +				    &pic_list[position]);
> +
> +		output = max(position, output);
> +	}
> +
> +	position = find_next_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM,
> +				      output);
> +	if (position >= CEDRUS_H264_FRAME_NUM)
> +		position = find_first_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM);
> +
> +	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> +	output_buf->codec.h264.position = position;
> +
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
> +	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
> +	else
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FRAME;
> +
> +	cedrus_fill_ref_pic(ctx, output_buf,
> +			    dec_param->top_field_order_cnt,
> +			    dec_param->bottom_field_order_cnt,
> +			    &pic_list[position]);
> +
> +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_FRAMEBUFFER_LIST,
> +			       pic_list, sizeof(pic_list));
> +
> +	cedrus_write(dev, VE_H264_OUTPUT_FRAME_IDX, position);
> +}
> +
> +#define CEDRUS_MAX_REF_IDX	32
> +
> +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run,
> +				   const u8 *ref_list, u8 num_ref,
> +				   enum cedrus_h264_sram_off sram)
> +{
> +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> +	unsigned int size, i;
> +
> +	memset(sram_array, 0, sizeof(sram_array));
> +
> +	for (i = 0; i < num_ref; i += 4) {
> +		unsigned int j;
> +
> +		for (j = 0; j < 4; j++) {
> +			const struct v4l2_h264_dpb_entry *dpb;
> +			const struct cedrus_buffer *cedrus_buf;
> +			const struct vb2_v4l2_buffer *ref_buf;
> +			unsigned int position;
> +			int buf_idx;
> +			u8 ref_idx = i + j;
> +			u8 dpb_idx;
> +
> +			if (ref_idx >= num_ref)
> +				break;
> +
> +			dpb_idx = ref_list[ref_idx];
> +			dpb = &decode->dpb[dpb_idx];
> +
> +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +				continue;
> +
> +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> +			if (buf_idx < 0)
> +				continue;
> +
> +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> +			position = cedrus_buf->codec.h264.position;
> +
> +			sram_array[i] |= position << (j * 8 + 1);
> +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> +				sram_array[i] |= BIT(j * 8);
> +		}
> +	}
> +
> +	size = min((unsigned int)ALIGN(num_ref, 4), sizeof(sram_array));
> +	cedrus_h264_write_sram(dev, sram, &sram_array, size);
> +}
> +
> +static void cedrus_write_ref_list0(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +
> +	_cedrus_write_ref_list(ctx, run,
> +			       slice->ref_pic_list0,
> +			       slice->num_ref_idx_l0_active_minus1 + 1,
> +			       CEDRUS_SRAM_H264_REF_LIST_0);
> +}
> +
> +static void cedrus_write_ref_list1(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +
> +	_cedrus_write_ref_list(ctx, run,
> +			       slice->ref_pic_list1,
> +			       slice->num_ref_idx_l1_active_minus1 + 1,
> +			       CEDRUS_SRAM_H264_REF_LIST_1);
> +}
> +
> +static void cedrus_set_params(struct cedrus_ctx *ctx,
> +			      struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +	const struct v4l2_ctrl_h264_pps *pps = run->h264.pps;
> +	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +	struct cedrus_dev *dev = ctx->dev;
> +	dma_addr_t src_buf_addr;
> +	u32 offset = slice->header_bit_size;
> +	u32 len = (slice->size * 8) - offset;
> +	u32 reg;
> +
> +	cedrus_write(dev, 0x220, 0x02000400);
> +	cedrus_write(dev, VE_H264_VLD_LEN, len);
> +	cedrus_write(dev, VE_H264_VLD_OFFSET, offset);
> +
> +	src_buf_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf, 0);
> +	src_buf_addr -= PHYS_OFFSET;
> +	cedrus_write(dev, VE_H264_VLD_END, src_buf_addr + VBV_SIZE - 1);
> +	cedrus_write(dev, VE_H264_VLD_ADDR,
> +		     VE_H264_VLD_ADDR_VAL(src_buf_addr) |
> +		     VE_H264_VLD_ADDR_FIRST | VE_H264_VLD_ADDR_VALID |
> +		     VE_H264_VLD_ADDR_LAST);
> +
> +	/*
> +	 * FIXME: Since the bitstream parsing is done in software, and
> +	 * in userspace, this shouldn't be needed anymore. But it
> +	 * turns out that removing it breaks the decoding process,
> +	 * without any clear indication why.
> +	 */
> +	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
> +		     VE_H264_TRIGGER_TYPE_INIT_SWDEC);
> +
> +	if ((slice->slice_type == V4L2_H264_SLICE_TYPE_P) ||
> +	    (slice->slice_type == V4L2_H264_SLICE_TYPE_SP) ||
> +	    (slice->slice_type == V4L2_H264_SLICE_TYPE_B))
> +		cedrus_write_ref_list0(ctx, run);
> +
> +	if (slice->slice_type == V4L2_H264_SLICE_TYPE_B)
> +		cedrus_write_ref_list1(ctx, run);
> +
> +	// picture parameters
> +	reg = 0;
> +	/*
> +	 * FIXME: the kernel headers are allowing the default value to
> +	 * be passed, but the libva doesn't give us that.
> +	 */
> +	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 10;
> +	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 5;
> +	reg |= (pps->weighted_bipred_idc & 0x3) << 2;
> +	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
> +		reg |= BIT(15);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
> +		reg |= BIT(4);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED)
> +		reg |= BIT(1);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE)
> +		reg |= BIT(0);
> +	cedrus_write(dev, VE_H264_PIC_HDR, reg);
> +
> +	// sequence parameters
> +	reg = BIT(19);
> +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> +		reg |= BIT(18);
> +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> +		reg |= BIT(17);
> +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> +		reg |= BIT(16);
> +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> +
> +	// slice parameters
> +	reg = 0;
> +	/*
> +	 * FIXME: This bit marks all the frames as references. This
> +	 * should probably be set based on nal_ref_idc, but the libva
> +	 * doesn't pass that information along, so this is not always
> +	 * available. We should find something else, maybe change the
> +	 * kernel UAPI somehow?
> +	 */
> +	reg |= BIT(12);
> +	reg |= (slice->slice_type & 0xf) << 8;
> +	reg |= slice->cabac_init_idc & 0x3;
> +	reg |= BIT(5);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> +		reg |= BIT(4);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> +		reg |= BIT(3);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> +		reg |= BIT(2);
> +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> +
> +	reg = 0;
> +	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 24;
> +	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 16;
> +	reg |= (slice->disable_deblocking_filter_idc & 0x3) << 8;
> +	reg |= (slice->slice_alpha_c0_offset_div2 & 0xf) << 4;
> +	reg |= slice->slice_beta_offset_div2 & 0xf;
> +	cedrus_write(dev, VE_H264_SLICE_HDR2, reg);
> +
> +	reg = 0;
> +	/*
> +	 * FIXME: This bit tells the video engine to use the default
> +	 * quantization matrices. This will obviously need to be
> +	 * changed to support the profiles supporting custom
> +	 * quantization matrices.
> +	 */
> +	reg |= BIT(24);
> +	reg |= (pps->second_chroma_qp_index_offset & 0x3f) << 16;
> +	reg |= (pps->chroma_qp_index_offset & 0x3f) << 8;
> +	reg |= (pps->pic_init_qp_minus26 + 26 + slice->slice_qp_delta) & 0x3f;
> +	cedrus_write(dev, VE_H264_QP_PARAM, reg);
> +
> +	// clear status flags
> +	cedrus_write(dev, VE_H264_STATUS, cedrus_read(dev, VE_H264_STATUS));
> +
> +	// enable int
> +	reg = cedrus_read(dev, VE_H264_CTRL) | 0x7;
> +	cedrus_write(dev, VE_H264_CTRL, reg);
> +}
> +
> +static enum cedrus_irq_status
> +cedrus_h264_irq_status(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 reg = cedrus_read(dev, VE_H264_STATUS) & 0x7;
> +
> +	if (!reg)
> +		return CEDRUS_IRQ_NONE;
> +
> +	if (reg & (BIT(1) | BIT(2)))
> +		return CEDRUS_IRQ_ERROR;
> +
> +	return CEDRUS_IRQ_OK;
> +}
> +
> +static void cedrus_h264_irq_clear(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_write(dev, VE_H264_STATUS, GENMASK(2, 0));
> +}
> +
> +static void cedrus_h264_irq_disable(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 reg = cedrus_read(dev, VE_H264_CTRL) & ~GENMASK(2, 0);
> +
> +	cedrus_write(dev, VE_H264_CTRL, reg);
> +}
> +
> +static void cedrus_h264_setup(struct cedrus_ctx *ctx,
> +			      struct cedrus_run *run)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_engine_enable(dev, CEDRUS_CODEC_H264);
> +
> +	cedrus_write(dev, VE_H264_SDROT_CTRL, 0);
> +	cedrus_write(dev, VE_H264_EXTRA_BUFFER1,
> +		     ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET);
> +	cedrus_write(dev, VE_H264_EXTRA_BUFFER2,
> +		     (ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET) + 0x48000);
> +
> +	cedrus_write_frame_list(ctx, run);
> +
> +	cedrus_set_params(ctx, run);
> +}
> +
> +static int cedrus_h264_start(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	unsigned int field_size;
> +	unsigned int mv_col_size;
> +	int ret;
> +
> +	ctx->codec.h264.pic_info_buf =
> +		dma_alloc_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +				   &ctx->codec.h264.pic_info_buf_dma,
> +				   GFP_KERNEL);
> +	if (!ctx->codec.h264.pic_info_buf)
> +		return -ENOMEM;
> +
> +	field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> +		DIV_ROUND_UP(ctx->src_fmt.height, 16) * 32;
> +	ctx->codec.h264.mv_col_buf_field_size = field_size;
> +
> +	mv_col_size = field_size * 2 * CEDRUS_H264_FRAME_NUM;
> +	ctx->codec.h264.mv_col_buf_size = mv_col_size;
> +	ctx->codec.h264.mv_col_buf = dma_alloc_coherent(dev->dev,
> +							ctx->codec.h264.mv_col_buf_size,
> +							&ctx->codec.h264.mv_col_buf_dma,
> +							GFP_KERNEL);
> +	if (!ctx->codec.h264.mv_col_buf) {
> +		ret = -ENOMEM;
> +		goto err_pic_buf;
> +	}
> +
> +	return 0;
> +
> +err_pic_buf:
> +	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +			  ctx->codec.h264.pic_info_buf,
> +			  ctx->codec.h264.pic_info_buf_dma);
> +	return ret;
> +}
> +
> +static void cedrus_h264_stop(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	dma_free_coherent(dev->dev, ctx->codec.h264.mv_col_buf_size,
> +			  ctx->codec.h264.mv_col_buf,
> +			  ctx->codec.h264.mv_col_buf_dma);
> +	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +			  ctx->codec.h264.pic_info_buf,
> +			  ctx->codec.h264.pic_info_buf_dma);
> +}
> +
> +static void cedrus_h264_trigger(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
> +		     VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE);
> +}
> +
> +struct cedrus_dec_ops cedrus_dec_ops_h264 = {
> +	.irq_clear	= cedrus_h264_irq_clear,
> +	.irq_disable	= cedrus_h264_irq_disable,
> +	.irq_status	= cedrus_h264_irq_status,
> +	.setup		= cedrus_h264_setup,
> +	.start		= cedrus_h264_start,
> +	.stop		= cedrus_h264_stop,
> +	.trigger	= cedrus_h264_trigger,
> +};
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> index 32adbcbe6175..8e559454ca82 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> @@ -46,6 +46,10 @@ int cedrus_engine_enable(struct cedrus_dev *dev, enum cedrus_codec codec)
>  		reg |= VE_MODE_DEC_MPEG;
>  		break;
>  
> +	case CEDRUS_CODEC_H264:
> +		reg |= VE_MODE_DEC_H264;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> index de2d6b6f64bf..6fe9896a506d 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> @@ -232,4 +232,67 @@
>  #define VE_DEC_MPEG_ROT_LUMA			(VE_ENGINE_DEC_MPEG + 0xcc)
>  #define VE_DEC_MPEG_ROT_CHROMA			(VE_ENGINE_DEC_MPEG + 0xd0)
>  
> +/*  FIXME: Legacy below. */
> +
> +#define VBV_SIZE                       (1024 * 1024)
> +
> +#define VE_H264_FRAME_SIZE		0x200
> +#define VE_H264_PIC_HDR			0x204
> +#define VE_H264_SLICE_HDR		0x208
> +#define VE_H264_SLICE_HDR2		0x20c
> +#define VE_H264_PRED_WEIGHT		0x210
> +#define VE_H264_QP_PARAM		0x21c
> +#define VE_H264_CTRL			0x220
> +
> +#define VE_H264_TRIGGER_TYPE		0x224
> +#define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE	(8 << 0)
> +#define VE_H264_TRIGGER_TYPE_INIT_SWDEC		(7 << 0)
> +
> +#define VE_H264_STATUS			0x228
> +#define VE_H264_CUR_MB_NUM		0x22c
> +
> +#define VE_H264_VLD_ADDR		0x230
> +#define VE_H264_VLD_ADDR_FIRST			BIT(30)
> +#define VE_H264_VLD_ADDR_LAST			BIT(29)
> +#define VE_H264_VLD_ADDR_VALID			BIT(28)
> +#define VE_H264_VLD_ADDR_VAL(x)			(((x) & 0x0ffffff0) | ((x) >> 28))
> +
> +#define VE_H264_VLD_OFFSET		0x234
> +#define VE_H264_VLD_LEN			0x238
> +#define VE_H264_VLD_END			0x23c
> +#define VE_H264_SDROT_CTRL		0x240
> +#define VE_H264_OUTPUT_FRAME_IDX	0x24c
> +#define VE_H264_EXTRA_BUFFER1		0x250
> +#define VE_H264_EXTRA_BUFFER2		0x254
> +#define VE_H264_BASIC_BITS		0x2dc
> +#define VE_AVC_SRAM_PORT_OFFSET		0x2e0
> +#define VE_AVC_SRAM_PORT_DATA		0x2e4
> +
> +#define VE_ISP_INPUT_SIZE		0xa00
> +#define VE_ISP_INPUT_STRIDE		0xa04
> +#define VE_ISP_CTRL			0xa08
> +#define VE_ISP_INPUT_LUMA		0xa78
> +#define VE_ISP_INPUT_CHROMA		0xa7c
> +
> +#define VE_AVC_PARAM			0xb04
> +#define VE_AVC_QP			0xb08
> +#define VE_AVC_MOTION_EST		0xb10
> +#define VE_AVC_CTRL			0xb14
> +#define VE_AVC_TRIGGER			0xb18
> +#define VE_AVC_STATUS			0xb1c
> +#define VE_AVC_BASIC_BITS		0xb20
> +#define VE_AVC_UNK_BUF			0xb60
> +#define VE_AVC_VLE_ADDR			0xb80
> +#define VE_AVC_VLE_END			0xb84
> +#define VE_AVC_VLE_OFFSET		0xb88
> +#define VE_AVC_VLE_MAX			0xb8c
> +#define VE_AVC_VLE_LENGTH		0xb90
> +#define VE_AVC_REF_LUMA			0xba0
> +#define VE_AVC_REF_CHROMA		0xba4
> +#define VE_AVC_REC_LUMA			0xbb0
> +#define VE_AVC_REC_CHROMA		0xbb4
> +#define VE_AVC_REF_SLUMA		0xbb8
> +#define VE_AVC_REC_SLUMA		0xbbc
> +#define VE_AVC_MB_INFO			0xbc0
> +
>  #endif
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> index 293df48326cc..7be2caacddde 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> @@ -37,6 +37,10 @@ static struct cedrus_format cedrus_formats[] = {
>  		.pixelformat	= V4L2_PIX_FMT_MPEG2_SLICE,
>  		.directions	= CEDRUS_DECODE_SRC,
>  	},
> +	{
> +		.pixelformat	= V4L2_PIX_FMT_H264_SLICE,
> +		.directions	= CEDRUS_DECODE_SRC,
> +	},
>  	{
>  		.pixelformat	= V4L2_PIX_FMT_SUNXI_TILED_NV12,
>  		.directions	= CEDRUS_DECODE_DST,
> @@ -100,6 +104,7 @@ static void cedrus_prepare_format(struct v4l2_pix_format *pix_fmt)
>  
>  	switch (pix_fmt->pixelformat) {
>  	case V4L2_PIX_FMT_MPEG2_SLICE:
> +	case V4L2_PIX_FMT_H264_SLICE:
>  		/* Zero bytes per line for encoded source. */
>  		bytesperline = 0;
>  
> @@ -451,6 +456,10 @@ static int cedrus_start_streaming(struct vb2_queue *vq, unsigned int count)
>  		ctx->current_codec = CEDRUS_CODEC_MPEG2;
>  		break;
>  
> +	case V4L2_PIX_FMT_H264_SLICE:
> +		ctx->current_codec = CEDRUS_CODEC_H264;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
-- 
Paul Kocialkowski, Bootlin (formerly Free Electrons)
Embedded Linux and kernel engineering
https://bootlin.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-30  7:30         ` Maxime Ripard
@ 2018-11-30 17:56           ` Jernej Škrabec
  0 siblings, 0 replies; 27+ messages in thread
From: Jernej Škrabec @ 2018-11-30 17:56 UTC (permalink / raw)
  To: linux-sunxi, maxime.ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart, tfiga,
	posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	Thomas Petazzoni

On Friday, 30 November 2018 at 08:30:47 CET, Maxime Ripard wrote:
> On Tue, Nov 27, 2018 at 05:30:00PM +0100, Jernej Škrabec wrote:
> > > > > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > > > > +				   struct cedrus_run *run,
> > > > > +				   const u8 *ref_list, u8 num_ref,
> > > > > +				   enum cedrus_h264_sram_off sram)
> > > > > +{
> > > > > +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> > > > > +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > > > > +	struct cedrus_dev *dev = ctx->dev;
> > > > > +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> > > > > +	unsigned int size, i;
> > > > > +
> > > > > +	memset(sram_array, 0, sizeof(sram_array));
> > > > > +
> > > > > +	for (i = 0; i < num_ref; i += 4) {
> > > > > +		unsigned int j;
> > > > > +
> > > > > +		for (j = 0; j < 4; j++) {
> > > > 
> > > > > I don't think you need to complicate this with two loops.
> > > > > cedrus_h264_write_sram() takes void * and aligns to 4 anyway. So as
> > > > > long as the input buffer size is a multiple of 4
> > > > > (u8[CEDRUS_MAX_REF_IDX] qualifies), you can use a single for loop
> > > > > with "u8 sram_array[CEDRUS_MAX_REF_IDX]". This should make the code
> > > > > much more readable.
> > > 
> > > This wasn't really about the alignment, but in order to get the
> > > offsets in the u32 and the array more easily.
> > > 
> > > Breaking out the loop will make that computation less easy on the eye,
> > > so I guess it's very subjective.
> > 
> > > For some strange reason, the code below fixes a decoding issue with one
> > > of my test samples. This is what I actually meant by the single-loop
> > > approach:
> Do you have that test sample somewhere accessible?

Yes, it's here:
http://jernej.libreelec.tv/videos/h264/Star%20Wars%20Episode%20VII%20-%20The%20Force%20Awakens%20-%20Teaser%20Trailer%202.mp4

It also needs prediction weight tables (your earlier patch for that should
work fine) and scaling lists (the code I sent you in one of my previous
comments should work).

For me, if this sample worked without issue, every other non-interlaced sample 
worked too.

> 
> > static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > 				   struct cedrus_run *run,
> > 				   const u8 *ref_list, u8 num_ref,
> > 				   enum cedrus_h264_sram_off sram)
> > {
> > 	const struct v4l2_ctrl_h264_decode_param *decode =
> > 		run->h264.decode_param;
> > 	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > 	struct cedrus_dev *dev = ctx->dev;
> > 	u8 sram_array[CEDRUS_MAX_REF_IDX];
> > 	unsigned int i;
> > 
> > 	memset(sram_array, 0, sizeof(sram_array));
> > 	num_ref = min(num_ref, (u8)CEDRUS_MAX_REF_IDX);
> > 
> > 	for (i = 0; i < num_ref; i++) {
> > 		const struct v4l2_h264_dpb_entry *dpb;
> > 		const struct cedrus_buffer *cedrus_buf;
> > 		const struct vb2_v4l2_buffer *ref_buf;
> > 		unsigned int position;
> > 		int buf_idx;
> > 		u8 dpb_idx;
> > 
> > 		dpb_idx = ref_list[i];
> > 		dpb = &decode->dpb[dpb_idx];
> > 
> > 		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > 			continue;
> > 
> > 		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > 		if (buf_idx < 0)
> > 			continue;
> > 
> > 		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > 		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > 		position = cedrus_buf->codec.h264.position;
> > 
> > 		sram_array[i] |= position << 1;
> > 		if (ref_buf->field == V4L2_FIELD_BOTTOM)
> > 			sram_array[i] |= BIT(0);
> > 	}
> > 
> > 	cedrus_h264_write_sram(dev, sram, &sram_array, num_ref);
> > }
> > 
> > IMO this code is easier to read.
> 
> Indeed, thanks!
> 
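
For what it's worth, the two layouts discussed above can be compared outside the kernel. The sketch below packs one byte per reference (position in bits 7:1, bottom-field flag in bit 0) and, separately, four references per 32-bit word, then lets a caller check that every byte lane matches. It assumes the word-wise variant indexes the array with i / 4; the quoted two-loop patch indexes sram_array with i itself, which advances in steps of four, and that might be related to the decoding difference, although nothing in this thread confirms it. MAX_REF_IDX and the function names are stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_REF_IDX 32	/* stand-in for CEDRUS_MAX_REF_IDX, value assumed */

/* Byte-wise layout: one entry per reference, as in the single-loop version. */
static void pack_bytes(uint8_t out[MAX_REF_IDX], const uint8_t *pos,
		       const uint8_t *bottom, unsigned int num_ref)
{
	unsigned int i;

	assert(num_ref <= MAX_REF_IDX);
	memset(out, 0, MAX_REF_IDX);

	for (i = 0; i < num_ref; i++) {
		assert(pos[i] < 128);	/* position must fit in bits 7:1 */
		out[i] = (uint8_t)(pos[i] << 1);
		if (bottom[i])
			out[i] |= 1;
	}
}

/* Word-wise layout: four references per u32, word index is i / 4. */
static void pack_words(uint32_t out[MAX_REF_IDX / 4], const uint8_t *pos,
		       const uint8_t *bottom, unsigned int num_ref)
{
	unsigned int i;

	assert(num_ref <= MAX_REF_IDX);
	memset(out, 0, (MAX_REF_IDX / 4) * sizeof(uint32_t));

	for (i = 0; i < num_ref; i++) {
		unsigned int shift = (i % 4) * 8;

		out[i / 4] |= (uint32_t)pos[i] << (shift + 1);
		if (bottom[i])
			out[i / 4] |= (uint32_t)1 << shift;
	}
}
```

On a little-endian CPU the two arrays are also byte-for-byte identical in memory, which is why a plain u8 array can be handed to a helper that writes 32-bit words.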
> > > > > +			const struct v4l2_h264_dpb_entry *dpb;
> > > > > +			const struct cedrus_buffer *cedrus_buf;
> > > > > +			const struct vb2_v4l2_buffer *ref_buf;
> > > > > +			unsigned int position;
> > > > > +			int buf_idx;
> > > > > +			u8 ref_idx = i + j;
> > > > > +			u8 dpb_idx;
> > > > > +
> > > > > +			if (ref_idx >= num_ref)
> > > > > +				break;
> > > > > +
> > > > > +			dpb_idx = ref_list[ref_idx];
> > > > > +			dpb = &decode->dpb[dpb_idx];
> > > > > +
> > > > > +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > > > +				continue;
> > > > > +
> > > > > +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > > > > +			if (buf_idx < 0)
> > > > > +				continue;
> > > > > +
> > > > > +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > > > > +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > > > > +			position = cedrus_buf->codec.h264.position;
> > > > > +
> > > > > +			sram_array[i] |= position << (j * 8 + 1);
> > > > > +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> > > > 
> > > > > You never set the above flag on the buffer, so this will always be false.
> > > 
> > > As far as I know, the field is supposed to be set by the userspace.
> > 
> > How? I thought that only flags can be set when queueing buffers, and there
> > is no bottom/top flag.
> 
> https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/buffer.html#c.v4l2_buffer
> 
> "Indicates the field order of the image in the buffer, see
> v4l2_field. This field is not used when the buffer contains VBI
> data. Drivers must set it when type refers to a capture stream,
> applications when it refers to an output stream."
> 
> My understanding is that the application should set it, since we'll
> use the output stream's buffer here. But I might very well be wrong
> about it :/

I'll take a look, thanks.

Best regards,
Jernej


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
  2018-11-27 17:23   ` [linux-sunxi] " Jernej Škrabec
@ 2018-12-05 12:56   ` Hans Verkuil
  2019-01-08  9:52   ` Randy 'ayaka' Li
  2019-01-28  5:54   ` Alexandre Courbot
  3 siblings, 0 replies; 27+ messages in thread
From: Hans Verkuil @ 2018-12-05 12:56 UTC (permalink / raw)
  To: Maxime Ripard, hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart
  Cc: tfiga, posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	linux-sunxi, Thomas Petazzoni, Guenter Roeck

On 11/15/18 15:56, Maxime Ripard wrote:
> From: Pawel Osciak <posciak@chromium.org>
> 
> Stateless video codecs will require both the H264 metadata and slices in
> order to be able to decode frames.
> 
> This introduces the definitions for a new pixel format for H264 slices that
> have been parsed, as well as the structures used to pass the metadata from
> the userspace to the kernel.
> 
> Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
> Signed-off-by: Pawel Osciak <posciak@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> ---
>  Documentation/media/uapi/v4l/biblio.rst       |   9 +
>  .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++++++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
>  .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
>  .../media/videodev2.h.rst.exceptions          |   5 +
>  drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
>  drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
>  include/media/v4l2-ctrls.h                    |  10 +
>  include/uapi/linux/v4l2-controls.h            | 166 ++++++++
>  include/uapi/linux/videodev2.h                |  11 +
>  10 files changed, 658 insertions(+)
> 
> diff --git a/Documentation/media/uapi/v4l/biblio.rst b/Documentation/media/uapi/v4l/biblio.rst
> index 386d6cf83e9c..73aeb7ce47d2 100644
> --- a/Documentation/media/uapi/v4l/biblio.rst
> +++ b/Documentation/media/uapi/v4l/biblio.rst
> @@ -115,6 +115,15 @@ ITU BT.1119
>  
>  :author:    International Telecommunication Union (http://www.itu.ch)
>  
> +.. _h264:
> +
> +ITU H.264
> +=========
> +
> +:title:     ITU-T Recommendation H.264 "Advanced Video Coding for Generic Audiovisual Services"
> +
> +:author:    International Telecommunication Union (http://www.itu.ch)
> +
>  .. _jfif:
>  
>  JFIF
> diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> index 65a1d873196b..87c0d151577f 100644
> --- a/Documentation/media/uapi/v4l/extended-controls.rst
> +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> @@ -1674,6 +1674,370 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
>  	non-intra-coded frames, in zigzag scanning order. Only relevant for
>  	non-4:2:0 YUV formats.
>  
> +.. _v4l2-mpeg-h264:
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SPS (struct)``
> +    Specifies the sequence parameter set (as extracted from the
> +    bitstream) for the associated H264 slice data. This includes the
> +    necessary parameters for configuring a stateless hardware decoding
> +    pipeline for H264.  The bitstream parameters are defined according
> +    to :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.

If possible, please refer to the corresponding section(s) in the h264 spec
where this is documented. Same for the other controls.

> +
> +.. c:type:: v4l2_ctrl_h264_sps
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_sps
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``profile_idc``
> +      -
> +    * - __u8
> +      - ``constraint_set_flags``
> +      - TODO
> +    * - __u8
> +      - ``level_idc``
> +      -
> +    * - __u8
> +      - ``seq_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``chroma_format_idc``
> +      -
> +    * - __u8
> +      - ``bit_depth_luma_minus8``
> +      -
> +    * - __u8
> +      - ``bit_depth_chroma_minus8``
> +      -
> +    * - __u8
> +      - ``log2_max_frame_num_minus4``
> +      -
> +    * - __u8
> +      - ``pic_order_cnt_type``
> +      -
> +    * - __u8
> +      - ``log2_max_pic_order_cnt_lsb_minus4``
> +      -
> +    * - __u8
> +      - ``max_num_ref_frames``
> +      -
> +    * - __u8
> +      - ``num_ref_frames_in_pic_order_cnt_cycle``
> +      -
> +    * - __s32
> +      - ``offset_for_ref_frame[255]``
> +      -
> +    * - __s32
> +      - ``offset_for_non_ref_pic``
> +      -
> +    * - __s32
> +      - ``offset_for_top_to_bottom_field``
> +      -
> +    * - __u16
> +      - ``pic_width_in_mbs_minus1``
> +      -
> +    * - __u16
> +      - ``pic_height_in_map_units_minus1``
> +      -
> +    * - __u8
> +      - ``flags``
> +      - TODO
> +
> +``V4L2_CID_MPEG_VIDEO_H264_PPS (struct)``
> +    Specifies the picture parameter set (as extracted from the
> +    bitstream) for the associated H264 slice data. This includes the
> +    necessary parameters for configuring a stateless hardware decoding
> +    pipeline for H264.  The bitstream parameters are defined according
> +    to :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_pps
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_pps
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``pic_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``seq_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``num_slice_groups_minus1``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l0_default_active_minus1``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l1_default_active_minus1``
> +      -
> +    * - __u8
> +      - ``weighted_bipred_idc``
> +      -
> +    * - __s8
> +      - ``pic_init_qp_minus26``
> +      -
> +    * - __s8
> +      - ``pic_init_qs_minus26``
> +      -
> +    * - __s8
> +      - ``chroma_qp_index_offset``
> +      -
> +    * - __s8
> +      - ``second_chroma_qp_index_offset``
> +      -
> +    * - __u8
> +      - ``flags``
> +      - TODO
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX (struct)``
> +    Specifies the scaling matrix (as extracted from the bitstream) for
> +    the associated H264 slice data. The bitstream parameters are
> +    defined according to :ref:`h264`. Unless there's a specific
> +    comment, refer to the specification for the documentation of these
> +    fields.
> +
> +.. c:type:: v4l2_ctrl_h264_scaling_matrix
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_scaling_matrix
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``scaling_list_4x4[6][16]``
> +      -
> +    * - __u8
> +      - ``scaling_list_8x8[6][64]``
> +      -
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS (struct)``
> +    Specifies the slice parameters (as extracted from the bitstream)
> +    for the associated H264 slice data. This includes the necessary
> +    parameters for configuring a stateless hardware decoding pipeline
> +    for H264.  The bitstream parameters are defined according to
> +    :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_slice_param
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_slice_param
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u32
> +      - ``size``
> +      -
> +    * - __u32
> +      - ``header_bit_size``
> +      -
> +    * - __u16
> +      - ``first_mb_in_slice``
> +      -
> +    * - __u8
> +      - ``slice_type``
> +      -
> +    * - __u8
> +      - ``pic_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``colour_plane_id``
> +      -
> +    * - __u16
> +      - ``frame_num``
> +      -
> +    * - __u16
> +      - ``idr_pic_id``
> +      -
> +    * - __u16
> +      - ``pic_order_cnt_lsb``
> +      -
> +    * - __s32
> +      - ``delta_pic_order_cnt_bottom``
> +      -
> +    * - __s32
> +      - ``delta_pic_order_cnt0``
> +      -
> +    * - __s32
> +      - ``delta_pic_order_cnt1``
> +      -
> +    * - __u8
> +      - ``redundant_pic_cnt``
> +      -
> +    * - struct :c:type:`v4l2_h264_pred_weight_table`
> +      - ``pred_weight_table``
> +      -
> +    * - __u32
> +      - ``dec_ref_pic_marking_bit_size``
> +      -
> +    * - __u32
> +      - ``pic_order_cnt_bit_size``
> +      -
> +    * - __u8
> +      - ``cabac_init_idc``
> +      -
> +    * - __s8
> +      - ``slice_qp_delta``
> +      -
> +    * - __s8
> +      - ``slice_qs_delta``
> +      -
> +    * - __u8
> +      - ``disable_deblocking_filter_idc``
> +      -
> +    * - __s8
> +      - ``slice_alpha_c0_offset_div2``
> +      -
> +    * - __s8
> +      - ``slice_beta_offset_div2``
> +      -
> +    * - __u32
> +      - ``slice_group_change_cycle``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l0_active_minus1``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l1_active_minus1``
> +      -
> +    * - __u8
> +      - ``ref_pic_list0[32]``
> +      -
> +    * - __u8
> +      - ``ref_pic_list1[32]``
> +      -
> +    * - __u8
> +      - ``flags``
> +      - TODO
> +
> +.. c:type:: v4l2_h264_pred_weight_table
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_h264_pred_weight_table
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``luma_log2_weight_denom``
> +      -
> +    * - __u8
> +      - ``chroma_log2_weight_denom``
> +      -
> +    * - struct :c:type:`v4l2_h264_weight_factors`
> +      - ``weight_factors[2]``
> +      -
> +
> +.. c:type:: v4l2_h264_weight_factors
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_h264_weight_factors
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __s8
> +      - ``luma_weight[32]``
> +      -
> +    * - __s8
> +      - ``luma_offset[32]``
> +      -
> +    * - __s8
> +      - ``chroma_weight[32][2]``
> +      -
> +    * - __s8
> +      - ``chroma_offset[32][2]``
> +      -
> +
> +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS (struct)``
> +    Specifies the decode parameters (as extracted from the bitstream)
> +    for the associated H264 slice data. This includes the necessary
> +    parameters for configuring a stateless hardware decoding pipeline
> +    for H264.  The bitstream parameters are defined according to
> +    :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_decode_param
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_decode_param
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u32
> +      - ``num_slices``
> +      -
> +    * - __u8
> +      - ``idr_pic_flag``
> +      -
> +    * - __u8
> +      - ``nal_ref_idc``
> +      -
> +    * - __s32
> +      - ``top_field_order_cnt``
> +      -
> +    * - __s32
> +      - ``bottom_field_order_cnt``
> +      -
> +    * - __u8
> +      - ``ref_pic_list_p0[32]``
> +      -
> +    * - __u8
> +      - ``ref_pic_list_b0[32]``
> +      -
> +    * - __u8
> +      - ``ref_pic_list_b1[32]``
> +      -
> +    * - struct :c:type:`v4l2_h264_dpb_entry`
> +      - ``dpb[16]``
> +      -
> +
> +.. c:type:: v4l2_h264_dpb_entry
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_h264_dpb_entry
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u32
> +      - ``tag``
> +      - tag to identify the buffer containing the reference frame
> +    * - __u16
> +      - ``frame_num``
> +      -
> +    * - __u16
> +      - ``pic_num``
> +      -
> +    * - __s32
> +      - ``top_field_order_cnt``
> +      -
> +    * - __s32
> +      - ``bottom_field_order_cnt``
> +      -
> +    * - __u8
> +      - ``flags``
> +      -
> +
>  MFC 5.1 MPEG Controls
>  ---------------------
>  
> diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> index ba0f6c49d9bf..f15fc1c8d479 100644
> --- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> +++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> @@ -45,6 +45,26 @@ Compressed Formats
>        - ``V4L2_PIX_FMT_H264_MVC``
>        - 'M264'
>        - H264 MVC video elementary stream.
> +    * .. _V4L2-PIX-FMT-H264-SLICE:
> +
> +      - ``V4L2_PIX_FMT_H264_SLICE``
> +      - 'S264'
> +      - H264 parsed slice data, as extracted from the H264 bitstream.
> +	This format is adapted for stateless video decoders that
> +	implement an H264 pipeline (using the :ref:`codec` and
> +	:ref:`media-request-api`).  Metadata associated with the frame
> +	to decode are required to be passed through the
> +	``V4L2_CID_MPEG_VIDEO_H264_SPS``,
> +	``V4L2_CID_MPEG_VIDEO_H264_PPS`` and

 and -> ,

> +	``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS`` and
> +	``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS`` controls and
> +	scaling matrices can optionally be specified through the
> +	``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX`` control.  See the
> +	:ref:`associated Codec Control IDs <v4l2-mpeg-h264>`.
> +	Exactly one output and one capture buffer must be provided for
> +	use with this pixel format. The output buffer must contain the
> +	appropriate number of macroblocks to decode a full
> +	corresponding frame to the matching capture buffer.
>      * .. _V4L2-PIX-FMT-H263:
>  
>        - ``V4L2_PIX_FMT_H263``
> diff --git a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> index 258f5813f281..38a9c988124c 100644
> --- a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> +++ b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> @@ -436,6 +436,36 @@ See also the examples in :ref:`control`.
>        - n/a
>        - A struct :c:type:`v4l2_ctrl_mpeg2_quantization`, containing MPEG-2
>  	quantization matrices for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_SPS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_sps`, containing H264
> +	sequence parameters for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_PPS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_pps`, containing H264
> +	picture parameters for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_SCALING_MATRIX``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_scaling_matrix`, containing H264
> +	scaling matrices for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_SLICE_PARAMS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_slice_param`, containing H264
> +	slice parameters for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_DECODE_PARAMS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_decode_param`, containing H264
> +	decode parameters for stateless video decoders.
>  
>  .. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.7cm}|
>  
> diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
> index 1ec425a7c364..99f1bd2bc44c 100644
> --- a/Documentation/media/videodev2.h.rst.exceptions
> +++ b/Documentation/media/videodev2.h.rst.exceptions
> @@ -133,6 +133,11 @@ replace symbol V4L2_CTRL_TYPE_U32 :c:type:`v4l2_ctrl_type`
>  replace symbol V4L2_CTRL_TYPE_U8 :c:type:`v4l2_ctrl_type`
>  replace symbol V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
>  replace symbol V4L2_CTRL_TYPE_MPEG2_QUANTIZATION :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_SPS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_PPS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_SCALING_MATRIX :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_DECODE_PARAMS :c:type:`v4l2_ctrl_type`
>  
>  # V4L2 capability defines
>  replace define V4L2_CAP_VIDEO_CAPTURE device-capabilities
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index b854cceb19dc..e96c453208e8 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -825,6 +825,11 @@ const char *v4l2_ctrl_get_name(u32 id)
>  	case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER:return "H264 Number of HC Layers";
>  	case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP:
>  								return "H264 Set QP Value for HC Layers";
> +	case V4L2_CID_MPEG_VIDEO_H264_SPS:			return "H264 SPS";
> +	case V4L2_CID_MPEG_VIDEO_H264_PPS:			return "H264 PPS";
> +	case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:		return "H264 Scaling Matrix";
> +	case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:		return "H264 Slice Parameters";
> +	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:		return "H264 Decode Parameters";
>  	case V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP:		return "MPEG4 I-Frame QP Value";
>  	case V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP:		return "MPEG4 P-Frame QP Value";
>  	case V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP:		return "MPEG4 B-Frame QP Value";
> @@ -1300,6 +1305,21 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
>  	case V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION:
>  		*type = V4L2_CTRL_TYPE_MPEG2_QUANTIZATION;
>  		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_SPS:
> +		*type = V4L2_CTRL_TYPE_H264_SPS;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_PPS:
> +		*type = V4L2_CTRL_TYPE_H264_PPS;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:
> +		*type = V4L2_CTRL_TYPE_H264_SCALING_MATRIX;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:
> +		*type = V4L2_CTRL_TYPE_H264_SLICE_PARAMS;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
> +		*type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
> +		break;
>  	default:
>  		*type = V4L2_CTRL_TYPE_INTEGER;
>  		break;
> @@ -1665,6 +1685,13 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
>  	case V4L2_CTRL_TYPE_MPEG2_QUANTIZATION:
>  		return 0;
>  
> +	case V4L2_CTRL_TYPE_H264_SPS:
> +	case V4L2_CTRL_TYPE_H264_PPS:
> +	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
> +	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
> +	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> +		return 0;
> +
>  	default:
>  		return -EINVAL;
>  	}
> @@ -2245,6 +2272,21 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
>  	case V4L2_CTRL_TYPE_MPEG2_QUANTIZATION:
>  		elem_size = sizeof(struct v4l2_ctrl_mpeg2_quantization);
>  		break;
> +	case V4L2_CTRL_TYPE_H264_SPS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_sps);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_PPS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_pps);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_scaling_matrix);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_slice_param);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
> +		break;
>  	default:
>  		if (type < V4L2_CTRL_COMPOUND_TYPES)
>  			elem_size = sizeof(s32);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index 49103787d19a..aa63f1794272 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1309,6 +1309,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>  		case V4L2_PIX_FMT_H264:		descr = "H.264"; break;
>  		case V4L2_PIX_FMT_H264_NO_SC:	descr = "H.264 (No Start Codes)"; break;
>  		case V4L2_PIX_FMT_H264_MVC:	descr = "H.264 MVC"; break;
> +		case V4L2_PIX_FMT_H264_SLICE:	descr = "H.264 Parsed Slice"; break;
>  		case V4L2_PIX_FMT_H263:		descr = "H.263"; break;
>  		case V4L2_PIX_FMT_MPEG1:	descr = "MPEG-1 ES"; break;
>  		case V4L2_PIX_FMT_MPEG2:	descr = "MPEG-2 ES"; break;
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index 83ce0593b275..b4ca95710d2d 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -43,6 +43,11 @@ struct poll_table_struct;
>   * @p_char:			Pointer to a string.
>   * @p_mpeg2_slice_params:	Pointer to a MPEG2 slice parameters structure.
>   * @p_mpeg2_quantization:	Pointer to a MPEG2 quantization data structure.
> + * @p_h264_sps:			Pointer to a struct v4l2_ctrl_h264_sps.
> + * @p_h264_pps:			Pointer to a struct v4l2_ctrl_h264_pps.
> + * @p_h264_scal_mtrx:		Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
> + * @p_h264_slice_param:		Pointer to a struct v4l2_ctrl_h264_slice_param.
> + * @p_h264_decode_param:	Pointer to a struct v4l2_ctrl_h264_decode_param.
>   * @p:				Pointer to a compound value.
>   */
>  union v4l2_ctrl_ptr {
> @@ -54,6 +59,11 @@ union v4l2_ctrl_ptr {
>  	char *p_char;
>  	struct v4l2_ctrl_mpeg2_slice_params *p_mpeg2_slice_params;
>  	struct v4l2_ctrl_mpeg2_quantization *p_mpeg2_quantization;
> +	struct v4l2_ctrl_h264_sps *p_h264_sps;
> +	struct v4l2_ctrl_h264_pps *p_h264_pps;
> +	struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;

Just write this in full: _scaling_matrix

This abbreviation is ugly :-)

> +	struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
> +	struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
>  	void *p;
>  };
>  
> diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
> index 76f5322ec543..fb1469ec1b90 100644
> --- a/include/uapi/linux/v4l2-controls.h
> +++ b/include/uapi/linux/v4l2-controls.h
> @@ -50,6 +50,8 @@
>  #ifndef __LINUX_V4L2_CONTROLS_H
>  #define __LINUX_V4L2_CONTROLS_H
>  
> +#include <linux/types.h>
> +
>  /* Control classes */
>  #define V4L2_CTRL_CLASS_USER		0x00980000	/* Old-style 'user' controls */
>  #define V4L2_CTRL_CLASS_MPEG		0x00990000	/* MPEG-compression controls */
> @@ -534,6 +536,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type {
>  };
>  #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER	(V4L2_CID_MPEG_BASE+381)
>  #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP	(V4L2_CID_MPEG_BASE+382)
> +#define V4L2_CID_MPEG_VIDEO_H264_SPS		(V4L2_CID_MPEG_BASE+383)
> +#define V4L2_CID_MPEG_VIDEO_H264_PPS		(V4L2_CID_MPEG_BASE+384)
> +#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX	(V4L2_CID_MPEG_BASE+385)
> +#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS	(V4L2_CID_MPEG_BASE+386)
> +#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS	(V4L2_CID_MPEG_BASE+387)
> +
>  #define V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP	(V4L2_CID_MPEG_BASE+400)
>  #define V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP	(V4L2_CID_MPEG_BASE+401)
>  #define V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP	(V4L2_CID_MPEG_BASE+402)
> @@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
>  	__u8	chroma_non_intra_quantiser_matrix[64];
>  };
>  
> +/* Compounds controls */
> +
> +#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG			0x01
> +#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG			0x02
> +#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG			0x04
> +#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG			0x08
> +#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG			0x10
> +#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG			0x20
> +
> +#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE		0x01
> +#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS	0x02
> +#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO		0x04
> +#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED	0x08
> +#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY			0x10
> +#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD		0x20
> +#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE			0x40
> +
> +struct v4l2_ctrl_h264_sps {
> +	__u8 profile_idc;
> +	__u8 constraint_set_flags;
> +	__u8 level_idc;
> +	__u8 seq_parameter_set_id;
> +	__u8 chroma_format_idc;
> +	__u8 bit_depth_luma_minus8;
> +	__u8 bit_depth_chroma_minus8;
> +	__u8 log2_max_frame_num_minus4;
> +	__u8 pic_order_cnt_type;
> +	__u8 log2_max_pic_order_cnt_lsb_minus4;
> +	__u8 max_num_ref_frames;
> +	__u8 num_ref_frames_in_pic_order_cnt_cycle;
> +	__s32 offset_for_ref_frame[255];
> +	__s32 offset_for_non_ref_pic;
> +	__s32 offset_for_top_to_bottom_field;
> +	__u16 pic_width_in_mbs_minus1;
> +	__u16 pic_height_in_map_units_minus1;
> +	__u8 flags;
> +};

A general comment for all these compound control structures:

Make very sure that there are no holes in the struct. This matters both
for security (kernel memory can leak through the holes) and for 32/64-bit
compatibility.

Probably the easiest way to ensure this is to make sure each struct is
4-byte aligned and has no holes.

Don't add padding fields, just increase the size of one or more fields from
u8 to a larger size.
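
As an illustration of the hole problem (not part of the patch), here is a
minimal userspace sketch. The struct and macro names are made up for the
example and assume a typical ABI where a 32-bit integer is 4-byte aligned:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout with implicit padding on common ABIs:
 * 1 byte, 3-byte hole, 4 bytes, 1 byte, 3 bytes of tail padding.
 */
struct example_holes {
	uint8_t id;
	int32_t offset;
	uint8_t flags;
};

/*
 * Same information, with the u8 fields widened to u16 instead of adding
 * explicit padding members. Every byte belongs to a named field.
 */
struct example_fixed {
	int32_t  offset;
	uint16_t id;
	uint16_t flags;
};

/* Sum of the member sizes; equals sizeof() only when there is no padding. */
#define EXAMPLE_HOLES_MEMBERS \
	(sizeof(uint8_t) + sizeof(int32_t) + sizeof(uint8_t))
#define EXAMPLE_FIXED_MEMBERS \
	(sizeof(int32_t) + 2 * sizeof(uint16_t))

/* Compile-time checks: no padding, and the size is a multiple of 4. */
_Static_assert(sizeof(struct example_fixed) == EXAMPLE_FIXED_MEMBERS,
	       "example_fixed must not contain holes");
_Static_assert(sizeof(struct example_fixed) % 4 == 0,
	       "example_fixed must be a multiple of 4 bytes");
```

The same kind of compile-time check could be applied to each of the compound
control structures once their fields have been resized.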

> +
> +#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE				0x0001
> +#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT	0x0002
> +#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED				0x0004
> +#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT		0x0008
> +#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED			0x0010
> +#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT			0x0020
> +#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE				0x0040
> +#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT			0x0080
> +
> +struct v4l2_ctrl_h264_pps {
> +	__u8 pic_parameter_set_id;
> +	__u8 seq_parameter_set_id;
> +	__u8 num_slice_groups_minus1;
> +	__u8 num_ref_idx_l0_default_active_minus1;
> +	__u8 num_ref_idx_l1_default_active_minus1;
> +	__u8 weighted_bipred_idc;
> +	__s8 pic_init_qp_minus26;
> +	__s8 pic_init_qs_minus26;
> +	__s8 chroma_qp_index_offset;
> +	__s8 second_chroma_qp_index_offset;
> +	__u8 flags;
> +};
> +
> +struct v4l2_ctrl_h264_scaling_matrix {
> +	__u8 scaling_list_4x4[6][16];
> +	__u8 scaling_list_8x8[6][64];
> +};
> +
> +struct v4l2_h264_weight_factors {
> +	__s8 luma_weight[32];
> +	__s8 luma_offset[32];
> +	__s8 chroma_weight[32][2];
> +	__s8 chroma_offset[32][2];
> +};
> +
> +struct v4l2_h264_pred_weight_table {
> +	__u8 luma_log2_weight_denom;
> +	__u8 chroma_log2_weight_denom;
> +	struct v4l2_h264_weight_factors weight_factors[2];
> +};
> +
> +#define V4L2_H264_SLICE_TYPE_P				0
> +#define V4L2_H264_SLICE_TYPE_B				1
> +#define V4L2_H264_SLICE_TYPE_I				2
> +#define V4L2_H264_SLICE_TYPE_SP				3
> +#define V4L2_H264_SLICE_TYPE_SI				4
> +
> +#define V4L2_H264_SLICE_FLAG_FIELD_PIC			0x01
> +#define V4L2_H264_SLICE_FLAG_BOTTOM_FIELD		0x02
> +#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED	0x04
> +#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH		0x08
> +
> +struct v4l2_ctrl_h264_slice_param {
> +	/* Size in bytes, including header */
> +	__u32 size;
> +	/* Offset in bits to slice_data() from the beginning of this slice. */
> +	__u32 header_bit_size;
> +
> +	__u16 first_mb_in_slice;
> +	__u8 slice_type;
> +	__u8 pic_parameter_set_id;
> +	__u8 colour_plane_id;
> +	__u16 frame_num;
> +	__u16 idr_pic_id;
> +	__u16 pic_order_cnt_lsb;
> +	__s32 delta_pic_order_cnt_bottom;
> +	__s32 delta_pic_order_cnt0;
> +	__s32 delta_pic_order_cnt1;
> +	__u8 redundant_pic_cnt;
> +
> +	struct v4l2_h264_pred_weight_table pred_weight_table;
> +	/* Size in bits of dec_ref_pic_marking() syntax element. */
> +	__u32 dec_ref_pic_marking_bit_size;
> +	/* Size in bits of pic order count syntax. */
> +	__u32 pic_order_cnt_bit_size;
> +
> +	__u8 cabac_init_idc;
> +	__s8 slice_qp_delta;
> +	__s8 slice_qs_delta;
> +	__u8 disable_deblocking_filter_idc;
> +	__s8 slice_alpha_c0_offset_div2;
> +	__s8 slice_beta_offset_div2;
> +	__u32 slice_group_change_cycle;
> +
> +	__u8 num_ref_idx_l0_active_minus1;
> +	__u8 num_ref_idx_l1_active_minus1;
> +	/*  Entries on each list are indices
> +	 *  into v4l2_ctrl_h264_decode_param.dpb[]. */
> +	__u8 ref_pic_list0[32];
> +	__u8 ref_pic_list1[32];
> +
> +	__u8 flags;
> +};
> +
> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> +
> +struct v4l2_h264_dpb_entry {
> +	__u32 tag;
> +	__u16 frame_num;
> +	__u16 pic_num;
> +	/* Note that field is indicated by v4l2_buffer.field */
> +	__s32 top_field_order_cnt;
> +	__s32 bottom_field_order_cnt;
> +	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
> +};
> +
> +struct v4l2_ctrl_h264_decode_param {
> +	__u32 num_slices;
> +	__u8 idr_pic_flag;
> +	__u8 nal_ref_idc;
> +	__s32 top_field_order_cnt;
> +	__s32 bottom_field_order_cnt;
> +	__u8 ref_pic_list_p0[32];
> +	__u8 ref_pic_list_b0[32];
> +	__u8 ref_pic_list_b1[32];
> +	struct v4l2_h264_dpb_entry dpb[16];
> +};
> +
>  #endif
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 173a94d2cbef..dd028e0bf306 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -643,6 +643,7 @@ struct v4l2_pix_format {
>  #define V4L2_PIX_FMT_H264     v4l2_fourcc('H', '2', '6', '4') /* H264 with start codes */
>  #define V4L2_PIX_FMT_H264_NO_SC v4l2_fourcc('A', 'V', 'C', '1') /* H264 without start codes */
>  #define V4L2_PIX_FMT_H264_MVC v4l2_fourcc('M', '2', '6', '4') /* H264 MVC */
> +#define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
>  #define V4L2_PIX_FMT_H263     v4l2_fourcc('H', '2', '6', '3') /* H263          */
>  #define V4L2_PIX_FMT_MPEG1    v4l2_fourcc('M', 'P', 'G', '1') /* MPEG-1 ES     */
>  #define V4L2_PIX_FMT_MPEG2    v4l2_fourcc('M', 'P', 'G', '2') /* MPEG-2 ES     */
> @@ -1631,6 +1632,11 @@ struct v4l2_ext_control {
>  		__u32 __user *p_u32;
>  		struct v4l2_ctrl_mpeg2_slice_params __user *p_mpeg2_slice_params;
>  		struct v4l2_ctrl_mpeg2_quantization __user *p_mpeg2_quantization;
> +		struct v4l2_ctrl_h264_sps __user *p_h264_sps;
> +		struct v4l2_ctrl_h264_pps __user *p_h264_pps;
> +		struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
> +		struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
> +		struct v4l2_ctrl_h264_decode_param __user *p_h264_decode_param;
>  		void __user *ptr;
>  	};
>  } __attribute__ ((packed));
> @@ -1678,6 +1684,11 @@ enum v4l2_ctrl_type {
>  	V4L2_CTRL_TYPE_U32	     = 0x0102,
>  	V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS = 0x0103,
>  	V4L2_CTRL_TYPE_MPEG2_QUANTIZATION = 0x0104,
> +	V4L2_CTRL_TYPE_H264_SPS      = 0x0105,
> +	V4L2_CTRL_TYPE_H264_PPS      = 0x0106,
> +	V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
> +	V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
> +	V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
>  };
>  
>  /*  Used in the VIDIOC_QUERYCTRL ioctl for querying controls */
> 

Regards,

	Hans


* Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
  2018-11-15 14:56 ` [PATCH v2 2/2] media: cedrus: Add H264 decoding support Maxime Ripard
  2018-11-24 20:43   ` [linux-sunxi] " Jernej Škrabec
  2018-11-30 12:37   ` Paul Kocialkowski
@ 2018-12-05 22:27   ` Jernej Škrabec
  2 siblings, 0 replies; 27+ messages in thread
From: Jernej Škrabec @ 2018-12-05 22:27 UTC (permalink / raw)
  To: linux-sunxi, maxime.ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart, tfiga,
	posciak, Paul Kocialkowski, Chen-Yu Tsai, linux-kernel,
	linux-arm-kernel, linux-media, nicolas.dufresne, jenskuske,
	Thomas Petazzoni, jonas

Hi!

Jonas Karlman (in CC) and I managed to solve the playback issues with
interlaced H264 videos.

Please check the comments below.

You can also build and test LibreELEC for H3 from 
https://github.com/jernejsk/LibreELEC.tv/tree/hw_dec_ffmpeg

It has all the changes suggested below, except that buffer sizes are
calculated for the worst case instead of using the formula from CedarX. It
also uses Jonas' WIP FFmpeg patches for the Request API. The
libva-v4l2-request library is not used.

On Thursday, 15 November 2018 at 15:56:50 CET, Maxime Ripard wrote:
> Introduce some basic H264 decoding support in cedrus. So far, only the
> baseline profile videos have been tested, and some more advanced features
> used in higher profiles are not even implemented.
> 
> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> ---
>  drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
>  drivers/staging/media/sunxi/cedrus/cedrus.c   |  25 +
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  35 +-
>  .../staging/media/sunxi/cedrus/cedrus_dec.c   |  11 +
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 470 ++++++++++++++++++
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +
>  .../staging/media/sunxi/cedrus/cedrus_regs.h  |  63 +++
>  .../staging/media/sunxi/cedrus/cedrus_video.c |   9 +
>  8 files changed, 618 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/Makefile b/drivers/staging/media/sunxi/cedrus/Makefile
> index e9dc68b7bcb6..aaf141fc58b6 100644
> --- a/drivers/staging/media/sunxi/cedrus/Makefile
> +++ b/drivers/staging/media/sunxi/cedrus/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o
> 
> -sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o cedrus_mpeg2.o
> +sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o \
> +		 cedrus_mpeg2.o cedrus_h264.o
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c b/drivers/staging/media/sunxi/cedrus/cedrus.c
> index 82558455384a..627a8c07eb21 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
> @@ -40,6 +40,30 @@ static const struct cedrus_control cedrus_controls[] = {
>  		.codec		= CEDRUS_CODEC_MPEG2,
>  		.required	= false,
>  	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_decode_param),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_slice_param),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_SPS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_sps),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
> +	{
> +		.id		= V4L2_CID_MPEG_VIDEO_H264_PPS,
> +		.elem_size	= sizeof(struct v4l2_ctrl_h264_pps),
> +		.codec		= CEDRUS_CODEC_H264,
> +		.required	= true,
> +	},
>  };
> 
>  #define CEDRUS_CONTROLS_COUNT	ARRAY_SIZE(cedrus_controls)
> @@ -277,6 +301,7 @@ static int cedrus_probe(struct platform_device *pdev)
>  	}
> 
>  	dev->dec_ops[CEDRUS_CODEC_MPEG2] = &cedrus_dec_ops_mpeg2;
> +	dev->dec_ops[CEDRUS_CODEC_H264] = &cedrus_dec_ops_h264;
> 
>  	mutex_init(&dev->dev_mutex);
>  	spin_lock_init(&dev->irq_lock);
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 781676b55a1b..179c10dcf6a7 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -30,7 +30,7 @@
> 
>  enum cedrus_codec {
>  	CEDRUS_CODEC_MPEG2,
> -
> +	CEDRUS_CODEC_H264,
>  	CEDRUS_CODEC_LAST,
>  };
> 
> @@ -40,6 +40,12 @@ enum cedrus_irq_status {
>  	CEDRUS_IRQ_OK,
>  };
> 
> +enum cedrus_h264_pic_type {
> +	CEDRUS_H264_PIC_TYPE_FRAME	= 0,
> +	CEDRUS_H264_PIC_TYPE_FIELD,
> +	CEDRUS_H264_PIC_TYPE_MBAFF,
> +};
> +
>  struct cedrus_control {
>  	u32			id;
>  	u32			elem_size;
> @@ -47,6 +53,13 @@ struct cedrus_control {
>  	unsigned char		required:1;
>  };
> 
> +struct cedrus_h264_run {
> +	const struct v4l2_ctrl_h264_decode_param	*decode_param;
> +	const struct v4l2_ctrl_h264_pps			*pps;
> +	const struct v4l2_ctrl_h264_slice_param		*slice_param;
> +	const struct v4l2_ctrl_h264_sps			*sps;
> +};
> +
>  struct cedrus_mpeg2_run {
>  	const struct v4l2_ctrl_mpeg2_slice_params	*slice_params;
>  	const struct v4l2_ctrl_mpeg2_quantization	*quantization;
> @@ -57,12 +70,20 @@ struct cedrus_run {
>  	struct vb2_v4l2_buffer	*dst;
> 
>  	union {
> +		struct cedrus_h264_run	h264;
>  		struct cedrus_mpeg2_run	mpeg2;
>  	};
>  };
> 
>  struct cedrus_buffer {
>  	struct v4l2_m2m_buffer          m2m_buf;
> +
> +	union {
> +		struct {
> +			unsigned int			position;
> +			enum cedrus_h264_pic_type	pic_type;
> +		} h264;
> +	} codec;
>  };
> 
>  struct cedrus_ctx {
> @@ -77,6 +98,17 @@ struct cedrus_ctx {
>  	struct v4l2_ctrl		**ctrls;
> 
>  	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
> +
> +	union {
> +		struct {
> +			void		*mv_col_buf;
> +			dma_addr_t	mv_col_buf_dma;
> +			ssize_t		mv_col_buf_field_size;
> +			ssize_t		mv_col_buf_size;
> +			void		*pic_info_buf;
> +			dma_addr_t	pic_info_buf_dma;
> +		} h264;
> +	} codec;
>  };
> 
>  struct cedrus_dec_ops {
> @@ -120,6 +152,7 @@ struct cedrus_dev {
>  };
> 
>  extern struct cedrus_dec_ops cedrus_dec_ops_mpeg2;
> +extern struct cedrus_dec_ops cedrus_dec_ops_h264;
> 
>  static inline void cedrus_write(struct cedrus_dev *dev, u32 reg, u32 val)
>  {
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> index 0cfd6036d0cd..b606f07d94ab 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> @@ -49,6 +49,17 @@ void cedrus_device_run(void *priv)
>  			V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION);
>  		break;
> 
> +	case V4L2_PIX_FMT_H264_SLICE:
> +		run.h264.decode_param = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS);
> +		run.h264.pps = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_PPS);
> +		run.h264.slice_param = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS);
> +		run.h264.sps = cedrus_find_control_data(ctx,
> +			V4L2_CID_MPEG_VIDEO_H264_SPS);
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> new file mode 100644
> index 000000000000..5459a936b4b9
> --- /dev/null
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -0,0 +1,470 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (c) 2013 Jens Kuske <jenskuske@gmail.com>
> + * Copyright (c) 2018 Bootlin
> + */
> +
> +#include <linux/types.h>
> +
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "cedrus.h"
> +#include "cedrus_hw.h"
> +#include "cedrus_regs.h"
> +
> +enum cedrus_h264_sram_off {
> +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
> +};
> +
> +struct cedrus_h264_sram_ref_pic {
> +	__le32	top_field_order_cnt;
> +	__le32	bottom_field_order_cnt;
> +	__le32	frame_info;
> +	__le32	luma_ptr;
> +	__le32	chroma_ptr;
> +	__le32	mv_col_top_ptr;
> +	__le32	mv_col_bot_ptr;
> +	__le32	reserved;
> +} __packed;
> +
> +/* One for the output, 16 for the reference images */
> +#define CEDRUS_H264_FRAME_NUM		17

The HW actually supports 18 frames. It would be nice to at least zero out
the last position.
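
The SRAM layout in the patch appears consistent with that number. The
following is only a back-of-the-envelope sketch: it assumes the offsets in
enum cedrus_h264_sram_off are expressed in 32-bit words (as the "off << 2"
in cedrus_h264_write_sram() suggests) and that each entry is 8 words, like
struct cedrus_h264_sram_ref_pic. The names are simplified stand-ins, not the
driver's own:

```c
#include <stddef.h>
#include <stdint.h>

/* Word offsets copied from enum cedrus_h264_sram_off in the patch. */
#define SRAM_FRAMEBUFFER_LIST	0x100
#define SRAM_REF_LIST_0		0x190

/* Stand-in for struct cedrus_h264_sram_ref_pic: eight 32-bit words. */
struct sram_ref_pic {
	uint32_t words[8];
};

/* Bytes available between the frame list and the next SRAM region. */
#define FRAMEBUFFER_LIST_BYTES \
	((SRAM_REF_LIST_0 - SRAM_FRAMEBUFFER_LIST) * 4)

/* Number of whole entries that fit in that region. */
#define FRAMEBUFFER_LIST_SLOTS \
	(FRAMEBUFFER_LIST_BYTES / sizeof(struct sram_ref_pic))
```

Under those assumptions the region holds 576 bytes, i.e. 18 entries of 32
bytes, one more than CEDRUS_H264_FRAME_NUM covers.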

> +
> +#define CEDRUS_PIC_INFO_BUF_SIZE	(128 * SZ_1K)

I suggest determining the above value according to the formula found in the
CedarX source.

> +
> +static void cedrus_h264_write_sram(struct cedrus_dev *dev,
> +				   enum cedrus_h264_sram_off off,
> +				   const void *data, size_t len)
> +{
> +	const u32 *buffer = data;
> +	size_t count = DIV_ROUND_UP(len, 4);
> +
> +	cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET, off << 2);
> +
> +	do {
> +		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> +	} while (--count);
> +}
> +
> +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> +					      unsigned int position,
> +					      unsigned int field)
> +{
> +	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma - PHYS_OFFSET;
> +
> +	/* Adjust for the position */
> +	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> +
> +	/* Adjust for the field */
> +	addr += field * ctx->codec.h264.mv_col_buf_field_size;
> +
> +	return addr;
> +}
> +
> +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> +				struct cedrus_buffer *buf,
> +				unsigned int top_field_order_cnt,
> +				unsigned int bottom_field_order_cnt,
> +				struct cedrus_h264_sram_ref_pic *pic)
> +{
> +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> +	unsigned int position = buf->codec.h264.position;
> +
> +	pic->top_field_order_cnt = top_field_order_cnt;
> +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> +	pic->frame_info = buf->codec.h264.pic_type << 8;
> +
> +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
> +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) - PHYS_OFFSET;
> +	pic->mv_col_top_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 0);
> +	pic->mv_col_bot_ptr = cedrus_h264_mv_col_buf_addr(ctx, position, 1);
> +}
> +
> +static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> +				    struct cedrus_run *run)
> +{
> +	struct cedrus_h264_sram_ref_pic pic_list[CEDRUS_H264_FRAME_NUM];
> +	const struct v4l2_ctrl_h264_decode_param *dec_param = run->h264.decode_param;
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> +	struct cedrus_buffer *output_buf;
> +	struct cedrus_dev *dev = ctx->dev;
> +	unsigned long used_dpbs = 0;
> +	unsigned int position;
> +	unsigned int output = 0;
> +	unsigned int i;
> +
> +	memset(pic_list, 0, sizeof(pic_list));
> +
> +	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
> +		const struct v4l2_h264_dpb_entry *dpb = &dec_param->dpb[i];
> +		struct cedrus_buffer *cedrus_buf;
> +		int buf_idx;
> +
> +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
> +			continue;
> +
> +		buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);

Field pictures may reference the current capture buffer. However,
vb2_find_tag() won't check the tag of the queued capture buffer, so the
frame will be skipped and not written to the frame list.

This can be solved with something like:

struct vb2_v4l2_buffer *v4l2_buf = to_vb2_v4l2_buffer(&run->dst->vb2_buf);
...
if (v4l2_buf->tag == dpb->tag)
	buf_idx = v4l2_buf->vb2_buf.index;
else
	buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);


> +		if (buf_idx < 0)
> +			continue;
> +
> +		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
> +		position = cedrus_buf->codec.h264.position;
> +		used_dpbs |= BIT(position);
> +
> +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +			continue;
> +
> +		cedrus_fill_ref_pic(ctx, cedrus_buf,
> +				    dpb->top_field_order_cnt,
> +				    dpb->bottom_field_order_cnt,
> +				    &pic_list[position]);
> +
> +		output = max(position, output);
> +	}
> +
> +	position = find_next_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM,
> +				      output);
> +	if (position >= CEDRUS_H264_FRAME_NUM)
> +		position = find_first_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM);

If the capture buffer is part of the DPB, its position is already known.
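
A rough userspace model of that selection logic, purely as a sketch: the
helper names and the in_dpb/dpb_position parameters are hypothetical, and
first_zero_bit_from() only imitates what find_next_zero_bit() does for a
single word:

```c
#include <stdbool.h>

#define FRAME_NUM	17	/* mirrors CEDRUS_H264_FRAME_NUM */

/* Return the first clear bit in [start, nbits), or nbits if none is clear. */
static unsigned int first_zero_bit_from(unsigned long used,
					unsigned int nbits,
					unsigned int start)
{
	unsigned int i;

	for (i = start; i < nbits; i++)
		if (!(used & (1UL << i)))
			return i;

	return nbits;
}

/*
 * Pick the frame list position for the capture buffer. If the buffer was
 * already found in the DPB (second field of a field pair), reuse the
 * position it was given earlier instead of searching for a free one.
 */
static unsigned int pick_output_position(unsigned long used_dpbs,
					 unsigned int highest_ref,
					 bool in_dpb,
					 unsigned int dpb_position)
{
	unsigned int position;

	if (in_dpb)
		return dpb_position;

	position = first_zero_bit_from(used_dpbs, FRAME_NUM, highest_ref);
	if (position >= FRAME_NUM)
		position = first_zero_bit_from(used_dpbs, FRAME_NUM, 0);

	return position;
}
```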

> +
> +	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> +	output_buf->codec.h264.position = position;
> +
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
> +	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
> +	else
> +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FRAME;
> +
> +	cedrus_fill_ref_pic(ctx, output_buf,
> +			    dec_param->top_field_order_cnt,
> +			    dec_param->bottom_field_order_cnt,
> +			    &pic_list[position]);
> +
> +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_FRAMEBUFFER_LIST,
> +			       pic_list, sizeof(pic_list));
> +
> +	cedrus_write(dev, VE_H264_OUTPUT_FRAME_IDX, position);
> +}
> +
> +#define CEDRUS_MAX_REF_IDX	32
> +
> +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run,
> +				   const u8 *ref_list, u8 num_ref,
> +				   enum cedrus_h264_sram_off sram)
> +{
> +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> +	unsigned int size, i;
> +
> +	memset(sram_array, 0, sizeof(sram_array));
> +
> +	for (i = 0; i < num_ref; i += 4) {
> +		unsigned int j;
> +
> +		for (j = 0; j < 4; j++) {
> +			const struct v4l2_h264_dpb_entry *dpb;
> +			const struct cedrus_buffer *cedrus_buf;
> +			const struct vb2_v4l2_buffer *ref_buf;
> +			unsigned int position;
> +			int buf_idx;
> +			u8 ref_idx = i + j;
> +			u8 dpb_idx;
> +
> +			if (ref_idx >= num_ref)
> +				break;
> +
> +			dpb_idx = ref_list[ref_idx];
> +			dpb = &decode->dpb[dpb_idx];
> +
> +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +				continue;
> +
> +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);

Same story as above. The capture buffer tag needs to be checked too.

> +			if (buf_idx < 0)
> +				continue;
> +
> +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> +			position = cedrus_buf->codec.h264.position;
> +
> +			sram_array[i] |= position << (j * 8 + 1);
> +			if (ref_buf->field == V4L2_FIELD_BOTTOM)

The above check won't work. Here the driver should check whether this is a
"bottom reference", which is different from the picture field type. As a
hack for our PoC code, we encoded the "bottom reference" and "top reference"
information in bit 7 and bit 6 of each ref_list[] element, because only 4
bits are actually used.
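
For reference, a minimal sketch of that PoC encoding. The macro and function
names are made up for the example, and the bit assignment (bit 7 for bottom,
bit 6 for top) is assumed from the order given above. sram_ref_byte() mirrors
how the patch builds each SRAM byte (position shifted left by one, bottom
flag in bit 0):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the PoC hack; not part of the proposed UAPI. */
#define REF_LIST_BOTTOM_REF	(1u << 7)
#define REF_LIST_TOP_REF	(1u << 6)
#define REF_LIST_DPB_IDX_MASK	0x0fu	/* 16 DPB entries need only 4 bits */

struct ref_entry {
	uint8_t dpb_idx;
	bool top_ref;
	bool bottom_ref;
};

/* Pack a DPB index and the two reference flags into one ref_list[] byte. */
static uint8_t ref_list_encode(uint8_t dpb_idx, bool top_ref, bool bottom_ref)
{
	uint8_t raw = dpb_idx & REF_LIST_DPB_IDX_MASK;

	if (top_ref)
		raw |= REF_LIST_TOP_REF;
	if (bottom_ref)
		raw |= REF_LIST_BOTTOM_REF;

	return raw;
}

/* Split a ref_list[] byte back into its index and flags. */
static struct ref_entry ref_list_decode(uint8_t raw)
{
	struct ref_entry entry = {
		.dpb_idx = raw & REF_LIST_DPB_IDX_MASK,
		.top_ref = raw & REF_LIST_TOP_REF,
		.bottom_ref = raw & REF_LIST_BOTTOM_REF,
	};

	return entry;
}

/* Byte written to the reference list SRAM for one entry. */
static uint8_t sram_ref_byte(unsigned int position, bool bottom_ref)
{
	return (uint8_t)((position << 1) | (bottom_ref ? 1u : 0u));
}
```

A proper solution would presumably carry this information in dedicated UAPI
fields rather than in the spare bits of the index.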

> +				sram_array[i] |= BIT(j * 8);
> +		}
> +	}
> +
> +	size = min((unsigned int)ALIGN(num_ref, 4), sizeof(sram_array));
> +	cedrus_h264_write_sram(dev, sram, &sram_array, size);
> +}
> +
> +static void cedrus_write_ref_list0(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +
> +	_cedrus_write_ref_list(ctx, run,
> +			       slice->ref_pic_list0,
> +			       slice->num_ref_idx_l0_active_minus1 + 1,
> +			       CEDRUS_SRAM_H264_REF_LIST_0);
> +}
> +
> +static void cedrus_write_ref_list1(struct cedrus_ctx *ctx,
> +				   struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +
> +	_cedrus_write_ref_list(ctx, run,
> +			       slice->ref_pic_list1,
> +			       slice->num_ref_idx_l1_active_minus1 + 1,
> +			       CEDRUS_SRAM_H264_REF_LIST_1);
> +}
> +
> +static void cedrus_set_params(struct cedrus_ctx *ctx,
> +			      struct cedrus_run *run)
> +{
> +	const struct v4l2_ctrl_h264_slice_param *slice = run->h264.slice_param;
> +	const struct v4l2_ctrl_h264_pps *pps = run->h264.pps;
> +	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +	struct cedrus_dev *dev = ctx->dev;
> +	dma_addr_t src_buf_addr;
> +	u32 offset = slice->header_bit_size;
> +	u32 len = (slice->size * 8) - offset;
> +	u32 reg;
> +
> +	cedrus_write(dev, 0x220, 0x02000400);

My tests worked well without the above line. Do you know if it is really needed?

> +	cedrus_write(dev, VE_H264_VLD_LEN, len);
> +	cedrus_write(dev, VE_H264_VLD_OFFSET, offset);
> +
> +	src_buf_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf, 0);
> +	src_buf_addr -= PHYS_OFFSET;
> +	cedrus_write(dev, VE_H264_VLD_END, src_buf_addr + VBV_SIZE - 1);

VBV_SIZE should be replaced with the true size, aligned to 1024.

This might not actually be relevant for the correctness of decoding.
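
A minimal sketch of that computation, assuming "true size" means the number
of bitstream bytes in the source buffer. The helper names are hypothetical;
in the driver the kernel's ALIGN() macro would do the rounding:

```c
#include <stdint.h>

#define VLD_END_ALIGNMENT	1024u

/* Round value up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
	return (value + alignment - 1) & ~(alignment - 1);
}

/*
 * Address of the last byte of the bitstream window, using the aligned
 * payload size in place of the fixed VBV_SIZE.
 */
static uint32_t vld_end_addr(uint32_t src_buf_addr, uint32_t payload_size)
{
	return src_buf_addr + align_up(payload_size, VLD_END_ALIGNMENT) - 1;
}
```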

> +	cedrus_write(dev, VE_H264_VLD_ADDR,
> +		     VE_H264_VLD_ADDR_VAL(src_buf_addr) |
> +		     VE_H264_VLD_ADDR_FIRST | VE_H264_VLD_ADDR_VALID |
> +		     VE_H264_VLD_ADDR_LAST);
> +
> +	/*
> +	 * FIXME: Since the bitstream parsing is done in software, and
> +	 * in userspace, this shouldn't be needed anymore. But it
> +	 * turns out that removing it breaks the decoding process,
> +	 * without any clear indication why.
> +	 */
> +	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
> +		     VE_H264_TRIGGER_TYPE_INIT_SWDEC);
> +
> +	if ((slice->slice_type == V4L2_H264_SLICE_TYPE_P) ||
> +	    (slice->slice_type == V4L2_H264_SLICE_TYPE_SP) ||
> +	    (slice->slice_type == V4L2_H264_SLICE_TYPE_B))
> +		cedrus_write_ref_list0(ctx, run);
> +
> +	if (slice->slice_type == V4L2_H264_SLICE_TYPE_B)
> +		cedrus_write_ref_list1(ctx, run);
> +
> +	// picture parameters
> +	reg = 0;
> +	/*
> +	 * FIXME: the kernel headers are allowing the default value to
> +	 * be passed, but the libva doesn't give us that.
> +	 */
> +	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 10;
> +	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 5;
> +	reg |= (pps->weighted_bipred_idc & 0x3) << 2;
> +	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
> +		reg |= BIT(15);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
> +		reg |= BIT(4);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED)
> +		reg |= BIT(1);
> +	if (pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE)
> +		reg |= BIT(0);
> +	cedrus_write(dev, VE_H264_PIC_HDR, reg);
> +
> +	// sequence parameters
> +	reg = BIT(19);
> +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> +		reg |= BIT(18);
> +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> +		reg |= BIT(17);
> +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> +		reg |= BIT(16);
> +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> +
> +	// slice parameters
> +	reg = 0;
> +	/*
> +	 * FIXME: This bit marks all the frames as references. This
> +	 * should probably be set based on nal_ref_idc, but the libva
> +	 * doesn't pass that information along, so this is not always
> +	 * available. We should find something else, maybe change the
> +	 * kernel UAPI somehow?
> +	 */
> +	reg |= BIT(12);
> +	reg |= (slice->slice_type & 0xf) << 8;
> +	reg |= slice->cabac_init_idc & 0x3;
> +	reg |= BIT(5);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> +		reg |= BIT(4);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> +		reg |= BIT(3);
> +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> +		reg |= BIT(2);
> +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> +
> +	reg = 0;

I suggest always setting BIT(12) here (num_ref_idx_active_override_flag),
because userspace always provides that information.
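
As a rough sketch of what that would look like, the register value can be
built with BIT(12) set unconditionally. This is plain user-space C for
illustration only: the BIT() macro is a local stand-in for the kernel one,
slice_hdr2_value() is a hypothetical helper, and the field positions are
simply copied from the quoted code below.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (1u << (n))

/*
 * Hypothetical helper mirroring the VE_H264_SLICE_HDR2 setup in the patch,
 * with BIT(12) (num_ref_idx_active_override_flag) always set.
 */
static uint32_t slice_hdr2_value(uint8_t num_ref_idx_l0_active_minus1,
				 uint8_t num_ref_idx_l1_active_minus1,
				 uint8_t disable_deblocking_filter_idc,
				 int8_t slice_alpha_c0_offset_div2,
				 int8_t slice_beta_offset_div2)
{
	uint32_t reg = BIT(12);

	reg |= (uint32_t)(num_ref_idx_l0_active_minus1 & 0x1f) << 24;
	reg |= (uint32_t)(num_ref_idx_l1_active_minus1 & 0x1f) << 16;
	reg |= (uint32_t)(disable_deblocking_filter_idc & 0x3) << 8;
	reg |= (uint32_t)(slice_alpha_c0_offset_div2 & 0xf) << 4;
	reg |= (uint32_t)(slice_beta_offset_div2 & 0xf);

	return reg;
}
```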

> +	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 24;
> +	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 16;
> +	reg |= (slice->disable_deblocking_filter_idc & 0x3) << 8;
> +	reg |= (slice->slice_alpha_c0_offset_div2 & 0xf) << 4;
> +	reg |= slice->slice_beta_offset_div2 & 0xf;
> +	cedrus_write(dev, VE_H264_SLICE_HDR2, reg);
> +
> +	reg = 0;
> +	/*
> +	 * FIXME: This bit tells the video engine to use the default
> +	 * quantization matrices. This will obviously need to be
> +	 * changed to support the profiles supporting custom
> +	 * quantization matrices.
> +	 */
> +	reg |= BIT(24);
> +	reg |= (pps->second_chroma_qp_index_offset & 0x3f) << 16;
> +	reg |= (pps->chroma_qp_index_offset & 0x3f) << 8;
> +	reg |= (pps->pic_init_qp_minus26 + 26 + slice->slice_qp_delta) & 0x3f;
> +	cedrus_write(dev, VE_H264_QP_PARAM, reg);
> +
> +	// clear status flags
> +	cedrus_write(dev, VE_H264_STATUS, cedrus_read(dev, VE_H264_STATUS));
> +
> +	// enable int
> +	reg = cedrus_read(dev, VE_H264_CTRL) | 0x7;
> +	cedrus_write(dev, VE_H264_CTRL, reg);
> +}
> +
> +static enum cedrus_irq_status
> +cedrus_h264_irq_status(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 reg = cedrus_read(dev, VE_H264_STATUS) & 0x7;
> +
> +	if (!reg)
> +		return CEDRUS_IRQ_NONE;
> +
> +	if (reg & (BIT(1) | BIT(2)))
> +		return CEDRUS_IRQ_ERROR;
> +
> +	return CEDRUS_IRQ_OK;
> +}
> +
> +static void cedrus_h264_irq_clear(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_write(dev, VE_H264_STATUS, GENMASK(2, 0));
> +}
> +
> +static void cedrus_h264_irq_disable(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	u32 reg = cedrus_read(dev, VE_H264_CTRL) & ~GENMASK(2, 0);
> +
> +	cedrus_write(dev, VE_H264_CTRL, reg);
> +}
> +
> +static void cedrus_h264_setup(struct cedrus_ctx *ctx,
> +			      struct cedrus_run *run)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_engine_enable(dev, CEDRUS_CODEC_H264);
> +
> +	cedrus_write(dev, VE_H264_SDROT_CTRL, 0);
> +	cedrus_write(dev, VE_H264_EXTRA_BUFFER1,
> +		     ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET);
> +	cedrus_write(dev, VE_H264_EXTRA_BUFFER2,
> +		     (ctx->codec.h264.pic_info_buf_dma - PHYS_OFFSET) + 0x48000);

VE_H264_EXTRA_BUFFER2 is actually MB_NEIGHBOR_INFO_ADDR, so I would suggest
reintroducing the "neighbor info buffer" variable you removed between v1 and
v2 and using it here. According to the information I have, the buffer has to
be 16 KiB in size and also aligned to 16 KiB. An easy solution is to allocate
a 32 KiB buffer and write a 16 KiB-aligned address here.
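
The over-allocate-and-align idea can be sketched as follows. This is a
user-space illustration under the sizes stated above, not driver code: the
macro and function names are hypothetical, and alloc_addr stands for whatever
bus address the 32 KiB allocation ends up at.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the comment above; the names are made up for the example. */
#define NEIGHBOR_INFO_SIZE	(16u * 1024u)
#define NEIGHBOR_INFO_ALLOC	(2u * NEIGHBOR_INFO_SIZE)

/* First 16 KiB-aligned address at or after the start of the allocation. */
static uint32_t neighbor_info_addr(uint32_t alloc_addr)
{
	uint32_t aligned = (alloc_addr + NEIGHBOR_INFO_SIZE - 1) &
			   ~(NEIGHBOR_INFO_SIZE - 1);

	/* The padding skipped at the front is always less than 16 KiB. */
	assert(aligned - alloc_addr < NEIGHBOR_INFO_SIZE);
	return aligned;
}

/* Check that the aligned 16 KiB window lies inside the 32 KiB allocation. */
static int neighbor_info_fits(uint32_t alloc_addr)
{
	uint32_t aligned = neighbor_info_addr(alloc_addr);

	return aligned + NEIGHBOR_INFO_SIZE <= alloc_addr + NEIGHBOR_INFO_ALLOC;
}
```

Because at most 16 KiB minus one byte is lost to padding, a 32 KiB allocation
always leaves room for the aligned 16 KiB window.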

> +
> +	cedrus_write_frame_list(ctx, run);
> +
> +	cedrus_set_params(ctx, run);
> +}
> +
> +static int cedrus_h264_start(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +	unsigned int field_size;
> +	unsigned int mv_col_size;
> +	int ret;
> +
> +	ctx->codec.h264.pic_info_buf =
> +		dma_alloc_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +				   &ctx->codec.h264.pic_info_buf_dma,
> +				   GFP_KERNEL);
> +	if (!ctx->codec.h264.pic_info_buf)
> +		return -ENOMEM;
> +
> +	field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> +		DIV_ROUND_UP(ctx->src_fmt.height, 16) * 32;

The worst case is actually 2 times higher according to the CedarX code.

However, a better approach would be to multiply by 16 instead of 32 and
double the result for each of the following conditions:
V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE == 0
V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY == 0

That way only the minimum amount of memory needed is allocated. The CedarX
code also aligns this number to 1024.

Unfortunately, the above information is not available here, so the memory
allocation would have to be done in the setup() function, which is not
ideal...
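
For reference, the sizing rule described above could look roughly like this.
It is a user-space sketch, not the CedarX code: mv_col_field_size() is a
hypothetical helper, the two flag arguments stand in for the SPS flag bits,
and DIV_ROUND_UP/ALIGN_UP are local macros mimicking the kernel ones.

```c
#include <assert.h>
#include <stdint.h>

/* Local equivalents of the kernel helpers, defined here for the example. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((uint32_t)(a) - 1))

/*
 * Hypothetical helper: per-field size of the MV column buffer.
 * Start from 16 bytes per macroblock, double once if direct 8x8 inference
 * is off and once more if the stream is not frame-MBs-only, then align
 * the result to 1024.
 */
static uint32_t mv_col_field_size(uint32_t width, uint32_t height,
				  int direct_8x8_inference,
				  int frame_mbs_only)
{
	uint32_t size;

	assert(width > 0 && height > 0);

	size = DIV_ROUND_UP(width, 16) * DIV_ROUND_UP(height, 16) * 16;
	if (!direct_8x8_inference)
		size *= 2;
	if (!frame_mbs_only)
		size *= 2;

	return ALIGN_UP(size, 1024);
}
```

With both flags cleared this gives 64 bytes per macroblock, i.e. twice what
the patch currently allocates, which matches the worst case mentioned above.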

Best regards,
Jernej

> +	ctx->codec.h264.mv_col_buf_field_size = field_size;
> +
> +	mv_col_size = field_size * 2 * CEDRUS_H264_FRAME_NUM;
> +	ctx->codec.h264.mv_col_buf_size = mv_col_size;
> +	ctx->codec.h264.mv_col_buf = dma_alloc_coherent(dev->dev,
> +							ctx->codec.h264.mv_col_buf_size,
> +							&ctx->codec.h264.mv_col_buf_dma,
> +							GFP_KERNEL);
> +	if (!ctx->codec.h264.mv_col_buf) {
> +		ret = -ENOMEM;
> +		goto err_pic_buf;
> +	}
> +
> +	return 0;
> +
> +err_pic_buf:
> +	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +			  ctx->codec.h264.pic_info_buf,
> +			  ctx->codec.h264.pic_info_buf_dma);
> +	return ret;
> +}
> +
> +static void cedrus_h264_stop(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	dma_free_coherent(dev->dev, ctx->codec.h264.mv_col_buf_size,
> +			  ctx->codec.h264.mv_col_buf,
> +			  ctx->codec.h264.mv_col_buf_dma);
> +	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
> +			  ctx->codec.h264.pic_info_buf,
> +			  ctx->codec.h264.pic_info_buf_dma);
> +}
> +
> +static void cedrus_h264_trigger(struct cedrus_ctx *ctx)
> +{
> +	struct cedrus_dev *dev = ctx->dev;
> +
> +	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
> +		     VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE);
> +}
> +
> +struct cedrus_dec_ops cedrus_dec_ops_h264 = {
> +	.irq_clear	= cedrus_h264_irq_clear,
> +	.irq_disable	= cedrus_h264_irq_disable,
> +	.irq_status	= cedrus_h264_irq_status,
> +	.setup		= cedrus_h264_setup,
> +	.start		= cedrus_h264_start,
> +	.stop		= cedrus_h264_stop,
> +	.trigger	= cedrus_h264_trigger,
> +};
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> index 32adbcbe6175..8e559454ca82 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> @@ -46,6 +46,10 @@ int cedrus_engine_enable(struct cedrus_dev *dev, enum cedrus_codec codec)
>  		reg |= VE_MODE_DEC_MPEG;
>  		break;
> 
> +	case CEDRUS_CODEC_H264:
> +		reg |= VE_MODE_DEC_H264;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> index de2d6b6f64bf..6fe9896a506d 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
> @@ -232,4 +232,67 @@
>  #define VE_DEC_MPEG_ROT_LUMA			(VE_ENGINE_DEC_MPEG + 0xcc)
>  #define VE_DEC_MPEG_ROT_CHROMA			(VE_ENGINE_DEC_MPEG + 0xd0)
> 
> +/*  FIXME: Legacy below. */
> +
> +#define VBV_SIZE                       (1024 * 1024)
> +
> +#define VE_H264_FRAME_SIZE		0x200
> +#define VE_H264_PIC_HDR			0x204
> +#define VE_H264_SLICE_HDR		0x208
> +#define VE_H264_SLICE_HDR2		0x20c
> +#define VE_H264_PRED_WEIGHT		0x210
> +#define VE_H264_QP_PARAM		0x21c
> +#define VE_H264_CTRL			0x220
> +
> +#define VE_H264_TRIGGER_TYPE		0x224
> +#define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE	(8 << 0)
> +#define VE_H264_TRIGGER_TYPE_INIT_SWDEC		(7 << 0)
> +
> +#define VE_H264_STATUS			0x228
> +#define VE_H264_CUR_MB_NUM		0x22c
> +
> +#define VE_H264_VLD_ADDR		0x230
> +#define VE_H264_VLD_ADDR_FIRST			BIT(30)
> +#define VE_H264_VLD_ADDR_LAST			BIT(29)
> +#define VE_H264_VLD_ADDR_VALID			BIT(28)
> +#define VE_H264_VLD_ADDR_VAL(x)			(((x) & 0x0ffffff0) | ((x) >> 28))
> +
> +#define VE_H264_VLD_OFFSET		0x234
> +#define VE_H264_VLD_LEN			0x238
> +#define VE_H264_VLD_END			0x23c
> +#define VE_H264_SDROT_CTRL		0x240
> +#define VE_H264_OUTPUT_FRAME_IDX	0x24c
> +#define VE_H264_EXTRA_BUFFER1		0x250
> +#define VE_H264_EXTRA_BUFFER2		0x254
> +#define VE_H264_BASIC_BITS		0x2dc
> +#define VE_AVC_SRAM_PORT_OFFSET		0x2e0
> +#define VE_AVC_SRAM_PORT_DATA		0x2e4
> +
> +#define VE_ISP_INPUT_SIZE		0xa00
> +#define VE_ISP_INPUT_STRIDE		0xa04
> +#define VE_ISP_CTRL			0xa08
> +#define VE_ISP_INPUT_LUMA		0xa78
> +#define VE_ISP_INPUT_CHROMA		0xa7c
> +
> +#define VE_AVC_PARAM			0xb04
> +#define VE_AVC_QP			0xb08
> +#define VE_AVC_MOTION_EST		0xb10
> +#define VE_AVC_CTRL			0xb14
> +#define VE_AVC_TRIGGER			0xb18
> +#define VE_AVC_STATUS			0xb1c
> +#define VE_AVC_BASIC_BITS		0xb20
> +#define VE_AVC_UNK_BUF			0xb60
> +#define VE_AVC_VLE_ADDR			0xb80
> +#define VE_AVC_VLE_END			0xb84
> +#define VE_AVC_VLE_OFFSET		0xb88
> +#define VE_AVC_VLE_MAX			0xb8c
> +#define VE_AVC_VLE_LENGTH		0xb90
> +#define VE_AVC_REF_LUMA			0xba0
> +#define VE_AVC_REF_CHROMA		0xba4
> +#define VE_AVC_REC_LUMA			0xbb0
> +#define VE_AVC_REC_CHROMA		0xbb4
> +#define VE_AVC_REF_SLUMA		0xbb8
> +#define VE_AVC_REC_SLUMA		0xbbc
> +#define VE_AVC_MB_INFO			0xbc0
> +
>  #endif
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> index 293df48326cc..7be2caacddde 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> @@ -37,6 +37,10 @@ static struct cedrus_format cedrus_formats[] = {
>  		.pixelformat	= V4L2_PIX_FMT_MPEG2_SLICE,
>  		.directions	= CEDRUS_DECODE_SRC,
>  	},
> +	{
> +		.pixelformat	= V4L2_PIX_FMT_H264_SLICE,
> +		.directions	= CEDRUS_DECODE_SRC,
> +	},
>  	{
>  		.pixelformat	= V4L2_PIX_FMT_SUNXI_TILED_NV12,
>  		.directions	= CEDRUS_DECODE_DST,
> @@ -100,6 +104,7 @@ static void cedrus_prepare_format(struct v4l2_pix_format *pix_fmt)
> 
>  	switch (pix_fmt->pixelformat) {
>  	case V4L2_PIX_FMT_MPEG2_SLICE:
> +	case V4L2_PIX_FMT_H264_SLICE:
>  		/* Zero bytes per line for encoded source. */
>  		bytesperline = 0;
> 
> @@ -451,6 +456,10 @@ static int cedrus_start_streaming(struct vb2_queue *vq, unsigned int count)
>  		ctx->current_codec = CEDRUS_CODEC_MPEG2;
>  		break;
> 
> +	case V4L2_PIX_FMT_H264_SLICE:
> +		ctx->current_codec = CEDRUS_CODEC_H264;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}






* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
  2018-11-27 17:23   ` [linux-sunxi] " Jernej Škrabec
  2018-12-05 12:56   ` Hans Verkuil
@ 2019-01-08  9:52   ` Randy 'ayaka' Li
  2019-01-08 17:01     ` ayaka
  2019-01-17 11:01     ` Maxime Ripard
  2019-01-28  5:54   ` Alexandre Courbot
  3 siblings, 2 replies; 27+ messages in thread
From: Randy 'ayaka' Li @ 2019-01-08  9:52 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

On Thu, Nov 15, 2018 at 03:56:49PM +0100, Maxime Ripard wrote:
> From: Pawel Osciak <posciak@chromium.org>
> 
> Stateless video codecs will require both the H264 metadata and slices in
> order to be able to decode frames.
> 
> This introduces the definitions for a new pixel format for H264 slices that
> have been parsed, as well as the structures used to pass the metadata from
> the userspace to the kernel.
> 
> Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
> Signed-off-by: Pawel Osciak <posciak@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> ---
>  Documentation/media/uapi/v4l/biblio.rst       |   9 +
>  .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++++++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
>  .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
>  .../media/videodev2.h.rst.exceptions          |   5 +
>  drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
>  drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
>  include/media/v4l2-ctrls.h                    |  10 +
>  include/uapi/linux/v4l2-controls.h            | 166 ++++++++
>  include/uapi/linux/videodev2.h                |  11 +
>  10 files changed, 658 insertions(+)
> 
> diff --git a/Documentation/media/uapi/v4l/biblio.rst b/Documentation/media/uapi/v4l/biblio.rst
> index 386d6cf83e9c..73aeb7ce47d2 100644
> --- a/Documentation/media/uapi/v4l/biblio.rst
> +++ b/Documentation/media/uapi/v4l/biblio.rst
> @@ -115,6 +115,15 @@ ITU BT.1119
>  
>  :author:    International Telecommunication Union (http://www.itu.ch)
>  
> +.. _h264:
> +
> +ITU H.264
> +=========
> +
> +:title:     ITU-T Recommendation H.264 "Advanced Video Coding for Generic Audiovisual Services"
> +
> +:author:    International Telecommunication Union (http://www.itu.ch)
> +
>  .. _jfif:
>  
>  JFIF
> diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> index 65a1d873196b..87c0d151577f 100644
> --- a/Documentation/media/uapi/v4l/extended-controls.rst
> +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> @@ -1674,6 +1674,370 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
>  	non-intra-coded frames, in zigzag scanning order. Only relevant for
>  	non-4:2:0 YUV formats.
>  
> +.. _v4l2-mpeg-h264:
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SPS (struct)``
> +    Specifies the sequence parameter set (as extracted from the
> +    bitstream) for the associated H264 slice data. This includes the
> +    necessary parameters for configuring a stateless hardware decoding
> +    pipeline for H264.  The bitstream parameters are defined according
> +    to :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_sps
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_sps
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``profile_idc``
> +      -
> +    * - __u8
> +      - ``constraint_set_flags``
> +      - TODO
> +    * - __u8
> +      - ``level_idc``
> +      -
> +    * - __u8
> +      - ``seq_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``chroma_format_idc``
> +      -
> +    * - __u8
> +      - ``bit_depth_luma_minus8``
> +      -
> +    * - __u8
> +      - ``bit_depth_chroma_minus8``
> +      -
> +    * - __u8
> +      - ``log2_max_frame_num_minus4``
> +      -
> +    * - __u8
> +      - ``pic_order_cnt_type``
> +      -
> +    * - __u8
> +      - ``log2_max_pic_order_cnt_lsb_minus4``
> +      -
> +    * - __u8
> +      - ``max_num_ref_frames``
> +      -
> +    * - __u8
> +      - ``num_ref_frames_in_pic_order_cnt_cycle``
> +      -
> +    * - __s32
> +      - ``offset_for_ref_frame[255]``
> +      -
> +    * - __s32
> +      - ``offset_for_non_ref_pic``
> +      -
> +    * - __s32
> +      - ``offset_for_top_to_bottom_field``
> +      -
> +    * - __u16
> +      - ``pic_width_in_mbs_minus1``
> +      -
> +    * - __u16
> +      - ``pic_height_in_map_units_minus1``
> +      -
> +    * - __u8
> +      - ``flags``
> +      - TODO
> +
> +``V4L2_CID_MPEG_VIDEO_H264_PPS (struct)``
> +    Specifies the picture parameter set (as extracted from the
> +    bitstream) for the associated H264 slice data. This includes the
> +    necessary parameters for configuring a stateless hardware decoding
> +    pipeline for H264.  The bitstream parameters are defined according
> +    to :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_pps
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_pps
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``pic_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``seq_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``num_slice_groups_minus1``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l0_default_active_minus1``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l1_default_active_minus1``
> +      -
> +    * - __u8
> +      - ``weighted_bipred_idc``
> +      -
> +    * - __s8
> +      - ``pic_init_qp_minus26``
> +      -
> +    * - __s8
> +      - ``pic_init_qs_minus26``
> +      -
> +    * - __s8
> +      - ``chroma_qp_index_offset``
> +      -
> +    * - __s8
> +      - ``second_chroma_qp_index_offset``
> +      -
> +    * - __u8
> +      - ``flags``
> +      - TODO
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX (struct)``
> +    Specifies the scaling matrix (as extracted from the bitstream) for
> +    the associated H264 slice data. The bitstream parameters are
> +    defined according to :ref:`h264`. Unless there's a specific
> +    comment, refer to the specification for the documentation of these
> +    fields.
> +
> +.. c:type:: v4l2_ctrl_h264_scaling_matrix
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_scaling_matrix
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``scaling_list_4x4[6][16]``
> +      -
> +    * - __u8
> +      - ``scaling_list_8x8[6][64]``
> +      -
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS (struct)``
> +    Specifies the slice parameters (as extracted from the bitstream)
> +    for the associated H264 slice data. This includes the necessary
> +    parameters for configuring a stateless hardware decoding pipeline
> +    for H264.  The bitstream parameters are defined according to
> +    :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_slice_param
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_slice_param
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u32
> +      - ``size``
> +      -
> +    * - __u32
> +      - ``header_bit_size``
> +      -
> +    * - __u16
> +      - ``first_mb_in_slice``
> +      -
> +    * - __u8
> +      - ``slice_type``
> +      -
> +    * - __u8
> +      - ``pic_parameter_set_id``
> +      -
> +    * - __u8
> +      - ``colour_plane_id``
> +      -
> +    * - __u16
> +      - ``frame_num``
> +      -
> +    * - __u16
> +      - ``idr_pic_id``
> +      -
> +    * - __u16
> +      - ``pic_order_cnt_lsb``
> +      -
> +    * - __s32
> +      - ``delta_pic_order_cnt_bottom``
> +      -
> +    * - __s32
> +      - ``delta_pic_order_cnt0``
> +      -
> +    * - __s32
> +      - ``delta_pic_order_cnt1``
> +      -
> +    * - __u8
> +      - ``redundant_pic_cnt``
> +      -
> +    * - struct :c:type:`v4l2_h264_pred_weight_table`
> +      - ``pred_weight_table``
> +      -
> +    * - __u32
> +      - ``dec_ref_pic_marking_bit_size``
> +      -
> +    * - __u32
> +      - ``pic_order_cnt_bit_size``
> +      -
> +    * - __u8
> +      - ``cabac_init_idc``
> +      -
> +    * - __s8
> +      - ``slice_qp_delta``
> +      -
> +    * - __s8
> +      - ``slice_qs_delta``
> +      -
> +    * - __u8
> +      - ``disable_deblocking_filter_idc``
> +      -
> +    * - __s8
> +      - ``slice_alpha_c0_offset_div2``
> +      -
> +    * - __s8
> +      - ``slice_beta_offset_div2``
> +      -
> +    * - __u32
> +      - ``slice_group_change_cycle``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l0_active_minus1``
> +      -
> +    * - __u8
> +      - ``num_ref_idx_l1_active_minus1``
> +      -
> +    * - __u8
> +      - ``ref_pic_list0[32]``
> +      -
> +    * - __u8
> +      - ``ref_pic_list1[32]``
> +      -
> +    * - __u8
> +      - ``flags``
> +      - TODO
> +
> +.. c:type:: v4l2_h264_pred_weight_table
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_h264_pred_weight_table
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u8
> +      - ``luma_log2_weight_denom``
> +      -
> +    * - __u8
> +      - ``chroma_log2_weight_denom``
> +      -
> +    * - struct :c:type:`v4l2_h264_weight_factors`
> +      - ``weight_factors[2]``
> +      -
> +
> +.. c:type:: v4l2_h264_weight_factors
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_h264_weight_factors
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __s8
> +      - ``luma_weight[32]``
> +      -
> +    * - __s8
> +      - ``luma_offset[32]``
> +      -
> +    * - __s8
> +      - ``chroma_weight[32][2]``
> +      -
> +    * - __s8
> +      - ``chroma_offset[32][2]``
> +      -
> +
> +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS (struct)``
> +    Specifies the decode parameters (as extracted from the bitstream)
> +    for the associated H264 slice data. This includes the necessary
> +    parameters for configuring a stateless hardware decoding pipeline
> +    for H264.  The bitstream parameters are defined according to
> +    :ref:`h264`. Unless there's a specific comment, refer to the
> +    specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_h264_decode_param
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_h264_decode_param
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u32
> +      - ``num_slices``
> +      -
> +    * - __u8
> +      - ``idr_pic_flag``
> +      -
> +    * - __u8
> +      - ``nal_ref_idc``
> +      -
> +    * - __s32
> +      - ``top_field_order_cnt``
> +      -
> +    * - __s32
> +      - ``bottom_field_order_cnt``
> +      -
> +    * - __u8
> +      - ``ref_pic_list_p0[32]``
> +      -
> +    * - __u8
> +      - ``ref_pic_list_b0[32]``
> +      -
> +    * - __u8
> +      - ``ref_pic_list_b1[32]``
> +      -
> +    * - struct :c:type:`v4l2_h264_dpb_entry`
> +      - ``dpb[16]``
> +      -
> +
> +.. c:type:: v4l2_h264_dpb_entry
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_h264_dpb_entry
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       1 1 2
> +
> +    * - __u32
> +      - ``tag``
> +      - tag to identify the buffer containing the reference frame
> +    * - __u16
> +      - ``frame_num``
> +      -
> +    * - __u16
> +      - ``pic_num``
> +      -
> +    * - __s32
> +      - ``top_field_order_cnt``
> +      -
> +    * - __s32
> +      - ``bottom_field_order_cnt``
> +      -
> +    * - __u8
> +      - ``flags``
> +      -
> +
>  MFC 5.1 MPEG Controls
>  ---------------------
>  
> diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> index ba0f6c49d9bf..f15fc1c8d479 100644
> --- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> +++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> @@ -45,6 +45,26 @@ Compressed Formats
>        - ``V4L2_PIX_FMT_H264_MVC``
>        - 'M264'
>        - H264 MVC video elementary stream.
> +    * .. _V4L2-PIX-FMT-H264-SLICE:
> +
> +      - ``V4L2_PIX_FMT_H264_SLICE``
> +      - 'S264'
> +      - H264 parsed slice data, as extracted from the H264 bitstream.
> +	This format is adapted for stateless video decoders that
> +	implement an H264 pipeline (using the :ref:`codec` and
> +	:ref:`media-request-api`).  Metadata associated with the frame
> +	to decode are required to be passed through the
> +	``V4L2_CID_MPEG_VIDEO_H264_SPS``,
> +	``V4L2_CID_MPEG_VIDEO_H264_PPS`` and
> +	``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS`` and
> +	``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS`` controls and
> +	scaling matrices can optionally be specified through the
> +	``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX`` control.  See the
> +	:ref:`associated Codec Control IDs <v4l2-mpeg-h264>`.
> +	Exactly one output and one capture buffer must be provided for
> +	use with this pixel format. The output buffer must contain the
> +	appropriate number of macroblocks to decode a full
> +	corresponding frame to the matching capture buffer.
>      * .. _V4L2-PIX-FMT-H263:
>  
>        - ``V4L2_PIX_FMT_H263``
> diff --git a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> index 258f5813f281..38a9c988124c 100644
> --- a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> +++ b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> @@ -436,6 +436,36 @@ See also the examples in :ref:`control`.
>        - n/a
>        - A struct :c:type:`v4l2_ctrl_mpeg2_quantization`, containing MPEG-2
>  	quantization matrices for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_SPS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_sps`, containing H264
> +	sequence parameters for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_PPS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_pps`, containing H264
> +	picture parameters for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_SCALING_MATRIX``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_scaling_matrix`, containing H264
> +	scaling matrices for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_SLICE_PARAMS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_slice_param`, containing H264
> +	slice parameters for stateless video decoders.
> +    * - ``V4L2_CTRL_TYPE_H264_DECODE_PARAMS``
> +      - n/a
> +      - n/a
> +      - n/a
> +      - A struct :c:type:`v4l2_ctrl_h264_decode_param`, containing H264
> +	decode parameters for stateless video decoders.
>  
>  .. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.7cm}|
>  
> diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
> index 1ec425a7c364..99f1bd2bc44c 100644
> --- a/Documentation/media/videodev2.h.rst.exceptions
> +++ b/Documentation/media/videodev2.h.rst.exceptions
> @@ -133,6 +133,11 @@ replace symbol V4L2_CTRL_TYPE_U32 :c:type:`v4l2_ctrl_type`
>  replace symbol V4L2_CTRL_TYPE_U8 :c:type:`v4l2_ctrl_type`
>  replace symbol V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
>  replace symbol V4L2_CTRL_TYPE_MPEG2_QUANTIZATION :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_SPS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_PPS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_SCALING_MATRIX :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_H264_DECODE_PARAMS :c:type:`v4l2_ctrl_type`
>  
>  # V4L2 capability defines
>  replace define V4L2_CAP_VIDEO_CAPTURE device-capabilities
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index b854cceb19dc..e96c453208e8 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -825,6 +825,11 @@ const char *v4l2_ctrl_get_name(u32 id)
>  	case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER:return "H264 Number of HC Layers";
>  	case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP:
>  								return "H264 Set QP Value for HC Layers";
> +	case V4L2_CID_MPEG_VIDEO_H264_SPS:			return "H264 SPS";
> +	case V4L2_CID_MPEG_VIDEO_H264_PPS:			return "H264 PPS";
> +	case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:		return "H264 Scaling Matrix";
> +	case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:		return "H264 Slice Parameters";
> +	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:		return "H264 Decode Parameters";
>  	case V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP:		return "MPEG4 I-Frame QP Value";
>  	case V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP:		return "MPEG4 P-Frame QP Value";
>  	case V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP:		return "MPEG4 B-Frame QP Value";
> @@ -1300,6 +1305,21 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
>  	case V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION:
>  		*type = V4L2_CTRL_TYPE_MPEG2_QUANTIZATION;
>  		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_SPS:
> +		*type = V4L2_CTRL_TYPE_H264_SPS;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_PPS:
> +		*type = V4L2_CTRL_TYPE_H264_PPS;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:
> +		*type = V4L2_CTRL_TYPE_H264_SCALING_MATRIX;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:
> +		*type = V4L2_CTRL_TYPE_H264_SLICE_PARAMS;
> +		break;
> +	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
> +		*type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
> +		break;
>  	default:
>  		*type = V4L2_CTRL_TYPE_INTEGER;
>  		break;
> @@ -1665,6 +1685,13 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
>  	case V4L2_CTRL_TYPE_MPEG2_QUANTIZATION:
>  		return 0;
>  
> +	case V4L2_CTRL_TYPE_H264_SPS:
> +	case V4L2_CTRL_TYPE_H264_PPS:
> +	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
> +	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
> +	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> +		return 0;
> +
>  	default:
>  		return -EINVAL;
>  	}
> @@ -2245,6 +2272,21 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
>  	case V4L2_CTRL_TYPE_MPEG2_QUANTIZATION:
>  		elem_size = sizeof(struct v4l2_ctrl_mpeg2_quantization);
>  		break;
> +	case V4L2_CTRL_TYPE_H264_SPS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_sps);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_PPS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_pps);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_scaling_matrix);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_slice_param);
> +		break;
> +	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> +		elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
> +		break;
>  	default:
>  		if (type < V4L2_CTRL_COMPOUND_TYPES)
>  			elem_size = sizeof(s32);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index 49103787d19a..aa63f1794272 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1309,6 +1309,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>  		case V4L2_PIX_FMT_H264:		descr = "H.264"; break;
>  		case V4L2_PIX_FMT_H264_NO_SC:	descr = "H.264 (No Start Codes)"; break;
>  		case V4L2_PIX_FMT_H264_MVC:	descr = "H.264 MVC"; break;
> +		case V4L2_PIX_FMT_H264_SLICE:	descr = "H.264 Parsed Slice"; break;
>  		case V4L2_PIX_FMT_H263:		descr = "H.263"; break;
>  		case V4L2_PIX_FMT_MPEG1:	descr = "MPEG-1 ES"; break;
>  		case V4L2_PIX_FMT_MPEG2:	descr = "MPEG-2 ES"; break;
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index 83ce0593b275..b4ca95710d2d 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -43,6 +43,11 @@ struct poll_table_struct;
>   * @p_char:			Pointer to a string.
>   * @p_mpeg2_slice_params:	Pointer to a MPEG2 slice parameters structure.
>   * @p_mpeg2_quantization:	Pointer to a MPEG2 quantization data structure.
> + * @p_h264_sps:			Pointer to a struct v4l2_ctrl_h264_sps.
> + * @p_h264_pps:			Pointer to a struct v4l2_ctrl_h264_pps.
> + * @p_h264_scal_mtrx:		Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
> + * @p_h264_slice_param:		Pointer to a struct v4l2_ctrl_h264_slice_param.
> + * @p_h264_decode_param:	Pointer to a struct v4l2_ctrl_h264_decode_param.
>   * @p:				Pointer to a compound value.
>   */
>  union v4l2_ctrl_ptr {
> @@ -54,6 +59,11 @@ union v4l2_ctrl_ptr {
>  	char *p_char;
>  	struct v4l2_ctrl_mpeg2_slice_params *p_mpeg2_slice_params;
>  	struct v4l2_ctrl_mpeg2_quantization *p_mpeg2_quantization;
> +	struct v4l2_ctrl_h264_sps *p_h264_sps;
> +	struct v4l2_ctrl_h264_pps *p_h264_pps;
> +	struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;
> +	struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
> +	struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
>  	void *p;
>  };
>  
> diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
> index 76f5322ec543..fb1469ec1b90 100644
> --- a/include/uapi/linux/v4l2-controls.h
> +++ b/include/uapi/linux/v4l2-controls.h
> @@ -50,6 +50,8 @@
>  #ifndef __LINUX_V4L2_CONTROLS_H
>  #define __LINUX_V4L2_CONTROLS_H
>  
> +#include <linux/types.h>
> +
>  /* Control classes */
>  #define V4L2_CTRL_CLASS_USER		0x00980000	/* Old-style 'user' controls */
>  #define V4L2_CTRL_CLASS_MPEG		0x00990000	/* MPEG-compression controls */
> @@ -534,6 +536,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type {
>  };
>  #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER	(V4L2_CID_MPEG_BASE+381)
>  #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP	(V4L2_CID_MPEG_BASE+382)
> +#define V4L2_CID_MPEG_VIDEO_H264_SPS		(V4L2_CID_MPEG_BASE+383)
> +#define V4L2_CID_MPEG_VIDEO_H264_PPS		(V4L2_CID_MPEG_BASE+384)
> +#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX	(V4L2_CID_MPEG_BASE+385)
> +#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS	(V4L2_CID_MPEG_BASE+386)
> +#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS	(V4L2_CID_MPEG_BASE+387)
> +
>  #define V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP	(V4L2_CID_MPEG_BASE+400)
>  #define V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP	(V4L2_CID_MPEG_BASE+401)
>  #define V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP	(V4L2_CID_MPEG_BASE+402)
> @@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
>  	__u8	chroma_non_intra_quantiser_matrix[64];
>  };
>  
> +/* Compounds controls */
> +
> +#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG			0x01
> +#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG			0x02
> +#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG			0x04
> +#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG			0x08
> +#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG			0x10
> +#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG			0x20
> +
> +#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE		0x01
> +#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS	0x02
> +#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO		0x04
> +#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED	0x08
> +#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY			0x10
> +#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD		0x20
> +#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE			0x40
> +
> +struct v4l2_ctrl_h264_sps {
> +	__u8 profile_idc;
> +	__u8 constraint_set_flags;
> +	__u8 level_idc;
> +	__u8 seq_parameter_set_id;
> +	__u8 chroma_format_idc;
> +	__u8 bit_depth_luma_minus8;
> +	__u8 bit_depth_chroma_minus8;
> +	__u8 log2_max_frame_num_minus4;
> +	__u8 pic_order_cnt_type;
> +	__u8 log2_max_pic_order_cnt_lsb_minus4;
> +	__u8 max_num_ref_frames;
> +	__u8 num_ref_frames_in_pic_order_cnt_cycle;
> +	__s32 offset_for_ref_frame[255];
> +	__s32 offset_for_non_ref_pic;
> +	__s32 offset_for_top_to_bottom_field;
> +	__u16 pic_width_in_mbs_minus1;
> +	__u16 pic_height_in_map_units_minus1;
> +	__u8 flags;
> +};
> +
> +#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE				0x0001
> +#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT	0x0002
> +#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED				0x0004
> +#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT		0x0008
> +#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED			0x0010
> +#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT			0x0020
> +#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE				0x0040
> +#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT			0x0080
> +
> +struct v4l2_ctrl_h264_pps {
> +	__u8 pic_parameter_set_id;
> +	__u8 seq_parameter_set_id;
> +	__u8 num_slice_groups_minus1;
> +	__u8 num_ref_idx_l0_default_active_minus1;
> +	__u8 num_ref_idx_l1_default_active_minus1;
> +	__u8 weighted_bipred_idc;
> +	__s8 pic_init_qp_minus26;
> +	__s8 pic_init_qs_minus26;
> +	__s8 chroma_qp_index_offset;
> +	__s8 second_chroma_qp_index_offset;
> +	__u8 flags;
> +};
> +
> +struct v4l2_ctrl_h264_scaling_matrix {
> +	__u8 scaling_list_4x4[6][16];
> +	__u8 scaling_list_8x8[6][64];
> +};

I wonder which decoder wants this.
> +
> +struct v4l2_h264_weight_factors {
> +	__s8 luma_weight[32];
> +	__s8 luma_offset[32];
> +	__s8 chroma_weight[32][2];
> +	__s8 chroma_offset[32][2];
> +};


> +
> +struct v4l2_h264_pred_weight_table {
> +	__u8 luma_log2_weight_denom;
> +	__u8 chroma_log2_weight_denom;
> +	struct v4l2_h264_weight_factors weight_factors[2];
> +};
> +
> +#define V4L2_H264_SLICE_TYPE_P				0
> +#define V4L2_H264_SLICE_TYPE_B				1
> +#define V4L2_H264_SLICE_TYPE_I				2
> +#define V4L2_H264_SLICE_TYPE_SP				3
> +#define V4L2_H264_SLICE_TYPE_SI				4
> +
> +#define V4L2_H264_SLICE_FLAG_FIELD_PIC			0x01
> +#define V4L2_H264_SLICE_FLAG_BOTTOM_FIELD		0x02
> +#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED	0x04
> +#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH		0x08
> +
> +struct v4l2_ctrl_h264_slice_param {
> +	/* Size in bytes, including header */
> +	__u32 size;
> +	/* Offset in bits to slice_data() from the beginning of this slice. */
> +	__u32 header_bit_size;
> +
> +	__u16 first_mb_in_slice;
> +	__u8 slice_type;
> +	__u8 pic_parameter_set_id;
> +	__u8 colour_plane_id;
> +	__u16 frame_num;
> +	__u16 idr_pic_id;
> +	__u16 pic_order_cnt_lsb;
> +	__s32 delta_pic_order_cnt_bottom;
> +	__s32 delta_pic_order_cnt0;
> +	__s32 delta_pic_order_cnt1;
> +	__u8 redundant_pic_cnt;
> +
> +	struct v4l2_h264_pred_weight_table pred_weight_table;
> +	/* Size in bits of dec_ref_pic_marking() syntax element. */
> +	__u32 dec_ref_pic_marking_bit_size;
> +	/* Size in bits of pic order count syntax. */
> +	__u32 pic_order_cnt_bit_size;
> +
> +	__u8 cabac_init_idc;
> +	__s8 slice_qp_delta;
> +	__s8 slice_qs_delta;
> +	__u8 disable_deblocking_filter_idc;
> +	__s8 slice_alpha_c0_offset_div2;
> +	__s8 slice_beta_offset_div2;
> +	__u32 slice_group_change_cycle;
> +
> +	__u8 num_ref_idx_l0_active_minus1;
> +	__u8 num_ref_idx_l1_active_minus1;
> +	/*  Entries on each list are indices
> +	 *  into v4l2_ctrl_h264_decode_param.dpb[]. */
> +	__u8 ref_pic_list0[32];
> +	__u8 ref_pic_list1[32];
> +
> +	__u8 flags;
> +};
> +
We need some additional properties or the Rockchip decoder won't work:
1. u16 idr_pic_id, which identifies an IDR (instantaneous decoding
refresh) picture
2. u16 ref_pic_mk_len, the length in bits of the decoded reference
picture marking syntax
3. u8 poc_length, the length of the picture order count field in the
stream

The last two are used by the hardware to skip part of the stream.
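
To illustrate the skipping only, here is a minimal sketch, assuming the
decoder is handed the bit length of a syntax element and steps over it
without parsing it. The bit_cursor type and both helpers are hypothetical
names; they are not part of this patch or of any existing driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit cursor over a slice buffer (illustration only). */
struct bit_cursor {
	const uint8_t *buf;
	size_t size_bits;	/* total number of bits in buf */
	size_t pos_bits;	/* current position, always <= size_bits */
};

/*
 * Step over nbits without interpreting them.
 * Returns 0 on success, -1 if fewer than nbits remain (position unchanged).
 */
static int bit_cursor_skip(struct bit_cursor *bc, size_t nbits)
{
	if (nbits > bc->size_bits - bc->pos_bits)
		return -1;

	bc->pos_bits += nbits;
	return 0;
}

/* Read one bit, most significant bit first. Returns -1 at end of buffer. */
static int bit_cursor_read_bit(struct bit_cursor *bc)
{
	int bit;

	if (bc->pos_bits >= bc->size_bits)
		return -1;

	bit = (bc->buf[bc->pos_bits >> 3] >> (7 - (bc->pos_bits & 7))) & 1;
	bc->pos_bits++;
	return bit;
}
```

In the struct quoted above, dec_ref_pic_marking_bit_size and
pic_order_cnt_bit_size would be the kind of values passed as nbits.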

> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> +
> +struct v4l2_h264_dpb_entry {
> +	__u32 tag;
> +	__u16 frame_num;
> +	__u16 pic_num;

Although a long-term reference would use the picture order count and
a short-term one the frame num, only one of them is used for a given
entry of the DPB.

Besides, for a frame picture frame_num = pic_num * 2,
and frame_num = pic_num * 2 + 1 for a field.
> +	/* Note that field is indicated by v4l2_buffer.field */
> +	__s32 top_field_order_cnt;
> +	__s32 bottom_field_order_cnt;
> +	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
> +};
> +
> +struct v4l2_ctrl_h264_decode_param {
> +	__u32 num_slices;
> +	__u8 idr_pic_flag;
> +	__u8 nal_ref_idc;
> +	__s32 top_field_order_cnt;
> +	__s32 bottom_field_order_cnt;
> +	__u8 ref_pic_list_p0[32];
> +	__u8 ref_pic_list_b0[32];
> +	__u8 ref_pic_list_b1[32];
I would prefer to keep only two lists, list0 and list1.
Anyway, a P slice just uses list0 and a B slice would use both.
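
A minimal sketch of that point, assuming the slice type values defined
earlier in this patch (they are redefined locally so the snippet stands
alone). The helper name h264_num_ref_lists is hypothetical.

```c
/* Values copied from the V4L2_H264_SLICE_TYPE_* defines in this patch. */
#define SLICE_TYPE_P	0
#define SLICE_TYPE_B	1
#define SLICE_TYPE_I	2
#define SLICE_TYPE_SP	3
#define SLICE_TYPE_SI	4

/*
 * Number of reference picture lists a slice of the given type uses:
 * P and SP slices predict from list0 only, B slices from list0 and
 * list1, and I and SI slices use no reference list at all.
 * Returns -1 for an unknown slice type.
 */
static int h264_num_ref_lists(unsigned int slice_type)
{
	switch (slice_type) {
	case SLICE_TYPE_P:
	case SLICE_TYPE_SP:
		return 1;
	case SLICE_TYPE_B:
		return 2;
	case SLICE_TYPE_I:
	case SLICE_TYPE_SI:
		return 0;
	default:
		return -1;
	}
}
```
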
> +	struct v4l2_h264_dpb_entry dpb[16];
> +};
> +
>  #endif
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 173a94d2cbef..dd028e0bf306 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -643,6 +643,7 @@ struct v4l2_pix_format {
>  #define V4L2_PIX_FMT_H264     v4l2_fourcc('H', '2', '6', '4') /* H264 with start codes */
>  #define V4L2_PIX_FMT_H264_NO_SC v4l2_fourcc('A', 'V', 'C', '1') /* H264 without start codes */
>  #define V4L2_PIX_FMT_H264_MVC v4l2_fourcc('M', '2', '6', '4') /* H264 MVC */
> +#define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
>  #define V4L2_PIX_FMT_H263     v4l2_fourcc('H', '2', '6', '3') /* H263          */
>  #define V4L2_PIX_FMT_MPEG1    v4l2_fourcc('M', 'P', 'G', '1') /* MPEG-1 ES     */
>  #define V4L2_PIX_FMT_MPEG2    v4l2_fourcc('M', 'P', 'G', '2') /* MPEG-2 ES     */
> @@ -1631,6 +1632,11 @@ struct v4l2_ext_control {
>  		__u32 __user *p_u32;
>  		struct v4l2_ctrl_mpeg2_slice_params __user *p_mpeg2_slice_params;
>  		struct v4l2_ctrl_mpeg2_quantization __user *p_mpeg2_quantization;
> +		struct v4l2_ctrl_h264_sps __user *p_h264_sps;
> +		struct v4l2_ctrl_h264_pps __user *p_h264_pps;
> +		struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
> +		struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
> +		struct v4l2_ctrl_h264_decode_param __user *p_h264_decode_param;
>  		void __user *ptr;
>  	};
>  } __attribute__ ((packed));
> @@ -1678,6 +1684,11 @@ enum v4l2_ctrl_type {
>  	V4L2_CTRL_TYPE_U32	     = 0x0102,
>  	V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS = 0x0103,
>  	V4L2_CTRL_TYPE_MPEG2_QUANTIZATION = 0x0104,
> +	V4L2_CTRL_TYPE_H264_SPS      = 0x0105,
> +	V4L2_CTRL_TYPE_H264_PPS      = 0x0106,
> +	V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
> +	V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
> +	V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
>  };
>  
>  /*  Used in the VIDIOC_QUERYCTRL ioctl for querying controls */


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-08  9:52   ` Randy 'ayaka' Li
@ 2019-01-08 17:01     ` ayaka
  2019-01-10 13:33       ` ayaka
  2019-01-17 11:16       ` Maxime Ripard
  2019-01-17 11:01     ` Maxime Ripard
  1 sibling, 2 replies; 27+ messages in thread
From: ayaka @ 2019-01-08 17:01 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media


On 1/8/19 5:52 PM, Randy 'ayaka' Li wrote:
> On Thu, Nov 15, 2018 at 03:56:49PM +0100, Maxime Ripard wrote:
>> From: Pawel Osciak <posciak@chromium.org>
>>
>> Stateless video codecs will require both the H264 metadata and slices in
>> order to be able to decode frames.
>>
>> This introduces the definitions for a new pixel format for H264 slices that
>> have been parsed, as well as the structures used to pass the metadata from
>> the userspace to the kernel.
>>
>> Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
>> Signed-off-by: Pawel Osciak <posciak@chromium.org>
>> Signed-off-by: Guenter Roeck <groeck@chromium.org>
>> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
>> ---
>>   Documentation/media/uapi/v4l/biblio.rst       |   9 +
>>   .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++++++
>>   .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
>>   .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
>>   .../media/videodev2.h.rst.exceptions          |   5 +
>>   drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
>>   drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
>>   include/media/v4l2-ctrls.h                    |  10 +
>>   include/uapi/linux/v4l2-controls.h            | 166 ++++++++
>>   include/uapi/linux/videodev2.h                |  11 +
>>   10 files changed, 658 insertions(+)
>> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
>> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
>> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
>> +
>> +struct v4l2_h264_dpb_entry {
>> +	__u32 tag;
>> +	__u16 frame_num;
>> +	__u16 pic_num;
> Although the long term reference would use picture order count
> and short term for frame num, but only one of them is used
> for a entry of a dpb.
>
> Besides, for a frame picture frame_num = pic_num * 2,
> and frame_num = pic_num * 2 + 1 for a filed.

I got something wrong before, and something Herman told me was wrong
too; I have since read the book explaining the ITU standard.

The index of a short-term reference picture would be frame_num or POC,
and LongTermPicNum for a long-term one.

But a stateless hardware decoder usually doesn't care whether a
reference is long term or short term, as the real DPB updating and
management work is not done by the driver or the device, and a decoding
job would only use the two lists (or one list for a P slice) of
reference pictures. So those flags for long term or status can be
removed as well.

A stateless decoder would care only about the reference index of this
picture, and maybe some extra properties for the field coding below.
Keeping one property here for the index of a picture is enough.

>> +	/* Note that field is indicated by v4l2_buffer.field */
>> +	__s32 top_field_order_cnt;
>> +	__s32 bottom_field_order_cnt;
>> +	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
>> +};
>> +
>> +struct v4l2_ctrl_h264_decode_param {
>> +	__u32 num_slices;
>> +	__u8 idr_pic_flag;
>> +	__u8 nal_ref_idc;
>> +	__s32 top_field_order_cnt;
>> +	__s32 bottom_field_order_cnt;
>> +	__u8 ref_pic_list_p0[32];
>> +	__u8 ref_pic_list_b0[32];
>> +	__u8 ref_pic_list_b1[32];
> I would prefer to keep only two list, list0 and list 1.
> Anyway P slice just use the list0 and B would use the both.
>> +	struct v4l2_h264_dpb_entry dpb[16];
>> +};
>> +
>>   #endif
>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>> index 173a94d2cbef..dd028e0bf306 100644
>> --- a/include/uapi/linux/videodev2.h
>> +++ b/include/uapi/linux/videodev2.h
>> @@ -643,6 +643,7 @@ struct v4l2_pix_format {
>>   #define V4L2_PIX_FMT_H264     v4l2_fourcc('H', '2', '6', '4') /* H264 with start codes */
>>   #define V4L2_PIX_FMT_H264_NO_SC v4l2_fourcc('A', 'V', 'C', '1') /* H264 without start codes */
>>   #define V4L2_PIX_FMT_H264_MVC v4l2_fourcc('M', '2', '6', '4') /* H264 MVC */
>> +#define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
>>   #define V4L2_PIX_FMT_H263     v4l2_fourcc('H', '2', '6', '3') /* H263          */
>>   #define V4L2_PIX_FMT_MPEG1    v4l2_fourcc('M', 'P', 'G', '1') /* MPEG-1 ES     */
>>   #define V4L2_PIX_FMT_MPEG2    v4l2_fourcc('M', 'P', 'G', '2') /* MPEG-2 ES     */
>> @@ -1631,6 +1632,11 @@ struct v4l2_ext_control {
>>   		__u32 __user *p_u32;
>>   		struct v4l2_ctrl_mpeg2_slice_params __user *p_mpeg2_slice_params;
>>   		struct v4l2_ctrl_mpeg2_quantization __user *p_mpeg2_quantization;
>> +		struct v4l2_ctrl_h264_sps __user *p_h264_sps;
>> +		struct v4l2_ctrl_h264_pps __user *p_h264_pps;
>> +		struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
>> +		struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
>> +		struct v4l2_ctrl_h264_decode_param __user *p_h264_decode_param;
>>   		void __user *ptr;
>>   	};
>>   } __attribute__ ((packed));
>> @@ -1678,6 +1684,11 @@ enum v4l2_ctrl_type {
>>   	V4L2_CTRL_TYPE_U32	     = 0x0102,
>>   	V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS = 0x0103,
>>   	V4L2_CTRL_TYPE_MPEG2_QUANTIZATION = 0x0104,
>> +	V4L2_CTRL_TYPE_H264_SPS      = 0x0105,
>> +	V4L2_CTRL_TYPE_H264_PPS      = 0x0106,
>> +	V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
>> +	V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
>> +	V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
>>   };
>>   
>>   /*  Used in the VIDIOC_QUERYCTRL ioctl for querying controls */


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-08 17:01     ` ayaka
@ 2019-01-10 13:33       ` ayaka
  2019-01-17 11:21         ` Maxime Ripard
  2019-01-17 11:16       ` Maxime Ripard
  1 sibling, 1 reply; 27+ messages in thread
From: ayaka @ 2019-01-10 13:33 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

I forgot an important thing: the rkvdec and rk hevc decoders require
the cabac table, scaling list, picture parameter set and reference
picture set to be stored in one or more DMA buffers. I am not talking
about the parsed data; the decoder requires the raw data.

For the pps and rps, it is possible to reuse the slice header and just
let the decoder know the offset into the bitstream buffer; I would
suggest adding three properties (including one for the sps) for them.
But I think we need a way to mark an OUTPUT side buffer as holding that
aux data.

On 1/9/19 1:01 AM, ayaka wrote:
>
> On 1/8/19 5:52 PM, Randy 'ayaka' Li wrote:
>> On Thu, Nov 15, 2018 at 03:56:49PM +0100, Maxime Ripard wrote:
>>> From: Pawel Osciak <posciak@chromium.org>
>>>
>>> Stateless video codecs will require both the H264 metadata and 
>>> slices in
>>> order to be able to decode frames.
>>>
>>> This introduces the definitions for a new pixel format for H264 
>>> slices that
>>> have been parsed, as well as the structures used to pass the 
>>> metadata from
>>> the userspace to the kernel.
>>>
>>> Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
>>> Signed-off-by: Pawel Osciak <posciak@chromium.org>
>>> Signed-off-by: Guenter Roeck <groeck@chromium.org>
>>> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
>>> ---
>>>   Documentation/media/uapi/v4l/biblio.rst       |   9 +
>>>   .../media/uapi/v4l/extended-controls.rst      | 364 
>>> ++++++++++++++++++
>>>   .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
>>>   .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
>>>   .../media/videodev2.h.rst.exceptions          |   5 +
>>>   drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
>>>   drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
>>>   include/media/v4l2-ctrls.h                    |  10 +
>>>   include/uapi/linux/v4l2-controls.h            | 166 ++++++++
>>>   include/uapi/linux/videodev2.h                |  11 +
>>>   10 files changed, 658 insertions(+)
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID        0x01
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE        0x02
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM    0x04
>>> +
>>> +struct v4l2_h264_dpb_entry {
>>> +    __u32 tag;
>>> +    __u16 frame_num;
>>> +    __u16 pic_num;
>> Although the long term reference would use picture order count
>> and short term for frame num, but only one of them is used
>> for a entry of a dpb.
>>
>> Besides, for a frame picture frame_num = pic_num * 2,
>> and frame_num = pic_num * 2 + 1 for a filed.
>
> I mistook something before and something Herman told me is wrong, I 
> read the book explaining the ITU standard.
>
> The index of a short term reference picture would be frame_num or POC 
> and LongTermPicNum for long term.
>
> But stateless hardware decoder usually don't care about whether it is 
> long term or short term, as the real dpb updating or management work 
> are not done by the the driver or device and decoding job would only 
> use the two list(or one list for slice P) for reference pictures. So 
> those flag for long term or status can be removed as well.
>
> Stateless decoder would care about just reference index of this 
> picture and maybe some extra property for the filed coded below. 
> Keeping a property here for the index of a picture is enough.
>
>>> +    /* Note that field is indicated by v4l2_buffer.field */
>>> +    __s32 top_field_order_cnt;
>>> +    __s32 bottom_field_order_cnt;
>>> +    __u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
>>> +};
>>> +
>>> +struct v4l2_ctrl_h264_decode_param {
>>> +    __u32 num_slices;
>>> +    __u8 idr_pic_flag;
>>> +    __u8 nal_ref_idc;
>>> +    __s32 top_field_order_cnt;
>>> +    __s32 bottom_field_order_cnt;
>>> +    __u8 ref_pic_list_p0[32];
>>> +    __u8 ref_pic_list_b0[32];
>>> +    __u8 ref_pic_list_b1[32];
>> I would prefer to keep only two list, list0 and list 1.
>> Anyway P slice just use the list0 and B would use the both.
>>> +    struct v4l2_h264_dpb_entry dpb[16];
>>> +};
>>> +
>>>   #endif
>>> diff --git a/include/uapi/linux/videodev2.h 
>>> b/include/uapi/linux/videodev2.h
>>> index 173a94d2cbef..dd028e0bf306 100644
>>> --- a/include/uapi/linux/videodev2.h
>>> +++ b/include/uapi/linux/videodev2.h
>>> @@ -643,6 +643,7 @@ struct v4l2_pix_format {
>>>   #define V4L2_PIX_FMT_H264     v4l2_fourcc('H', '2', '6', '4') /* 
>>> H264 with start codes */
>>>   #define V4L2_PIX_FMT_H264_NO_SC v4l2_fourcc('A', 'V', 'C', '1') /* 
>>> H264 without start codes */
>>>   #define V4L2_PIX_FMT_H264_MVC v4l2_fourcc('M', '2', '6', '4') /* 
>>> H264 MVC */
>>> +#define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* 
>>> H264 parsed slices */
>>>   #define V4L2_PIX_FMT_H263     v4l2_fourcc('H', '2', '6', '3') /* 
>>> H263          */
>>>   #define V4L2_PIX_FMT_MPEG1    v4l2_fourcc('M', 'P', 'G', '1') /* 
>>> MPEG-1 ES     */
>>>   #define V4L2_PIX_FMT_MPEG2    v4l2_fourcc('M', 'P', 'G', '2') /* 
>>> MPEG-2 ES     */
>>> @@ -1631,6 +1632,11 @@ struct v4l2_ext_control {
>>>           __u32 __user *p_u32;
>>>           struct v4l2_ctrl_mpeg2_slice_params __user 
>>> *p_mpeg2_slice_params;
>>>           struct v4l2_ctrl_mpeg2_quantization __user 
>>> *p_mpeg2_quantization;
>>> +        struct v4l2_ctrl_h264_sps __user *p_h264_sps;
>>> +        struct v4l2_ctrl_h264_pps __user *p_h264_pps;
>>> +        struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
>>> +        struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
>>> +        struct v4l2_ctrl_h264_decode_param __user 
>>> *p_h264_decode_param;
>>>           void __user *ptr;
>>>       };
>>>   } __attribute__ ((packed));
>>> @@ -1678,6 +1684,11 @@ enum v4l2_ctrl_type {
>>>       V4L2_CTRL_TYPE_U32         = 0x0102,
>>>       V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS = 0x0103,
>>>       V4L2_CTRL_TYPE_MPEG2_QUANTIZATION = 0x0104,
>>> +    V4L2_CTRL_TYPE_H264_SPS      = 0x0105,
>>> +    V4L2_CTRL_TYPE_H264_PPS      = 0x0106,
>>> +    V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
>>> +    V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
>>> +    V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
>>>   };
>>>     /*  Used in the VIDIOC_QUERYCTRL ioctl for querying controls */


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-08  9:52   ` Randy 'ayaka' Li
  2019-01-08 17:01     ` ayaka
@ 2019-01-17 11:01     ` Maxime Ripard
  2019-01-20 12:48       ` ayaka
  1 sibling, 1 reply; 27+ messages in thread
From: Maxime Ripard @ 2019-01-17 11:01 UTC (permalink / raw)
  To: Randy 'ayaka' Li
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

[-- Attachment #1: Type: text/plain, Size: 3662 bytes --]

Hi,

On Tue, Jan 08, 2019 at 05:52:28PM +0800, Randy 'ayaka' Li wrote:
> > +struct v4l2_ctrl_h264_scaling_matrix {
> > +	__u8 scaling_list_4x4[6][16];
> > +	__u8 scaling_list_8x8[6][64];
> > +};
> 
> I wonder which decoder want this.

I'm not sure I follow you, scaling lists are an important part of the
decoding process, so all of them?

> > +struct v4l2_ctrl_h264_slice_param {
> > +	/* Size in bytes, including header */
> > +	__u32 size;
> > +	/* Offset in bits to slice_data() from the beginning of this slice. */
> > +	__u32 header_bit_size;
> > +
> > +	__u16 first_mb_in_slice;
> > +	__u8 slice_type;
> > +	__u8 pic_parameter_set_id;
> > +	__u8 colour_plane_id;
> > +	__u16 frame_num;
> > +	__u16 idr_pic_id;
> > +	__u16 pic_order_cnt_lsb;
> > +	__s32 delta_pic_order_cnt_bottom;
> > +	__s32 delta_pic_order_cnt0;
> > +	__s32 delta_pic_order_cnt1;
> > +	__u8 redundant_pic_cnt;
> > +
> > +	struct v4l2_h264_pred_weight_table pred_weight_table;
> > +	/* Size in bits of dec_ref_pic_marking() syntax element. */
> > +	__u32 dec_ref_pic_marking_bit_size;
> > +	/* Size in bits of pic order count syntax. */
> > +	__u32 pic_order_cnt_bit_size;
> > +
> > +	__u8 cabac_init_idc;
> > +	__s8 slice_qp_delta;
> > +	__s8 slice_qs_delta;
> > +	__u8 disable_deblocking_filter_idc;
> > +	__s8 slice_alpha_c0_offset_div2;
> > +	__s8 slice_beta_offset_div2;
> > +	__u32 slice_group_change_cycle;
> > +
> > +	__u8 num_ref_idx_l0_active_minus1;
> > +	__u8 num_ref_idx_l1_active_minus1;
> > +	/*  Entries on each list are indices
> > +	 *  into v4l2_ctrl_h264_decode_param.dpb[]. */
> > +	__u8 ref_pic_list0[32];
> > +	__u8 ref_pic_list1[32];
> > +
> > +	__u8 flags;
> > +};
> > +
> We need some addtional properties or the Rockchip won't work.
> 1. u16 idr_pic_id for identifies IDR (instantaneous decoding refresh)
> picture

idr_pic_id is already there

> 2. u16 ref_pic_mk_len for length of decoded reference picture marking bits
> 3. u8 poc_length for length of picture order count field in stream
> 
> The last two are used for the hardware to skip a part stream.

I'm not sure what you mean here, those parameters are not in the
bitstream, what do you want to use them for?

> > +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > +
> > +struct v4l2_h264_dpb_entry {
> > +	__u32 tag;
> > +	__u16 frame_num;
> > +	__u16 pic_num;
> 
> Although the long term reference would use picture order count
> and short term for frame num, but only one of them is used
> for a entry of a dpb.
> 
> Besides, for a frame picture frame_num = pic_num * 2,
> and frame_num = pic_num * 2 + 1 for a filed.

I'm not sure what your point is?

> > +	/* Note that field is indicated by v4l2_buffer.field */
> > +	__s32 top_field_order_cnt;
> > +	__s32 bottom_field_order_cnt;
> > +	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
> > +};
> > +
> > +struct v4l2_ctrl_h264_decode_param {
> > +	__u32 num_slices;
> > +	__u8 idr_pic_flag;
> > +	__u8 nal_ref_idc;
> > +	__s32 top_field_order_cnt;
> > +	__s32 bottom_field_order_cnt;
> > +	__u8 ref_pic_list_p0[32];
> > +	__u8 ref_pic_list_b0[32];
> > +	__u8 ref_pic_list_b1[32];
>
> I would prefer to keep only two list, list0 and list 1.

I'm not even sure why this is needed in the first place anymore. It's
not part of the bitstream, and it seems to come from ChromeOS' Rockchip
driver, which does use it. Do you know why?

Thanks!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-08 17:01     ` ayaka
  2019-01-10 13:33       ` ayaka
@ 2019-01-17 11:16       ` Maxime Ripard
  1 sibling, 0 replies; 27+ messages in thread
From: Maxime Ripard @ 2019-01-17 11:16 UTC (permalink / raw)
  To: ayaka
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

[-- Attachment #1: Type: text/plain, Size: 3155 bytes --]

Hi,

On Wed, Jan 09, 2019 at 01:01:22AM +0800, ayaka wrote:
> On 1/8/19 5:52 PM, Randy 'ayaka' Li wrote:
> > On Thu, Nov 15, 2018 at 03:56:49PM +0100, Maxime Ripard wrote:
> > > From: Pawel Osciak <posciak@chromium.org>
> > > 
> > > Stateless video codecs will require both the H264 metadata and slices in
> > > order to be able to decode frames.
> > > 
> > > This introduces the definitions for a new pixel format for H264 slices that
> > > have been parsed, as well as the structures used to pass the metadata from
> > > the userspace to the kernel.
> > > 
> > > Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
> > > Signed-off-by: Pawel Osciak <posciak@chromium.org>
> > > Signed-off-by: Guenter Roeck <groeck@chromium.org>
> > > Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
> > > ---
> > >   Documentation/media/uapi/v4l/biblio.rst       |   9 +
> > >   .../media/uapi/v4l/extended-controls.rst      | 364 ++++++++++++++++++
> > >   .../media/uapi/v4l/pixfmt-compressed.rst      |  20 +
> > >   .../media/uapi/v4l/vidioc-queryctrl.rst       |  30 ++
> > >   .../media/videodev2.h.rst.exceptions          |   5 +
> > >   drivers/media/v4l2-core/v4l2-ctrls.c          |  42 ++
> > >   drivers/media/v4l2-core/v4l2-ioctl.c          |   1 +
> > >   include/media/v4l2-ctrls.h                    |  10 +
> > >   include/uapi/linux/v4l2-controls.h            | 166 ++++++++
> > >   include/uapi/linux/videodev2.h                |  11 +
> > >   10 files changed, 658 insertions(+)
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > +
> > > +struct v4l2_h264_dpb_entry {
> > > +	__u32 tag;
> > > +	__u16 frame_num;
> > > +	__u16 pic_num;
> > Although the long term reference would use picture order count
> > and short term for frame num, but only one of them is used
> > for a entry of a dpb.
> > 
> > Besides, for a frame picture frame_num = pic_num * 2,
> > and frame_num = pic_num * 2 + 1 for a filed.
> 
> I mistook something before and something Herman told me is wrong, I read the
> book explaining the ITU standard.
> 
> The index of a short term reference picture would be frame_num or POC and
> LongTermPicNum for long term.
> 
> But stateless hardware decoder usually don't care about whether it is long
> term or short term, as the real dpb updating or management work are not done
> by the the driver or device and decoding job would only use the two list(or
> one list for slice P) for reference pictures. So those flag for long term or
> status can be removed as well.
> 
> Stateless decoder would care about just reference index of this picture and
> maybe some extra property for the filed coded below. Keeping a property here
> for the index of a picture is enough.

It doesn't look like it's part of the bitstream, but the Rockchip
driver in the ChromeOS tree seems to be using the long term
flags. Tomasz, do you know why it's needed?

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-10 13:33       ` ayaka
@ 2019-01-17 11:21         ` Maxime Ripard
  0 siblings, 0 replies; 27+ messages in thread
From: Maxime Ripard @ 2019-01-17 11:21 UTC (permalink / raw)
  To: ayaka
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

Hi,

On Thu, Jan 10, 2019 at 09:33:01PM +0800, ayaka wrote:
> I forget a important thing, for the rkvdec and rk hevc decoder, it would
> requests cabac table, scaling list, picture parameter set and reference
> picture storing in one or various of DMA buffers. I am not talking about the
> data been parsed, the decoder would requests a raw data.
> 
> For the pps and rps, it is possible to reuse the slice header, just let the
> decoder know the offset from the bitstream bufer, I would suggest to add
> three properties(with sps) for them. But I think we need a method to mark a
> OUTPUT side buffer for those aux data.

I'm not sure this is something we actually want. The whole design
decision was that we wouldn't have a bitstream parser in the kernel,
and doing as you suggest goes against that design.

And even if it turns out to be useful, this is really out of scope
for this series.

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-17 11:01     ` Maxime Ripard
@ 2019-01-20 12:48       ` ayaka
  2019-01-24 14:23         ` Maxime Ripard
  0 siblings, 1 reply; 27+ messages in thread
From: ayaka @ 2019-01-20 12:48 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: hans.verkuil, acourbot, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, tfiga, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

I am sorry, I have been a little busy with the lunar new year recently,
and the H.264 syntax rules are a little complex. I will try to explain
my ideas more clearly here.

On 1/17/19 7:01 PM, Maxime Ripard wrote:
> Hi,
>
> On Tue, Jan 08, 2019 at 05:52:28PM +0800, Randy 'ayaka' Li wrote:
>>> +struct v4l2_ctrl_h264_scaling_matrix {
>>> +	__u8 scaling_list_4x4[6][16];
>>> +	__u8 scaling_list_8x8[6][64];
>>> +};
>> I wonder which decoder want this.
> I'm not sure I follow you, scaling lists are an important part of the
> decoding process, so all of them?
Not exactly. When the scaling list is present in the sequence (there is
a flag for it), we need to pass the decoder a scaling table. But the
initial state of that table is known, so some decoders have an internal
table. Also, some decoders want it in Z order while others don't.
>
>>> +struct v4l2_ctrl_h264_slice_param {
>>> +	/* Size in bytes, including header */
>>> +	__u32 size;
>>> +	/* Offset in bits to slice_data() from the beginning of this slice. */
>>> +	__u32 header_bit_size;
>>> +
>>> +	__u16 first_mb_in_slice;
>>> +	__u8 slice_type;
>>> +	__u8 pic_parameter_set_id;
>>> +	__u8 colour_plane_id;
>>> +	__u16 frame_num;
>>> +	__u16 idr_pic_id;
>>> +	__u16 pic_order_cnt_lsb;
>>> +	__s32 delta_pic_order_cnt_bottom;
>>> +	__s32 delta_pic_order_cnt0;
>>> +	__s32 delta_pic_order_cnt1;
>>> +	__u8 redundant_pic_cnt;
>>> +
>>> +	struct v4l2_h264_pred_weight_table pred_weight_table;
>>> +	/* Size in bits of dec_ref_pic_marking() syntax element. */
>>> +	__u32 dec_ref_pic_marking_bit_size;
>>> +	/* Size in bits of pic order count syntax. */
>>> +	__u32 pic_order_cnt_bit_size;
>>> +
>>> +	__u8 cabac_init_idc;
>>> +	__s8 slice_qp_delta;
>>> +	__s8 slice_qs_delta;
>>> +	__u8 disable_deblocking_filter_idc;
>>> +	__s8 slice_alpha_c0_offset_div2;
>>> +	__s8 slice_beta_offset_div2;
>>> +	__u32 slice_group_change_cycle;
>>> +
>>> +	__u8 num_ref_idx_l0_active_minus1;
>>> +	__u8 num_ref_idx_l1_active_minus1;
>>> +	/*  Entries on each list are indices
>>> +	 *  into v4l2_ctrl_h264_decode_param.dpb[]. */
>>> +	__u8 ref_pic_list0[32];
>>> +	__u8 ref_pic_list1[32];
>>> +
>>> +	__u8 flags;
>>> +};
>>> +
>> We need some addtional properties or the Rockchip won't work.
>> 1. u16 idr_pic_id for identifies IDR (instantaneous decoding refresh)
>> picture
> idr_pic_id is already there
Sorry for missing that.
>
>> 2. u16 ref_pic_mk_len for length of decoded reference picture marking bits
>> 3. u8 poc_length for length of picture order count field in stream
>>
>> The last two are used for the hardware to skip a part stream.
> I'm not sure what you mean here, those parameters are not in the
> bitstream, what do you want to use them for?

Otherwise Rockchip's decoder won't work. Their decoder can't find the
data part without skipping some segments of the slice data.

I should say something more about stateless decoders: it is hard to
define what a stateless decoder will do, as some parse more information
than others. You may have no idea what a given one accelerates. For the
ISO H-series codecs it is simpler, but for the VPx series the decoder
designs are a mess.

>>> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
>>> +
>>> +struct v4l2_h264_dpb_entry {
>>> +	__u32 tag;
>>> +	__u16 frame_num;
>>> +	__u16 pic_num;
>> Although the long term reference would use picture order count
>> and short term for frame num, but only one of them is used
>> for a entry of a dpb.
>>
>> Besides, for a frame picture frame_num = pic_num * 2,
>> and frame_num = pic_num * 2 + 1 for a filed.
> I'm not sure what is your point?

I found that I was wrong in my last email.


But stateless hardware decoders usually don't care whether a reference is
long term or short term, as the real DPB updating and management work is
not done by the driver or the device, and a decoding job only uses the two
lists (or one list for a P slice) for reference pictures. So the flags for
long term or status can be removed as well.
I still agree with the above from my last mail, so I would suggest keeping
a single property as the index in place of both frame_num and pic_num, as
only one of them is used when decoding a given picture.

>
>>> +	/* Note that field is indicated by v4l2_buffer.field */
>>> +	__s32 top_field_order_cnt;
>>> +	__s32 bottom_field_order_cnt;
>>> +	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
>>> +};
>>> +
>>> +struct v4l2_ctrl_h264_decode_param {
>>> +	__u32 num_slices;
>>> +	__u8 idr_pic_flag;
>>> +	__u8 nal_ref_idc;
>>> +	__s32 top_field_order_cnt;
>>> +	__s32 bottom_field_order_cnt;
>>> +	__u8 ref_pic_list_p0[32];
>>> +	__u8 ref_pic_list_b0[32];
>>> +	__u8 ref_pic_list_b1[32];
>> I would prefer to keep only two list, list0 and list 1.
> I'm not even sure why this is needed in the first place anymore. It's
> not part of the bitstream, and it seems to come from ChromeOS' Rockchip driver that uses it though. Do you know why?

You see, a P frame uses only one list and a B frame uses two. So for the
parameters of a picture, two lists are the maximum. I would suggest
keeping only two arrays here and renaming them list0 and list1, which
would reduce the conflict.
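
To illustrate the point about P slices needing one list and B slices
two, here is a minimal sketch in C. The helper name is hypothetical and
not part of the proposed UAPI; the slice_type numbering (P=0, B=1, I=2,
SP=3, SI=4, with values 5 to 9 mapping to the same types) is the one
from the H.264 slice header.

```c
#include <assert.h>
#include <stdint.h>

/* H.264 slice_type values, taken modulo 5. */
enum h264_slice_type {
	H264_SLICE_P = 0,
	H264_SLICE_B = 1,
	H264_SLICE_I = 2,
	H264_SLICE_SP = 3,
	H264_SLICE_SI = 4,
};

/* Hypothetical helper: how many reference lists a slice of this type uses. */
static unsigned int h264_ref_list_count(uint8_t slice_type)
{
	/* slice_type is limited to 0..9 in the slice header. */
	assert(slice_type <= 9);

	switch (slice_type % 5) {
	case H264_SLICE_P:
	case H264_SLICE_SP:
		return 1;	/* list 0 only */
	case H264_SLICE_B:
		return 2;	/* list 0 and list 1 */
	default:
		return 0;	/* I and SI slices use no reference list */
	}
}
```

A driver could use something like this to decide whether ref_pic_list1
needs to be programmed at all for a given slice.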


Please forget the ChromeOS driver, there are too many problems left in
it. The reason I made mistakes in the previous email is that the notes
and confirmation of the H.264 syntax from Rockchip were wrong, and I
only found that out later.

I want to get things right at the design stage.

>
> Thanks!
> Maxime
>
Thank you all

Randy 'ayaka' Li



* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-20 12:48       ` ayaka
@ 2019-01-24 14:23         ` Maxime Ripard
  2019-01-24 14:37           ` Ayaka
  0 siblings, 1 reply; 27+ messages in thread
From: Maxime Ripard @ 2019-01-24 14:23 UTC (permalink / raw)
  To: ayaka, tfiga, acourbot
  Cc: hans.verkuil, sakari.ailus, Laurent Pinchart, jenskuske,
	linux-sunxi, linux-kernel, Paul Kocialkowski, Chen-Yu Tsai,
	posciak, Thomas Petazzoni, Guenter Roeck, nicolas.dufresne,
	linux-arm-kernel, linux-media

Hi!

On Sun, Jan 20, 2019 at 08:48:32PM +0800, ayaka wrote:
> > > > +struct v4l2_ctrl_h264_scaling_matrix {
> > > > +	__u8 scaling_list_4x4[6][16];
> > > > +	__u8 scaling_list_8x8[6][64];
> > > > +};
> > > I wonder which decoder want this.
> > I'm not sure I follow you, scaling lists are an important part of the
> > decoding process, so all of them?
>
> Not actually, when the scaling list is in the sequence(a flag for it), we
> need to tell the decoder a scaling table.

Right, that's why the scaling list has a control of its own.

> But the initial state of that table is known, so for some decoder,
> it would have a internal table.

That control is optional, so you can just ignore that setting in that
case.

> And for some decoder, it wants in the Z order while the others
> won't.

We're designing a generic API here, so it doesn't matter. Some will
have to convert it internally in the drivers for the Z order, while
others will be able to use it as is.
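
As a rough illustration of the kind of in-driver conversion mentioned
above, here is a minimal sketch in C that reorders one 4x4 scaling list
from zigzag scan order to raster order. It assumes that "Z order" means
the H.264 zigzag scan for frame-coded blocks and that the list arrives
in scan order; neither is specified in this thread, and the helper name
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.264 4x4 zigzag scan (frame coding): scan position -> raster index. */
static const uint8_t zigzag_4x4[16] = {
	0,  1,  4,  8,
	5,  2,  3,  6,
	9, 12, 13, 10,
	7, 11, 14, 15,
};

/* Hypothetical helper: reorder one 4x4 list from zigzag to raster order. */
static void scaling_list_4x4_zigzag_to_raster(const uint8_t *zz,
					       uint8_t *raster)
{
	size_t i;

	assert(zz != NULL && raster != NULL);

	for (i = 0; i < 16; i++)
		raster[zigzag_4x4[i]] = zz[i];
}
```

A driver whose hardware wants the opposite order would index the other
way around (raster[zigzag_4x4[i]] read, zz[i] written).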

> > > > +struct v4l2_ctrl_h264_slice_param {
> > > > +	/* Size in bytes, including header */
> > > > +	__u32 size;
> > > > +	/* Offset in bits to slice_data() from the beginning of this slice. */
> > > > +	__u32 header_bit_size;
> > > > +
> > > > +	__u16 first_mb_in_slice;
> > > > +	__u8 slice_type;
> > > > +	__u8 pic_parameter_set_id;
> > > > +	__u8 colour_plane_id;
> > > > +	__u16 frame_num;
> > > > +	__u16 idr_pic_id;
> > > > +	__u16 pic_order_cnt_lsb;
> > > > +	__s32 delta_pic_order_cnt_bottom;
> > > > +	__s32 delta_pic_order_cnt0;
> > > > +	__s32 delta_pic_order_cnt1;
> > > > +	__u8 redundant_pic_cnt;
> > > > +
> > > > +	struct v4l2_h264_pred_weight_table pred_weight_table;
> > > > +	/* Size in bits of dec_ref_pic_marking() syntax element. */
> > > > +	__u32 dec_ref_pic_marking_bit_size;
> > > > +	/* Size in bits of pic order count syntax. */
> > > > +	__u32 pic_order_cnt_bit_size;
> > > > +
> > > > +	__u8 cabac_init_idc;
> > > > +	__s8 slice_qp_delta;
> > > > +	__s8 slice_qs_delta;
> > > > +	__u8 disable_deblocking_filter_idc;
> > > > +	__s8 slice_alpha_c0_offset_div2;
> > > > +	__s8 slice_beta_offset_div2;
> > > > +	__u32 slice_group_change_cycle;
> > > > +
> > > > +	__u8 num_ref_idx_l0_active_minus1;
> > > > +	__u8 num_ref_idx_l1_active_minus1;
> > > > +	/*  Entries on each list are indices
> > > > +	 *  into v4l2_ctrl_h264_decode_param.dpb[]. */
> > > > +	__u8 ref_pic_list0[32];
> > > > +	__u8 ref_pic_list1[32];
> > > > +
> > > > +	__u8 flags;
> > > > +};
> > > > +
> > > We need some addtional properties or the Rockchip won't work.
> > > 1. u16 idr_pic_id for identifies IDR (instantaneous decoding refresh)
> > > picture
> > idr_pic_id is already there
> Sorry for miss that.
> > 
> > > 2. u16 ref_pic_mk_len for length of decoded reference picture marking bits
> > > 3. u8 poc_length for length of picture order count field in stream
> > > 
> > > The last two are used for the hardware to skip a part stream.
> > I'm not sure what you mean here, those parameters are not in the
> > bitstream, what do you want to use them for?
> 
> Or Rockchip's decoder won't work. Their decoder can't find the data part
> without skip some segments in slice data.
> 
> I should say something more about the stateless decoder, it is hard to
> define what a stateless decoder will do, some would like to parse more
> information but some won't. You even have no idea on what it would
> accelerate. OK, I should say for those ISO H serial codec, it would be more
> simple but for those VPx serial, the decoders design is a mess.

Can't you use header_bit_size in that case to skip over the parts
of the slice you don't care about and go to the data?
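
To make the question concrete, here is a minimal sketch in C of how
header_bit_size could be turned into a position inside the slice
buffer. The struct and function names are hypothetical, and it assumes
header_bit_size is counted on the buffer exactly as it is handed to the
driver (whether emulation prevention bytes are included is not stated in
this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: where slice_data() starts inside the slice buffer. */
struct slice_data_pos {
	size_t byte_offset;		/* whole bytes to skip */
	unsigned int bit_offset;	/* remaining bits, 0 to 7 */
};

/*
 * size is the slice size in bytes, header_bit_size the offset in bits to
 * slice_data(), as in v4l2_ctrl_h264_slice_param.
 * Returns 0 on success, -1 if the header would run past the slice.
 */
static int slice_data_start(uint32_t size, uint32_t header_bit_size,
			    struct slice_data_pos *pos)
{
	assert(pos != NULL);

	if ((uint64_t)header_bit_size > (uint64_t)size * 8)
		return -1;

	pos->byte_offset = header_bit_size / 8;
	pos->bit_offset = header_bit_size % 8;
	return 0;
}
```

This only gives the start of the data; it does not provide the sizes of
individual header segments.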

> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > +
> > > > +struct v4l2_h264_dpb_entry {
> > > > +	__u32 tag;
> > > > +	__u16 frame_num;
> > > > +	__u16 pic_num;
> > > Although the long term reference would use picture order count
> > > and short term for frame num, but only one of them is used
> > > for a entry of a dpb.
> > > 
> > > Besides, for a frame picture frame_num = pic_num * 2,
> > > and frame_num = pic_num * 2 + 1 for a filed.
> >
> > I'm not sure what is your point?
> 
> I found I was wrong at the last email.
> 
> But stateless hardware decoder usually don't care about whether it is long
> term or short term, as the real dpb updating or management work are not done
> by the the driver or device and decoding job would only use the two list(or
> one list for slice P) for reference pictures. So those flag for long term or
> status can be removed as well.

I'll remove the LONG_TERM flag then. We do need the other two for the
Allwinner driver though.

> And I agree above with my last mail, so I would suggest to keep a property
> as index for both frame_num and pic_num, as only one of them would be used
> for a picture decoding once time.

I'd really prefer to keep everything that is in the bitstream defined
here. We don't want to cover only the usual cases, but all of them, even
the ones that haven't been designed yet, so we should be really
conservative.

> > > > +	/* Note that field is indicated by v4l2_buffer.field */
> > > > +	__s32 top_field_order_cnt;
> > > > +	__s32 bottom_field_order_cnt;
> > > > +	__u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
> > > > +};
> > > > +
> > > > +struct v4l2_ctrl_h264_decode_param {
> > > > +	__u32 num_slices;
> > > > +	__u8 idr_pic_flag;
> > > > +	__u8 nal_ref_idc;
> > > > +	__s32 top_field_order_cnt;
> > > > +	__s32 bottom_field_order_cnt;
> > > > +	__u8 ref_pic_list_p0[32];
> > > > +	__u8 ref_pic_list_b0[32];
> > > > +	__u8 ref_pic_list_b1[32];
> > >
> > > I would prefer to keep only two list, list0 and list 1.
> >
> > I'm not even sure why this is needed in the first place
> > anymore. It's not part of the bitstream, and it seems to come from
> > ChromeOS' Rockchip driver that uses it though. Do you know why?
> 
> You see the P frame would use only a list and B for two list. So for
> the parameter of a picture, two lists are max. I would suggest only
> keep two arrays here and rename them as list0 and list1, it would
> reduce the conflict.

Right, but those lists are already in v4l2_ctrl_h264_slice_param (with
the construct you are suggesting). I'm not sure about why the
redundancy is needed. Alex, Tomasz, do you have any idea why this was
needed at some point?

Thanks!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-24 14:23         ` Maxime Ripard
@ 2019-01-24 14:37           ` Ayaka
  2019-01-25 12:47             ` Maxime Ripard
  0 siblings, 1 reply; 27+ messages in thread
From: Ayaka @ 2019-01-24 14:37 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: tfiga, acourbot, hans.verkuil, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media



> On Jan 24, 2019, at 10:23 PM, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
> 
> Hi!
> 
> On Sun, Jan 20, 2019 at 08:48:32PM +0800, ayaka wrote:
>>>>> +struct v4l2_ctrl_h264_scaling_matrix {
>>>>> +    __u8 scaling_list_4x4[6][16];
>>>>> +    __u8 scaling_list_8x8[6][64];
>>>>> +};
>>>> I wonder which decoder want this.
>>> I'm not sure I follow you, scaling lists are an important part of the
>>> decoding process, so all of them?
>> 
>> Not actually, when the scaling list is in the sequence(a flag for it), we
>> need to tell the decoder a scaling table.
> 
> Right, that's why the scaling list has a control of its own.
> 
>> But the initial state of that table is known, so for some decoder,
>> it would have a internal table.
> 
> That control is optional, so you can just ignore that setting in that
> case
> 
>> And for some decoder, it wants in the Z order while the others
>> won't.
> 
> We're designing a generic API here, so it doesn't matter. Some will
I know, I just wonder whether the other drivers request it.
> have to convert it internally in the drivers for the Z order, while
> others will be able to use it as is.
> 
>>>>> +struct v4l2_ctrl_h264_slice_param {
>>>>> +    /* Size in bytes, including header */
>>>>> +    __u32 size;
>>>>> +    /* Offset in bits to slice_data() from the beginning of this slice. */
>>>>> +    __u32 header_bit_size;
>>>>> +
>>>>> +    __u16 first_mb_in_slice;
>>>>> +    __u8 slice_type;
>>>>> +    __u8 pic_parameter_set_id;
>>>>> +    __u8 colour_plane_id;
>>>>> +    __u16 frame_num;
>>>>> +    __u16 idr_pic_id;
>>>>> +    __u16 pic_order_cnt_lsb;
>>>>> +    __s32 delta_pic_order_cnt_bottom;
>>>>> +    __s32 delta_pic_order_cnt0;
>>>>> +    __s32 delta_pic_order_cnt1;
>>>>> +    __u8 redundant_pic_cnt;
>>>>> +
>>>>> +    struct v4l2_h264_pred_weight_table pred_weight_table;
>>>>> +    /* Size in bits of dec_ref_pic_marking() syntax element. */
>>>>> +    __u32 dec_ref_pic_marking_bit_size;
>>>>> +    /* Size in bits of pic order count syntax. */
>>>>> +    __u32 pic_order_cnt_bit_size;
>>>>> +
>>>>> +    __u8 cabac_init_idc;
>>>>> +    __s8 slice_qp_delta;
>>>>> +    __s8 slice_qs_delta;
>>>>> +    __u8 disable_deblocking_filter_idc;
>>>>> +    __s8 slice_alpha_c0_offset_div2;
>>>>> +    __s8 slice_beta_offset_div2;
>>>>> +    __u32 slice_group_change_cycle;
>>>>> +
>>>>> +    __u8 num_ref_idx_l0_active_minus1;
>>>>> +    __u8 num_ref_idx_l1_active_minus1;
>>>>> +    /*  Entries on each list are indices
>>>>> +     *  into v4l2_ctrl_h264_decode_param.dpb[]. */
>>>>> +    __u8 ref_pic_list0[32];
>>>>> +    __u8 ref_pic_list1[32];
>>>>> +
>>>>> +    __u8 flags;
>>>>> +};
>>>>> +
>>>> We need some addtional properties or the Rockchip won't work.
>>>> 1. u16 idr_pic_id for identifies IDR (instantaneous decoding refresh)
>>>> picture
>>> idr_pic_id is already there
>> Sorry for miss that.
>>> 
>>>> 2. u16 ref_pic_mk_len for length of decoded reference picture marking bits
>>>> 3. u8 poc_length for length of picture order count field in stream
>>>> 
>>>> The last two are used for the hardware to skip a part stream.
>>> I'm not sure what you mean here, those parameters are not in the
>>> bitstream, what do you want to use them for?
>> 
>> Or Rockchip's decoder won't work. Their decoder can't find the data part
>> without skip some segments in slice data.
>> 
>> I should say something more about the stateless decoder, it is hard to
>> define what a stateless decoder will do, some would like to parse more
>> information but some won't. You even have no idea on what it would
>> accelerate. OK, I should say for those ISO H serial codec, it would be more
>> simple but for those VPx serial, the decoders design is a mess.
> 
> Can't you use header_bit_size in that case to skip over the the parts
> of the slice you don't care about and go to the data?
> 
No, the decoder additionally requests the sizes of those two segments of the H.264 bitstream.
>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID        0x01
>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE        0x02
>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM    0x04
>>>>> +
>>>>> +struct v4l2_h264_dpb_entry {
>>>>> +    __u32 tag;
>>>>> +    __u16 frame_num;
>>>>> +    __u16 pic_num;
>>>> Although the long term reference would use picture order count
>>>> and short term for frame num, but only one of them is used
>>>> for a entry of a dpb.
>>>> 
>>>> Besides, for a frame picture frame_num = pic_num * 2,
>>>> and frame_num = pic_num * 2 + 1 for a filed.
>>> 
>>> I'm not sure what is your point?
>> 
>> I found I was wrong at the last email.
>> 
>> But stateless hardware decoder usually don't care about whether it is long
>> term or short term, as the real dpb updating or management work are not done
>> by the the driver or device and decoding job would only use the two list(or
>> one list for slice P) for reference pictures. So those flag for long term or
>> status can be removed as well.
> 
> I'll remove the LONG_TERM flag then. We do need the other two for the
> Allwinner driver though.
> 
I will ask Paulk and check the manual and vendor library later.
Even if there are two register fields, it doesn't mean they are both
used and required at the same time, because it doesn't follow the ISO manual.
>> And I agree above with my last mail, so I would suggest to keep a property
>> as index for both frame_num and pic_num, as only one of them would be used
>> for a picture decoding once time.
> 
> I'd really prefer to keep everything that is in the bitstream defined
> here. We don't want to cover the usual cases, but all of them even the
> one that haven't been designed yet, so we should be really
> conservative.
As I mentioned in the other mail, a stateless decoder or encoder means
that the device won't track the previous result. But you have no idea what
data the device will need to process this picture, so it is hard to define
a standard structure for it.
As you see, even Allwinner doesn't follow everything the ISO document says.
In my original suggestion, I would just add more reserved fields so that
future drivers can use them.
> 
>>>>> +    /* Note that field is indicated by v4l2_buffer.field */
>>>>> +    __s32 top_field_order_cnt;
>>>>> +    __s32 bottom_field_order_cnt;
>>>>> +    __u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
>>>>> +};
>>>>> +
>>>>> +struct v4l2_ctrl_h264_decode_param {
>>>>> +    __u32 num_slices;
>>>>> +    __u8 idr_pic_flag;
>>>>> +    __u8 nal_ref_idc;
>>>>> +    __s32 top_field_order_cnt;
>>>>> +    __s32 bottom_field_order_cnt;
>>>>> +    __u8 ref_pic_list_p0[32];
>>>>> +    __u8 ref_pic_list_b0[32];
>>>>> +    __u8 ref_pic_list_b1[32];
>>>> 
>>>> I would prefer to keep only two list, list0 and list 1.
>>> 
>>> I'm not even sure why this is needed in the first place
>>> anymore. It's not part of the bitstream, and it seems to come from
>>> ChromeOS' Rockchip driver that uses it though. Do you know why?
>> 
>> You see the P frame would use only a list and B for two list. So for
>> the parameter of a picture, two lists are max. I would suggest only
>> keep two arrays here and rename them as list0 and list1, it would
>> reduce the conflict.
> 
> Right, but those lists are already in v4l2_ctrl_h264_slice_param (with
> the construct you are suggesting). I'm not sure about why the
> redundancy is needed. Alex, Tomasz, do you have any idea why this was
> needed at some point?
> 
I may be wrong or may have forgotten.
> Thanks!
> Maxime
> 
> -- 
> Maxime Ripard, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com



* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2019-01-24 14:37           ` Ayaka
@ 2019-01-25 12:47             ` Maxime Ripard
  0 siblings, 0 replies; 27+ messages in thread
From: Maxime Ripard @ 2019-01-25 12:47 UTC (permalink / raw)
  To: Ayaka
  Cc: tfiga, acourbot, hans.verkuil, sakari.ailus, Laurent Pinchart,
	jenskuske, linux-sunxi, linux-kernel, Paul Kocialkowski,
	Chen-Yu Tsai, posciak, Thomas Petazzoni, Guenter Roeck,
	nicolas.dufresne, linux-arm-kernel, linux-media

On Thu, Jan 24, 2019 at 10:37:23PM +0800, Ayaka wrote:
> >>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_VALID        0x01
> >>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE        0x02
> >>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM    0x04
> >>>>> +
> >>>>> +struct v4l2_h264_dpb_entry {
> >>>>> +    __u32 tag;
> >>>>> +    __u16 frame_num;
> >>>>> +    __u16 pic_num;
> >>>> Although the long term reference would use picture order count
> >>>> and short term for frame num, but only one of them is used
> >>>> for a entry of a dpb.
> >>>> 
> >>>> Besides, for a frame picture frame_num = pic_num * 2,
> >>>> and frame_num = pic_num * 2 + 1 for a filed.
> >>> 
> >>> I'm not sure what is your point?
> >> 
> >> I found I was wrong at the last email.
> >> 
> >> But stateless hardware decoder usually don't care about whether it is long
> >> term or short term, as the real dpb updating or management work are not done
> >> by the the driver or device and decoding job would only use the two list(or
> >> one list for slice P) for reference pictures. So those flag for long term or
> >> status can be removed as well.
> > 
> > I'll remove the LONG_TERM flag then. We do need the other two for the
> > Allwinner driver though.
> >
>
> I would ask Paulk and check the manual and vendor library later.
>
> Even there are two register fields, it don’t mean they would be used
> and required at the same times. Because it don’t follow ISO manual.

It's not a matter of decoding per se, but of how the hardware
behaves. All the buffers needed for one particular frame to be decoded
are uploaded to an SRAM, and the position of each buffer in that SRAM
cannot change between the time it is decoded and the time it is later
used as a reference. If you only have the frames needed to decode the
current frame, you will have no idea which slot in the SRAM can be
reused, whereas having the full DPB allows you to do that. And that's
what _FLAG_ACTIVE gives you.
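
A minimal sketch in C of that slot-reuse decision, under stated
assumptions: the slot table, the helper names and the reading of
_FLAG_ACTIVE as "still needed as a reference" are illustrative only and
are not taken from the cedrus driver. The flag values mirror the ones in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DPB_ENTRY_FLAG_VALID	0x01	/* V4L2_H264_DPB_ENTRY_FLAG_VALID */
#define DPB_ENTRY_FLAG_ACTIVE	0x02	/* V4L2_H264_DPB_ENTRY_FLAG_ACTIVE */

/* Reduced stand-in for struct v4l2_h264_dpb_entry. */
struct dpb_entry {
	uint32_t tag;
	uint8_t flags;
};

/* Hypothetical driver-side record of what occupies each SRAM slot. */
struct sram_slot {
	bool used;
	uint32_t tag;
};

static bool tag_is_active(const struct dpb_entry *dpb, size_t num_entries,
			  uint32_t tag)
{
	const uint8_t mask = DPB_ENTRY_FLAG_VALID | DPB_ENTRY_FLAG_ACTIVE;
	size_t i;

	for (i = 0; i < num_entries; i++)
		if ((dpb[i].flags & mask) == mask && dpb[i].tag == tag)
			return true;

	return false;
}

/* Return the first slot that is empty or holds an inactive picture. */
static int find_reusable_slot(const struct sram_slot *slots, size_t num_slots,
			      const struct dpb_entry *dpb, size_t num_entries)
{
	size_t i;

	assert(slots != NULL && dpb != NULL);

	for (i = 0; i < num_slots; i++)
		if (!slots[i].used ||
		    !tag_is_active(dpb, num_entries, slots[i].tag))
			return (int)i;

	return -1;	/* every slot still holds an active reference */
}
```

With only the references of the current frame available, tag_is_active()
could not tell a picture that is unused now but needed later from one
that is gone for good, which is the reason given above for passing the
full DPB.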

> >> And I agree above with my last mail, so I would suggest to keep a property
> >> as index for both frame_num and pic_num, as only one of them would be used
> >> for a picture decoding once time.
> > 
> > I'd really prefer to keep everything that is in the bitstream defined
> > here. We don't want to cover the usual cases, but all of them even the
> > one that haven't been designed yet, so we should be really
> > conservative.
>
> As I mention in the other mail, a stateless decoder or encoder like
> means the device won’t track the previous result. But you have no
> idea on what data the device would need to process this picture. It
> is hard to define a standard structure for it.
>
> As you see, even allwinner doesn’t obey all the standard the IOS
> document said.

It's not that it disobeys it, it's that it requires the full blown DPB
to have a working driver.

> In my original suggestion, I would just to add more reservation
> fields then future driver can use it.

This interface is not stable at the moment, so it doesn't really
matter, does it?

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls.
  2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
                     ` (2 preceding siblings ...)
  2019-01-08  9:52   ` Randy 'ayaka' Li
@ 2019-01-28  5:54   ` Alexandre Courbot
  3 siblings, 0 replies; 27+ messages in thread
From: Alexandre Courbot @ 2019-01-28  5:54 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Hans Verkuil, Sakari Ailus, Laurent Pinchart, Tomasz Figa,
	Pawel Osciak, Paul Kocialkowski, Chen-Yu Tsai, LKML,
	linux-arm-kernel, Linux Media Mailing List, Nicolas Dufresne,
	jenskuske, linux-sunxi, Thomas Petazzoni, Guenter Roeck

On Thu, Nov 15, 2018 at 11:56 PM Maxime Ripard
<maxime.ripard@bootlin.com> wrote:
>
> From: Pawel Osciak <posciak@chromium.org>
>
> Stateless video codecs will require both the H264 metadata and slices in
> order to be able to decode frames.
>
> This introduces the definitions for a new pixel format for H264 slices that
> have been parsed, as well as the structures used to pass the metadata from
> the userspace to the kernel.
>
> Co-Developed-by: Maxime Ripard <maxime.ripard@bootlin.com>
> Signed-off-by: Pawel Osciak <posciak@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>

<snip>

> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index b854cceb19dc..e96c453208e8 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -825,6 +825,11 @@ const char *v4l2_ctrl_get_name(u32 id)
>         case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER:return "H264 Number of HC Layers";
>         case V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP:
>                                                                 return "H264 Set QP Value for HC Layers";
> +       case V4L2_CID_MPEG_VIDEO_H264_SPS:                      return "H264 SPS";
> +       case V4L2_CID_MPEG_VIDEO_H264_PPS:                      return "H264 PPS";
> +       case V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX:           return "H264 Scaling Matrix";
> +       case V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS:             return "H264 Slice Parameters";
> +       case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:            return "H264 Decode Parameters";
>         case V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP:              return "MPEG4 I-Frame QP Value";
>         case V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP:              return "MPEG4 P-Frame QP Value";
>         case V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP:              return "MPEG4 B-Frame QP Value";

To make things future-proof I think it may be good to add a control
specifying the granularity of data sent with each request (see
https://lkml.org/lkml/2019/1/24/147).

Right now we have a consensus that, to keep things simple, we submit
one frame of encoded data per request. But this will probably be
relaxed in the future, since processing data at a finer granularity
may improve latency. Moreover, the granularity accepted by the decoder
is hardware/firmware dependent, so it is probably a good idea to
expose this from the beginning.

How about a new V4L2_CID_MPEG_VIDEO_H264_GRANULARITY control with only
one value at the moment, namely
V4L2_MPEG_VIDEO_H264_GRANULARITY_FRAME? We could extend this in the
future, and that way user-space will have no excuse for not checking
that the codec supports the input granularity it will send.

I'm wondering whether this could be made codec-independent, but I'm
afraid this would add confusion.


end of thread, other threads:[~2019-01-28  6:01 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-15 14:56 [PATCH v2 0/2] media: cedrus: Add H264 decoding support Maxime Ripard
2018-11-15 14:56 ` [PATCH v2 1/2] media: uapi: Add H264 low-level decoder API compound controls Maxime Ripard
2018-11-27 17:23   ` [linux-sunxi] " Jernej Škrabec
2018-11-28 15:52     ` Maxime Ripard
2018-12-05 12:56   ` Hans Verkuil
2019-01-08  9:52   ` Randy 'ayaka' Li
2019-01-08 17:01     ` ayaka
2019-01-10 13:33       ` ayaka
2019-01-17 11:21         ` Maxime Ripard
2019-01-17 11:16       ` Maxime Ripard
2019-01-17 11:01     ` Maxime Ripard
2019-01-20 12:48       ` ayaka
2019-01-24 14:23         ` Maxime Ripard
2019-01-24 14:37           ` Ayaka
2019-01-25 12:47             ` Maxime Ripard
2019-01-28  5:54   ` Alexandre Courbot
2018-11-15 14:56 ` [PATCH v2 2/2] media: cedrus: Add H264 decoding support Maxime Ripard
2018-11-24 20:43   ` [linux-sunxi] " Jernej Škrabec
2018-11-27 15:50     ` Maxime Ripard
2018-11-27 16:30       ` Jernej Škrabec
2018-11-27 20:19         ` Jernej Škrabec
2018-11-30  7:30         ` Maxime Ripard
2018-11-30 17:56           ` Jernej Škrabec
2018-11-30 12:37   ` Paul Kocialkowski
2018-12-05 22:27   ` [linux-sunxi] " Jernej Škrabec
2018-11-16  7:04 ` [PATCH v2 0/2] " Tomasz Figa
2018-11-19 14:12   ` Maxime Ripard
