Linux-Media Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH RFC 00/12] media: hantro: H264 fixes and improvements
@ 2019-09-01 12:42 Jonas Karlman
  2019-09-01 12:45 ` [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288 Jonas Karlman
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:42 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

This series contains fixes and improvements for the hantro H264 decoder.

Patch 1-6 fixes issues and limitations observed when preparing support
for field encoded content.

Patch 7 introduce new DPB entry flags that is used to signal how a reference
frame is referenced. This information is needed to correctly build a
reference list for field encoded content.

Patch 8 adds bits to handle field encoded content, this is a rough patch
and should be reworked with proper code style and formatting.
Please get back with feedback on how to improve this.

The following samples from [1] are now playable with patch 1-8
- H264_1080i-25-interlace_Kaesescheibchen.mkv
- H264_10_1080i_50_AC3-Astra19.2_ProSieben_HD.ts
- big_buck_bunny_1080p_H264_AAC_25fps_7200K.mp4
- h264_tivo_sample.ts

The rest of the patches refactors G1 H264 code to more closely match
the code generated by my rockchip-vpu-regtool at [2] and then adds
support for H264 decoding on RK3399/RK3328 using the VPU2 block.
This code is early work and needs proper code style and formatting,
I just wanted to share the early work and get some initial feedback.

This series has been tested using ffmpeg v4l2 request hwaccel at [3] [4]

[1] http://kwiboo.libreelec.tv/test/samples/
[2] https://github.com/Kwiboo/rockchip-vpu-regtool
[3] https://github.com/Kwiboo/FFmpeg/commits/v4l2-request-hwaccel-4.0.4
[4] https://github.com/Kwiboo/FFmpeg/compare/4.0.4-Leia-18.4...45df99d31062e068073cf899dce559e334c9127f

Regards,
Jonas

Jonas Karlman (12):
  media: hantro: Fix H264 max frmsize supported on RK3288
  media: hantro: Do not reorder H264 scaling list
  media: hantro: Fix H264 motion vector buffer offset
  media: hantro: Reduce H264 extra space for motion vectors
  media: hantro: Remove now unused H264 pic_size
  media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly
  media: uapi: h264: Add DPB entry field reference flags
  media: hantro: Fix H264 decoding of field encoded content
  media: hantro: Refactor G1 H264 code
  media: hantro: Add support for H264 decoding on RK3399
  media: hantro: Enable H264 decoding on RK3399
  media: hantro: Enable H264 decoding on RK3328

 .../media/uapi/v4l/ext-ctrls-codec.rst        |  12 +
 drivers/staging/media/hantro/Makefile         |   1 +
 .../staging/media/hantro/hantro_g1_h264_dec.c | 685 +++++++++++-------
 drivers/staging/media/hantro/hantro_h264.c    | 209 +++---
 drivers/staging/media/hantro/hantro_hw.h      |  10 +-
 drivers/staging/media/hantro/hantro_v4l2.c    |   6 +-
 drivers/staging/media/hantro/rk3288_vpu_hw.c  |   4 +-
 drivers/staging/media/hantro/rk3399_vpu_hw.c  |  24 +-
 .../media/hantro/rk3399_vpu_hw_h264_dec.c     | 486 +++++++++++++
 include/media/h264-ctrls.h                    |   4 +
 10 files changed, 1091 insertions(+), 350 deletions(-)
 create mode 100644 drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-02 14:00     ` Philipp Zabel
  2019-09-01 12:45   ` [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset Jonas Karlman
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

Scaling list supplied from userspace using ffmpeg and libva-v4l2-request
is already in matrix order and can be used without applying the inverse
scanning process.

The HW also only support 8x8 scaling list for the Y component, indices 0
and 3 in the scaling list supplied from userspace.

Remove reordering and write the scaling matrix in an order expected by
the VPU, also only allocate memory for the two 8x8 lists used.

Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/hantro_h264.c | 64 +++++++---------------
 1 file changed, 20 insertions(+), 44 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index 0d758e0c0f99..e2d01145ac4f 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -20,7 +20,7 @@
 /* Size with u32 units. */
 #define CABAC_INIT_BUFFER_SIZE		(460 * 2)
 #define POC_BUFFER_SIZE			34
-#define SCALING_LIST_SIZE		(6 * 16 + 6 * 64)
+#define SCALING_LIST_SIZE		(6 * 16 + 2 * 64)
 
 #define POC_CMP(p0, p1) ((p0) < (p1) ? -1 : 1)
 
@@ -194,57 +194,33 @@ static const u32 h264_cabac_table[] = {
 	0x1f0c2517, 0x1f261440
 };
 
-/*
- * NOTE: The scaling lists are in zig-zag order, apply inverse scanning process
- * to get the values in matrix order. In addition, the hardware requires bytes
- * swapped within each subsequent 4 bytes. Both arrays below include both
- * transformations.
- */
-static const u32 zig_zag_4x4[] = {
-	3, 2, 7, 11, 6, 1, 0, 5, 10, 15, 14, 9, 4, 8, 13, 12
-};
-
-static const u32 zig_zag_8x8[] = {
-	3, 2, 11, 19, 10, 1, 0, 9, 18, 27, 35, 26, 17, 8, 7, 6,
-	15, 16, 25, 34, 43, 51, 42, 33, 24, 23, 14, 5, 4, 13, 22, 31,
-	32, 41, 50, 59, 58, 49, 40, 39, 30, 21, 12, 20, 29, 38, 47, 48,
-	57, 56, 55, 46, 37, 28, 36, 45, 54, 63, 62, 53, 44, 52, 61, 60
-};
-
 static void
 reorder_scaling_list(struct hantro_ctx *ctx)
 {
 	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
 	const struct v4l2_ctrl_h264_scaling_matrix *scaling = ctrls->scaling;
-	const size_t num_list_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4);
-	const size_t list_len_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4[0]);
-	const size_t num_list_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8);
-	const size_t list_len_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8[0]);
 	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
-	u8 *dst = tbl->scaling_list;
-	const u8 *src;
-	int i, j;
-
-	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_4x4) != list_len_4x4);
-	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_8x8) != list_len_8x8);
-	BUILD_BUG_ON(ARRAY_SIZE(tbl->scaling_list) !=
-		     num_list_4x4 * list_len_4x4 +
-		     num_list_8x8 * list_len_8x8);
-
-	src = &scaling->scaling_list_4x4[0][0];
-	for (i = 0; i < num_list_4x4; ++i) {
-		for (j = 0; j < list_len_4x4; ++j)
-			dst[zig_zag_4x4[j]] = src[j];
-		src += list_len_4x4;
-		dst += list_len_4x4;
+	u32 *dst = (u32 *)tbl->scaling_list;
+	u32 i, j, tmp;
+
+	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_4x4); i++) {
+		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_4x4[0]) / 4; j++) {
+			tmp = (scaling->scaling_list_4x4[i][4 * j + 0] << 24) |
+			      (scaling->scaling_list_4x4[i][4 * j + 1] << 16) |
+			      (scaling->scaling_list_4x4[i][4 * j + 2] << 8) |
+			      (scaling->scaling_list_4x4[i][4 * j + 3]);
+			*dst++ = tmp;
+		}
 	}
 
-	src = &scaling->scaling_list_8x8[0][0];
-	for (i = 0; i < num_list_8x8; ++i) {
-		for (j = 0; j < list_len_8x8; ++j)
-			dst[zig_zag_8x8[j]] = src[j];
-		src += list_len_8x8;
-		dst += list_len_8x8;
+	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_8x8); i += 3) {
+		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_8x8[0]) / 4; j++) {
+			tmp = (scaling->scaling_list_8x8[i][4 * j + 0] << 24) |
+			      (scaling->scaling_list_8x8[i][4 * j + 1] << 16) |
+			      (scaling->scaling_list_8x8[i][4 * j + 2] << 8) |
+			      (scaling->scaling_list_8x8[i][4 * j + 3]);
+			*dst++ = tmp;
+		}
 	}
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288
  2019-09-01 12:42 [PATCH RFC 00/12] media: hantro: H264 fixes and improvements Jonas Karlman
@ 2019-09-01 12:45 ` Jonas Karlman
  2019-09-04 13:07   ` Ezequiel Garcia
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
  2019-09-02 13:02 ` [PATCH RFC 00/12] media: hantro: H264 fixes and improvements Ezequiel Garcia
  2 siblings, 1 reply; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

TRM specify supported image size 48x48 to 4096x2304 at step size 16 pixels,
change frmsize max_width/max_height to match TRM.

Fixes: 760327930e10 ("media: hantro: Enable H264 decoding on rk3288")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/rk3288_vpu_hw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c b/drivers/staging/media/hantro/rk3288_vpu_hw.c
index 6bfcc47d1e58..ebb017b8a334 100644
--- a/drivers/staging/media/hantro/rk3288_vpu_hw.c
+++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c
@@ -67,10 +67,10 @@ static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
 		.max_depth = 2,
 		.frmsize = {
 			.min_width = 48,
-			.max_width = 3840,
+			.max_width = 4096,
 			.step_width = H264_MB_DIM,
 			.min_height = 48,
-			.max_height = 2160,
+			.max_height = 2304,
 			.step_height = H264_MB_DIM,
 		},
 	},
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
  2019-09-01 12:45   ` [PATCH 02/12] media: hantro: Do not reorder H264 scaling list Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-03 10:58     ` Philipp Zabel
                       ` (2 more replies)
  2019-09-01 12:45   ` [PATCH 05/12] media: hantro: Remove now unused H264 pic_size Jonas Karlman
                     ` (8 subsequent siblings)
  10 siblings, 3 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
and is laid out in memory as follow:

+-------------------+
| Y-plane   256 MBs |
+-------------------+
| UV-plane  128 MBs |
+-------------------+
| MV buffer  64 MBs |
+-------------------+

The motion vector buffer offset is currently correct for 4:2:0 because
the extra space for motion vectors is overallocated with an extra 64 MBs.

Wrong offset for both destination and motion vector buffer are used
for the bottom field of field encoded content, wrong offset is
also used for 4:0:0 (monochrome) content.

Fix this by always setting the motion vector address to the expected
384 MBs offset for 4:2:0 and 256 MBs offset for 4:0:0 content.

Also use correct destination and motion vector buffer offset
for the bottom field of field encoded content.

While at it also extend the check for 4:0:0 (monochrome) to include an
additional check for High Profile (100).

Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 .../staging/media/hantro/hantro_g1_h264_dec.c | 33 +++++++++++--------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 7ab534936843..159bd67e0a36 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -19,6 +19,9 @@
 #include "hantro_hw.h"
 #include "hantro_v4l2.h"
 
+#define MV_OFFSET_420	384
+#define MV_OFFSET_400	256
+
 static void set_params(struct hantro_ctx *ctx)
 {
 	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
@@ -49,8 +52,8 @@ static void set_params(struct hantro_ctx *ctx)
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);
 
 	/* Decoder control register 1. */
-	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(sps->pic_width_in_mbs_minus1 + 1) |
-	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(sps->pic_height_in_map_units_minus1 + 1) |
+	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
+	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
 	      G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
 
@@ -79,7 +82,7 @@ static void set_params(struct hantro_ctx *ctx)
 		reg |= G1_REG_DEC_CTRL4_CABAC_E;
 	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
 		reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
-	if (sps->chroma_format_idc == 0)
+	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
 		reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
 	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
 		reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
@@ -233,6 +236,7 @@ static void set_buffers(struct hantro_ctx *ctx)
 	struct vb2_v4l2_buffer *src_buf, *dst_buf;
 	struct hantro_dev *vpu = ctx->dev;
 	dma_addr_t src_dma, dst_dma;
+	unsigned int offset = MV_OFFSET_420;
 
 	src_buf = hantro_get_src_buf(ctx);
 	dst_buf = hantro_get_dst_buf(ctx);
@@ -243,19 +247,20 @@ static void set_buffers(struct hantro_ctx *ctx)
 
 	/* Destination (decoded frame) buffer. */
 	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+		dst_dma += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
 	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
 
-	/* Higher profiles require DMV buffer appended to reference frames. */
-	if (ctrls->sps->profile_idc > 66) {
-		size_t pic_size = ctx->h264_dec.pic_size;
-		size_t mv_offset = round_up(pic_size, 8);
-
-		if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
-			mv_offset += 32 * H264_MB_WIDTH(ctx->dst_fmt.width);
-
-		vdpu_write_relaxed(vpu, dst_dma + mv_offset,
-				   G1_REG_ADDR_DIR_MV);
-	}
+	/* Motion vector buffer is located after the decoded frame. */
+	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	if (ctrls->sps->profile_idc >= 100 && ctrls->sps->chroma_format_idc == 0)
+		offset = MV_OFFSET_400;
+	dst_dma += offset * H264_MB_WIDTH(ctx->dst_fmt.width) *
+		   H264_MB_HEIGHT(ctx->dst_fmt.height);
+	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+		dst_dma += 32 * H264_MB_WIDTH(ctx->dst_fmt.width) *
+			   H264_MB_HEIGHT(ctx->dst_fmt.height);
+	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DIR_MV);
 
 	/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
 	vdpu_write_relaxed(vpu, ctx->h264_dec.priv.dma, G1_REG_ADDR_QTABLE);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 04/12] media: hantro: Reduce H264 extra space for motion vectors
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (2 preceding siblings ...)
  2019-09-01 12:45   ` [PATCH 05/12] media: hantro: Remove now unused H264 pic_size Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-01 12:45   ` [PATCH 06/12] media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly Jonas Karlman
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
with additional 32 bytes on multi-core variants.

Memory layout is as follow:

+-------------------+
| Y-plane   256 MBs |
+-------------------+
| UV-plane  128 MBs |
+-------------------+
| MV buffer  64 MBs |
+-------------------+
| MC sync  32 bytes |
+-------------------+

Reduce the extra space allocated now that motion vector buffer offset no
longer is based on the extra space.

Only use extra space for 64 MBs of motion vector buffer and 32 bytes for
multi-core sync.

Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/hantro_v4l2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
index 3dae52abb96c..3a360a6a17e2 100644
--- a/drivers/staging/media/hantro/hantro_v4l2.c
+++ b/drivers/staging/media/hantro/hantro_v4l2.c
@@ -242,12 +242,12 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
 		/*
 		 * The H264 decoder needs extra space on the output buffers
 		 * to store motion vectors. This is needed for reference
-		 * frames.
+		 * frames. 32 extra bytes is used for multi-core sync.
 		 */
 		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE)
 			pix_mp->plane_fmt[0].sizeimage +=
-				128 * DIV_ROUND_UP(pix_mp->width, 16) *
-				      DIV_ROUND_UP(pix_mp->height, 16);
+				64 * H264_MB_WIDTH(pix_mp->width) *
+				     H264_MB_WIDTH(pix_mp->height) + 32;
 	} else if (!pix_mp->plane_fmt[0].sizeimage) {
 		/*
 		 * For coded formats the application can specify
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 05/12] media: hantro: Remove now unused H264 pic_size
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
  2019-09-01 12:45   ` [PATCH 02/12] media: hantro: Do not reorder H264 scaling list Jonas Karlman
  2019-09-01 12:45   ` [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-01 12:45   ` [PATCH 04/12] media: hantro: Reduce H264 extra space for motion vectors Jonas Karlman
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

pic_size in hantro_h264_dec_hw_ctx struct is no longer used,
lets remove it.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/hantro_h264.c | 5 -----
 drivers/staging/media/hantro/hantro_hw.h   | 3 ---
 2 files changed, 8 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index e2d01145ac4f..a77cc28e180a 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -603,7 +603,6 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
 	struct hantro_h264_dec_hw_ctx *h264_dec = &ctx->h264_dec;
 	struct hantro_aux_buf *priv = &h264_dec->priv;
 	struct hantro_h264_dec_priv_tbl *tbl;
-	struct v4l2_pix_format_mplane pix_mp;
 
 	priv->cpu = dma_alloc_coherent(vpu->dev, sizeof(*tbl), &priv->dma,
 				       GFP_KERNEL);
@@ -614,9 +613,5 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
 	tbl = priv->cpu;
 	memcpy(tbl->cabac_table, h264_cabac_table, sizeof(tbl->cabac_table));
 
-	v4l2_fill_pixfmt_mp(&pix_mp, ctx->dst_fmt.pixelformat,
-			    ctx->dst_fmt.width, ctx->dst_fmt.height);
-	h264_dec->pic_size = pix_mp.plane_fmt[0].sizeimage;
-
 	return 0;
 }
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index 2fab655bf098..8adad8ac9b1d 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -80,15 +80,12 @@ struct hantro_h264_dec_reflists {
  * @dpb:	DPB
  * @reflists:	P/B0/B1 reflists
  * @ctrls:	V4L2 controls attached to a run
- * @pic_size:	Size in bytes of decoded picture, this is needed
- *		to pass the location of motion vectors.
  */
 struct hantro_h264_dec_hw_ctx {
 	struct hantro_aux_buf priv;
 	struct v4l2_h264_dpb_entry dpb[HANTRO_H264_DPB_SIZE];
 	struct hantro_h264_dec_reflists reflists;
 	struct hantro_h264_dec_ctrls ctrls;
-	size_t pic_size;
 };
 
 /**
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 06/12] media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (3 preceding siblings ...)
  2019-09-01 12:45   ` [PATCH 04/12] media: hantro: Reduce H264 extra space for motion vectors Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-01 12:45   ` [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content Jonas Karlman
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

The FIELDPIC_FLAG_E bit should be set when field_pic_flag exists in stream,
it is currently set based on field_pic_flag of current frame.
The PIC_FIELDMODE_E bit is correctly set based on the field_pic_flag.

Fix this by setting the FIELDPIC_FLAG_E bit when frame_mbs_only is not set.

Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/hantro_g1_h264_dec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 159bd67e0a36..16f21d258f6a 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -64,7 +64,7 @@ static void set_params(struct hantro_ctx *ctx)
 	/* always use the matrix sent from userspace */
 	reg |= G1_REG_DEC_CTRL2_TYPE1_QUANT_E;
 
-	if (slices[0].flags &  V4L2_H264_SLICE_FLAG_FIELD_PIC)
+	if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
 		reg |= G1_REG_DEC_CTRL2_FIELDPIC_FLAG_E;
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL2);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (4 preceding siblings ...)
  2019-09-01 12:45   ` [PATCH 06/12] media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-03 13:21     ` Philipp Zabel
  2019-09-01 12:45   ` [RFC 07/12] media: uapi: h264: Add DPB entry field reference flags Jonas Karlman
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

This need code cleanup and formatting

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 .../staging/media/hantro/hantro_g1_h264_dec.c |  26 ++--
 drivers/staging/media/hantro/hantro_h264.c    | 126 ++++++++++++------
 drivers/staging/media/hantro/hantro_hw.h      |   4 +
 3 files changed, 100 insertions(+), 56 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 16f21d258f6a..bc628ef73b29 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -130,28 +130,20 @@ static void set_params(struct hantro_ctx *ctx)
 
 static void set_ref(struct hantro_ctx *ctx)
 {
+	const struct v4l2_ctrl_h264_decode_params *dec_param;
+	const struct v4l2_ctrl_h264_slice_params *slice;
 	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
 	const u8 *b0_reflist, *b1_reflist, *p_reflist;
 	struct hantro_dev *vpu = ctx->dev;
-	u32 dpb_longterm = 0;
-	u32 dpb_valid = 0;
 	int reg_num;
 	u32 reg;
 	int i;
 
-	/*
-	 * Set up bit maps of valid and long term DPBs.
-	 * NOTE: The bits are reversed, i.e. MSb is DPB 0.
-	 */
-	for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
-		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
-			dpb_valid |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
+	dec_param = ctx->h264_dec.ctrls.decode;
+	slice = ctx->h264_dec.ctrls.slices;
 
-		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
-			dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
-	}
-	vdpu_write_relaxed(vpu, dpb_valid << 16, G1_REG_VALID_REF);
-	vdpu_write_relaxed(vpu, dpb_longterm << 16, G1_REG_LT_REF);
+	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_valid, G1_REG_VALID_REF);
+	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_longterm, G1_REG_LT_REF);
 
 	/*
 	 * Set up reference frame picture numbers.
@@ -223,10 +215,8 @@ static void set_ref(struct hantro_ctx *ctx)
 
 	/* Set up addresses of DPB buffers. */
 	for (i = 0; i < HANTRO_H264_DPB_SIZE; i++) {
-		struct vb2_buffer *buf =  hantro_h264_get_ref_buf(ctx, i);
-
-		vdpu_write_relaxed(vpu, vb2_dma_contig_plane_dma_addr(buf, 0),
-				   G1_REG_ADDR_REF(i));
+		dma_addr_t addr = hantro_h264_get_ref_dma_addr(ctx, i);
+		vdpu_write_relaxed(vpu, addr, G1_REG_ADDR_REF(i));
 	}
 }
 
diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index a77cc28e180a..85c86d728b1a 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -228,17 +228,65 @@ static void prepare_table(struct hantro_ctx *ctx)
 {
 	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
 	const struct v4l2_ctrl_h264_decode_params *dec_param = ctrls->decode;
+	const struct v4l2_ctrl_h264_slice_params *slices = ctrls->slices;
 	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
 	const struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
+	u32 dpb_longterm = 0;
+	u32 dpb_valid = 0;
 	int i;
 
+	/*
+	 * Set up bit maps of valid and long term DPBs.
+	 * NOTE: The bits are reversed, i.e. MSb is DPB 0.
+	 */
+	if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) || (slices[0].flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)) {
+		for (i = 0; i < HANTRO_H264_DPB_SIZE * 2; ++i) {
+			// check for correct reference use
+			u32 flag = (i & 0x1) ? V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM : V4L2_H264_DPB_ENTRY_FLAG_REF_TOP;
+			if (dpb[i / 2].flags & flag)
+				dpb_valid |= BIT(HANTRO_H264_DPB_SIZE * 2 - 1 - i);
+
+			if (dpb[i / 2].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
+				dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE * 2 - 1 - i);
+		}
+
+		ctx->h264_dec.dpb_valid = dpb_valid;
+		ctx->h264_dec.dpb_longterm = dpb_longterm;
+	} else {
+		for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
+			if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
+				dpb_valid |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
+
+			if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
+				dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
+		}
+
+		ctx->h264_dec.dpb_valid = dpb_valid << 16;
+		ctx->h264_dec.dpb_longterm = dpb_longterm << 16;
+	}
+
 	for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
-		tbl->poc[i * 2] = dpb[i].top_field_order_cnt;
-		tbl->poc[i * 2 + 1] = dpb[i].bottom_field_order_cnt;
+		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE) {
+			tbl->poc[i * 2] = dpb[i].top_field_order_cnt;
+			tbl->poc[i * 2 + 1] = dpb[i].bottom_field_order_cnt;
+		} else {
+			tbl->poc[i * 2] = 0;
+			tbl->poc[i * 2 + 1] = 0;
+		}
 	}
 
-	tbl->poc[32] = dec_param->top_field_order_cnt;
-	tbl->poc[33] = dec_param->bottom_field_order_cnt;
+	if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) || !(slices[0].flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)) {
+		if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC))
+			tbl->poc[32] = (slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD) ?
+					dec_param->bottom_field_order_cnt :
+					dec_param->top_field_order_cnt;
+		else
+			tbl->poc[32] = min(dec_param->top_field_order_cnt, dec_param->bottom_field_order_cnt);
+		tbl->poc[33] = 0;
+	} else {
+		tbl->poc[32] = dec_param->top_field_order_cnt;
+		tbl->poc[33] = dec_param->bottom_field_order_cnt;
+	}
 
 	reorder_scaling_list(ctx);
 }
@@ -251,51 +299,36 @@ struct hantro_h264_reflist_builder {
 	u8 num_valid;
 };
 
-static s32 get_poc(enum v4l2_field field, s32 top_field_order_cnt,
-		   s32 bottom_field_order_cnt)
-{
-	switch (field) {
-	case V4L2_FIELD_TOP:
-		return top_field_order_cnt;
-	case V4L2_FIELD_BOTTOM:
-		return bottom_field_order_cnt;
-	default:
-		break;
-	}
-
-	return min(top_field_order_cnt, bottom_field_order_cnt);
-}
-
 static void
 init_reflist_builder(struct hantro_ctx *ctx,
 		     struct hantro_h264_reflist_builder *b)
 {
 	const struct v4l2_ctrl_h264_decode_params *dec_param;
-	struct vb2_v4l2_buffer *buf = hantro_get_dst_buf(ctx);
+	const struct v4l2_ctrl_h264_slice_params *slices;
 	const struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
-	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
 	unsigned int i;
 
 	dec_param = ctx->h264_dec.ctrls.decode;
+	slices = ctx->h264_dec.ctrls.slices;
 
 	memset(b, 0, sizeof(*b));
 	b->dpb = dpb;
-	b->curpoc = get_poc(buf->field, dec_param->top_field_order_cnt,
-			    dec_param->bottom_field_order_cnt);
+	b->curpoc = (slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD) ?
+		    dec_param->bottom_field_order_cnt :
+		    dec_param->top_field_order_cnt;
 
 	for (i = 0; i < ARRAY_SIZE(ctx->h264_dec.dpb); i++) {
-		int buf_idx;
-
-		if (!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
+		u32 ref_flag = dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME;
+		if (!ref_flag)
 			continue;
 
-		buf_idx = vb2_find_timestamp(cap_q, dpb[i].reference_ts, 0);
-		if (buf_idx < 0)
-			continue;
+		if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME)
+			b->pocs[i] = min(dpb[i].bottom_field_order_cnt, dpb[i].top_field_order_cnt);
+		else if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM)
+			b->pocs[i] = dpb[i].bottom_field_order_cnt;
+		else if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_TOP)
+			b->pocs[i] = dpb[i].top_field_order_cnt;
 
-		buf = to_vb2_v4l2_buffer(vb2_get_buffer(cap_q, buf_idx));
-		b->pocs[i] = get_poc(buf->field, dpb[i].top_field_order_cnt,
-				     dpb[i].bottom_field_order_cnt);
 		b->unordered_reflist[b->num_valid] = i;
 		b->num_valid++;
 	}
@@ -448,8 +481,7 @@ build_b_ref_lists(const struct hantro_h264_reflist_builder *builder,
 static bool dpb_entry_match(const struct v4l2_h264_dpb_entry *a,
 			    const struct v4l2_h264_dpb_entry *b)
 {
-	return a->top_field_order_cnt == b->top_field_order_cnt &&
-	       a->bottom_field_order_cnt == b->bottom_field_order_cnt;
+	return a->reference_ts == b->reference_ts;
 }
 
 static void update_dpb(struct hantro_ctx *ctx)
@@ -463,13 +495,13 @@ static void update_dpb(struct hantro_ctx *ctx)
 
 	/* Disable all entries by default. */
 	for (i = 0; i < ARRAY_SIZE(ctx->h264_dec.dpb); i++)
-		ctx->h264_dec.dpb[i].flags &= ~V4L2_H264_DPB_ENTRY_FLAG_ACTIVE;
+		ctx->h264_dec.dpb[i].flags = 0;
 
 	/* Try to match new DPB entries with existing ones by their POCs. */
 	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
 		const struct v4l2_h264_dpb_entry *ndpb = &dec_param->dpb[i];
 
-		if (!(ndpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
+		if (!(ndpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
 			continue;
 
 		/*
@@ -480,8 +512,7 @@ static void update_dpb(struct hantro_ctx *ctx)
 			struct v4l2_h264_dpb_entry *cdpb;
 
 			cdpb = &ctx->h264_dec.dpb[j];
-			if (cdpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE ||
-			    !dpb_entry_match(cdpb, ndpb))
+			if (!dpb_entry_match(cdpb, ndpb))
 				continue;
 
 			*cdpb = *ndpb;
@@ -541,6 +572,25 @@ struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
 	return buf;
 }
 
+dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
+					unsigned int dpb_idx)
+{
+	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
+	const struct v4l2_ctrl_h264_decode_params *dec_param = ctx->h264_dec.ctrls.decode;
+	const struct v4l2_ctrl_h264_slice_params *slices = ctx->h264_dec.ctrls.slices;
+
+	struct vb2_buffer *buf = hantro_h264_get_ref_buf(ctx, dpb_idx);
+	s32 cur_poc = slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD ?
+		      dec_param->bottom_field_order_cnt :
+		      dec_param->top_field_order_cnt;
+	u32 flags = dpb[dpb_idx].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE ? 0x2 : 0;
+	flags |= abs(dpb[dpb_idx].top_field_order_cnt - cur_poc) <
+		 abs(dpb[dpb_idx].bottom_field_order_cnt - cur_poc) ?
+		 0x1 : 0;
+
+	return vb2_dma_contig_plane_dma_addr(buf, 0) | flags;
+}
+
 int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx)
 {
 	struct hantro_h264_dec_hw_ctx *h264_ctx = &ctx->h264_dec;
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index 8adad8ac9b1d..d58f2a36ca40 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -86,6 +86,8 @@ struct hantro_h264_dec_hw_ctx {
 	struct v4l2_h264_dpb_entry dpb[HANTRO_H264_DPB_SIZE];
 	struct hantro_h264_dec_reflists reflists;
 	struct hantro_h264_dec_ctrls ctrls;
+	u32 dpb_longterm;
+	u32 dpb_valid;
 };
 
 /**
@@ -157,6 +159,8 @@ void hantro_jpeg_enc_exit(struct hantro_ctx *ctx);
 
 struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
 					   unsigned int dpb_idx);
+dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
+					unsigned int dpb_idx);
 int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx);
 void hantro_g1_h264_dec_run(struct hantro_ctx *ctx);
 int hantro_h264_dec_init(struct hantro_ctx *ctx);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 07/12] media: uapi: h264: Add DPB entry field reference flags
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (5 preceding siblings ...)
  2019-09-01 12:45   ` [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-01 12:45   ` [RFC 09/12] media: hantro: Refactor G1 H264 code Jonas Karlman
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

Add DPB entry flags to help indicate when a reference frame is a field picture
and how the DPB entry is referenced, top or bottom field or full frame.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
 include/media/h264-ctrls.h                       |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
index bc5dd8e76567..eb6c32668ad7 100644
--- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
+++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
@@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
     * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
       - 0x00000004
       - The DPB entry is a long term reference frame
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
+      - 0x00000008
+      - The DPB entry is a field picture
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
+      - 0x00000010
+      - The DPB entry is a top field reference
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
+      - 0x00000020
+      - The DPB entry is a bottom field reference
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
+      - 0x00000030
+      - The DPB entry is a reference frame
 
 ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
     Specifies the decoding mode to use. Currently exposes slice-based and
diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
index e877bf1d537c..76020ebd1e6c 100644
--- a/include/media/h264-ctrls.h
+++ b/include/media/h264-ctrls.h
@@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
 #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
 #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
 #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
+#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
+#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
+#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
+#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
 
 struct v4l2_h264_dpb_entry {
 	__u64 reference_ts;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 09/12] media: hantro: Refactor G1 H264 code
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (6 preceding siblings ...)
  2019-09-01 12:45   ` [RFC 07/12] media: uapi: h264: Add DPB entry field reference flags Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-01 12:45   ` [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399 Jonas Karlman
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

Use generated code from my rockchip-vpu-regtool

This need code cleanup and formatting

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 .../staging/media/hantro/hantro_g1_h264_dec.c | 661 +++++++++++-------
 drivers/staging/media/hantro/hantro_h264.c    |  14 +
 drivers/staging/media/hantro/hantro_hw.h      |   2 +
 3 files changed, 439 insertions(+), 238 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index bc628ef73b29..4b82b9fd5252 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Rockchip RK3288 VPU codec driver
+ * Hantro VPU codec driver
  *
  * Copyright (c) 2014 Rockchip Electronics Co., Ltd.
  *	Hertz Wong <hertz.wong@rock-chips.com>
@@ -15,273 +15,458 @@
 
 #include <media/v4l2-mem2mem.h>
 
-#include "hantro_g1_regs.h"
 #include "hantro_hw.h"
 #include "hantro_v4l2.h"
 
 #define MV_OFFSET_420	384
 #define MV_OFFSET_400	256
 
-static void set_params(struct hantro_ctx *ctx)
-{
-	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
-	const struct v4l2_ctrl_h264_decode_params *dec_param = ctrls->decode;
-	const struct v4l2_ctrl_h264_slice_params *slices = ctrls->slices;
-	const struct v4l2_ctrl_h264_sps *sps = ctrls->sps;
-	const struct v4l2_ctrl_h264_pps *pps = ctrls->pps;
-	struct vb2_v4l2_buffer *src_buf = hantro_get_src_buf(ctx);
-	struct hantro_dev *vpu = ctx->dev;
-	u32 reg;
-
-	/* Decoder control register 0. */
-	reg = G1_REG_DEC_CTRL0_DEC_AXI_WR_ID(0x0);
-	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
-		reg |= G1_REG_DEC_CTRL0_SEQ_MBAFF_E;
-	reg |= G1_REG_DEC_CTRL0_PICORD_COUNT_E;
-	if (dec_param->nal_ref_idc)
-		reg |= G1_REG_DEC_CTRL0_WRITE_MVS_E;
-
-	if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY) &&
-	    (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD ||
-	     slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC))
-		reg |= G1_REG_DEC_CTRL0_PIC_INTERLACE_E;
-	if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
-		reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
-	if (!(slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD))
-		reg |= G1_REG_DEC_CTRL0_PIC_TOPFIELD_E;
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);
-
-	/* Decoder control register 1. */
-	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
-	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
-	      G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
-
-	/* Decoder control register 2. */
-	reg = G1_REG_DEC_CTRL2_CH_QP_OFFSET(pps->chroma_qp_index_offset) |
-	      G1_REG_DEC_CTRL2_CH_QP_OFFSET2(pps->second_chroma_qp_index_offset);
-
-	/* always use the matrix sent from userspace */
-	reg |= G1_REG_DEC_CTRL2_TYPE1_QUANT_E;
-
-	if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
-		reg |= G1_REG_DEC_CTRL2_FIELDPIC_FLAG_E;
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL2);
-
-	/* Decoder control register 3. */
-	reg = G1_REG_DEC_CTRL3_START_CODE_E |
-	      G1_REG_DEC_CTRL3_INIT_QP(pps->pic_init_qp_minus26 + 26) |
-	      G1_REG_DEC_CTRL3_STREAM_LEN(vb2_get_plane_payload(&src_buf->vb2_buf, 0));
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL3);
-
-	/* Decoder control register 4. */
-	reg = G1_REG_DEC_CTRL4_FRAMENUM_LEN(sps->log2_max_frame_num_minus4 + 4) |
-	      G1_REG_DEC_CTRL4_FRAMENUM(slices[0].frame_num) |
-	      G1_REG_DEC_CTRL4_WEIGHT_BIPR_IDC(pps->weighted_bipred_idc);
-	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
-		reg |= G1_REG_DEC_CTRL4_CABAC_E;
-	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
-		reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
-	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
-		reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
-	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
-		reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL4);
-
-	/* Decoder control register 5. */
-	reg = G1_REG_DEC_CTRL5_REFPIC_MK_LEN(slices[0].dec_ref_pic_marking_bit_size) |
-	      G1_REG_DEC_CTRL5_IDR_PIC_ID(slices[0].idr_pic_id);
-	if (pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED)
-		reg |= G1_REG_DEC_CTRL5_CONST_INTRA_E;
-	if (pps->flags & V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT)
-		reg |= G1_REG_DEC_CTRL5_FILT_CTRL_PRES;
-	if (pps->flags & V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT)
-		reg |= G1_REG_DEC_CTRL5_RDPIC_CNT_PRES;
-	if (pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE)
-		reg |= G1_REG_DEC_CTRL5_8X8TRANS_FLAG_E;
-	if (dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC)
-		reg |= G1_REG_DEC_CTRL5_IDR_PIC_E;
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL5);
-
-	/* Decoder control register 6. */
-	reg = G1_REG_DEC_CTRL6_PPS_ID(slices[0].pic_parameter_set_id) |
-	      G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(pps->num_ref_idx_l0_default_active_minus1 + 1) |
-	      G1_REG_DEC_CTRL6_REFIDX1_ACTIVE(pps->num_ref_idx_l1_default_active_minus1 + 1) |
-	      G1_REG_DEC_CTRL6_POC_LENGTH(slices[0].pic_order_cnt_bit_size);
-	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL6);
-
-	/* Error concealment register. */
-	vdpu_write_relaxed(vpu, 0, G1_REG_ERR_CONC);
-
-	/* Prediction filter tap register. */
-	vdpu_write_relaxed(vpu,
-			   G1_REG_PRED_FLT_PRED_BC_TAP_0_0(1) |
-			   G1_REG_PRED_FLT_PRED_BC_TAP_0_1(-5 & 0x3ff) |
-			   G1_REG_PRED_FLT_PRED_BC_TAP_0_2(20),
-			   G1_REG_PRED_FLT);
-
-	/* Reference picture buffer control register. */
-	vdpu_write_relaxed(vpu, 0, G1_REG_REF_BUF_CTRL);
-
-	/* Reference picture buffer control register 2. */
-	vdpu_write_relaxed(vpu, G1_REG_REF_BUF_CTRL2_APF_THRESHOLD(8),
-			   G1_REG_REF_BUF_CTRL2);
-}
+#define G1_SWREG(nr)			((nr) * 4)
+
+#define G1_REG_RLC_VLC_BASE		G1_SWREG(12)
+#define G1_REG_DEC_OUT_BASE		G1_SWREG(13)
+#define G1_REG_REFER0_BASE		G1_SWREG(14)
+#define G1_REG_REFER1_BASE		G1_SWREG(15)
+#define G1_REG_REFER2_BASE		G1_SWREG(16)
+#define G1_REG_REFER3_BASE		G1_SWREG(17)
+#define G1_REG_REFER4_BASE		G1_SWREG(18)
+#define G1_REG_REFER5_BASE		G1_SWREG(19)
+#define G1_REG_REFER6_BASE		G1_SWREG(20)
+#define G1_REG_REFER7_BASE		G1_SWREG(21)
+#define G1_REG_REFER8_BASE		G1_SWREG(22)
+#define G1_REG_REFER9_BASE		G1_SWREG(23)
+#define G1_REG_REFER10_BASE		G1_SWREG(24)
+#define G1_REG_REFER11_BASE		G1_SWREG(25)
+#define G1_REG_REFER12_BASE		G1_SWREG(26)
+#define G1_REG_REFER13_BASE		G1_SWREG(27)
+#define G1_REG_REFER14_BASE		G1_SWREG(28)
+#define G1_REG_REFER15_BASE		G1_SWREG(29)
+#define G1_REG_QTABLE_BASE		G1_SWREG(40)
+#define G1_REG_DIR_MV_BASE		G1_SWREG(41)
+#define G1_REG_DEC_E(v)			((v) ? BIT(0) : 0)
+
+#define G1_REG_DEC_AXI_RD_ID(v)		(((v) << 24) & GENMASK(31, 24))
+#define G1_REG_DEC_TIMEOUT_E(v)		((v) ? BIT(23) : 0)
+#define G1_REG_DEC_STRSWAP32_E(v)	((v) ? BIT(22) : 0)
+#define G1_REG_DEC_STRENDIAN_E(v)	((v) ? BIT(21) : 0)
+#define G1_REG_DEC_INSWAP32_E(v)	((v) ? BIT(20) : 0)
+#define G1_REG_DEC_OUTSWAP32_E(v)	((v) ? BIT(19) : 0)
+#define G1_REG_DEC_DATA_DISC_E(v)	((v) ? BIT(18) : 0)
+#define G1_REG_DEC_LATENCY(v)		(((v) << 11) & GENMASK(16, 11))
+#define G1_REG_DEC_CLK_GATE_E(v)	((v) ? BIT(10) : 0)
+#define G1_REG_DEC_IN_ENDIAN(v)		((v) ? BIT(9) : 0)
+#define G1_REG_DEC_OUT_ENDIAN(v)	((v) ? BIT(8) : 0)
+#define G1_REG_DEC_ADV_PRE_DIS(v)	((v) ? BIT(6) : 0)
+#define G1_REG_DEC_SCMD_DIS(v)		((v) ? BIT(5) : 0)
+#define G1_REG_DEC_MAX_BURST(v)		(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_DEC_MODE(v)		(((v) << 28) & GENMASK(31, 28))
+#define G1_REG_RLC_MODE_E(v)		((v) ? BIT(27) : 0)
+#define G1_REG_PIC_INTERLACE_E(v)	((v) ? BIT(23) : 0)
+#define G1_REG_PIC_FIELDMODE_E(v)	((v) ? BIT(22) : 0)
+#define G1_REG_PIC_TOPFIELD_E(v)	((v) ? BIT(19) : 0)
+#define G1_REG_FILTERING_DIS(v)		((v) ? BIT(14) : 0)
+#define G1_REG_PIC_FIXED_QUANT(v)	((v) ? BIT(13) : 0)
+#define G1_REG_WRITE_MVS_E(v)		((v) ? BIT(12) : 0)
+#define G1_REG_SEQ_MBAFF_E(v)		((v) ? BIT(10) : 0)
+#define G1_REG_PICORD_COUNT_E(v)	((v) ? BIT(9) : 0)
+#define G1_REG_DEC_AXI_WR_ID(v)		(((v) << 0) & GENMASK(7, 0))
+
+#define G1_REG_PIC_MB_WIDTH(v)		(((v) << 23) & GENMASK(31, 23))
+#define G1_REG_PIC_MB_HEIGHT_P(v)	(((v) << 11) & GENMASK(18, 11))
+#define G1_REG_REF_FRAMES(v)		(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_STRM_START_BIT(v)	(((v) << 26) & GENMASK(31, 26))
+#define G1_REG_TYPE1_QUANT_E(v)		((v) ? BIT(24) : 0)
+#define G1_REG_CH_QP_OFFSET(v)		(((v) << 19) & GENMASK(23, 19))
+#define G1_REG_CH_QP_OFFSET2(v)		(((v) << 14) & GENMASK(18, 14))
+#define G1_REG_FIELDPIC_FLAG_E(v)	((v) ? BIT(0) : 0)
+
+#define G1_REG_START_CODE_E(v)		((v) ? BIT(31) : 0)
+#define G1_REG_INIT_QP(v)		(((v) << 25) & GENMASK(30, 25))
+#define G1_REG_CH_8PIX_ILEAV_E(v)	((v) ? BIT(24) : 0)
+#define G1_REG_STREAM_LEN(v)		(((v) << 0) & GENMASK(23, 0))
+
+#define G1_REG_CABAC_E(v)		((v) ? BIT(31) : 0)
+#define G1_REG_BLACKWHITE_E(v)		((v) ? BIT(30) : 0)
+#define G1_REG_DIR_8X8_INFER_E(v)	((v) ? BIT(29) : 0)
+#define G1_REG_WEIGHT_PRED_E(v)		((v) ? BIT(28) : 0)
+#define G1_REG_WEIGHT_BIPR_IDC(v)	(((v) << 26) & GENMASK(27, 26))
+#define G1_REG_FRAMENUM_LEN(v)		(((v) << 16) & GENMASK(20, 16))
+#define G1_REG_FRAMENUM(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_CONST_INTRA_E(v)		((v) ? BIT(31) : 0)
+#define G1_REG_FILT_CTRL_PRES(v)	((v) ? BIT(30) : 0)
+#define G1_REG_RDPIC_CNT_PRES(v)	((v) ? BIT(29) : 0)
+#define G1_REG_8X8TRANS_FLAG_E(v)	((v) ? BIT(28) : 0)
+#define G1_REG_REFPIC_MK_LEN(v)		(((v) << 17) & GENMASK(27, 17))
+#define G1_REG_IDR_PIC_E(v)		((v) ? BIT(16) : 0)
+#define G1_REG_IDR_PIC_ID(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_PPS_ID(v)		(((v) << 24) & GENMASK(31, 24))
+#define G1_REG_REFIDX1_ACTIVE(v)	(((v) << 19) & GENMASK(23, 19))
+#define G1_REG_REFIDX0_ACTIVE(v)	(((v) << 14) & GENMASK(18, 14))
+#define G1_REG_POC_LENGTH(v)		(((v) << 0) & GENMASK(7, 0))
+
+#define G1_REG_PINIT_RLIST_F9(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_PINIT_RLIST_F8(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_PINIT_RLIST_F7(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_PINIT_RLIST_F6(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_PINIT_RLIST_F5(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_PINIT_RLIST_F4(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_PINIT_RLIST_F15(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_PINIT_RLIST_F14(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_PINIT_RLIST_F13(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_PINIT_RLIST_F12(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_PINIT_RLIST_F11(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_PINIT_RLIST_F10(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_REFER1_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER0_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER3_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER2_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER5_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER4_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER7_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER6_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER9_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER8_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER11_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER10_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER13_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER12_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER15_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define G1_REG_REFER14_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define G1_REG_REFER_LTERM_E(v)		(((v) << 0) & GENMASK(31, 0))
+
+#define G1_REG_REFER_VALID_E(v)		(((v) << 0) & GENMASK(31, 0))
+
+#define G1_REG_BINIT_RLIST_B2(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_BINIT_RLIST_F2(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_BINIT_RLIST_B1(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_BINIT_RLIST_F1(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_BINIT_RLIST_B0(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_BINIT_RLIST_F0(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_BINIT_RLIST_B5(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_BINIT_RLIST_F5(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_BINIT_RLIST_B4(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_BINIT_RLIST_F4(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_BINIT_RLIST_B3(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_BINIT_RLIST_F3(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_BINIT_RLIST_B8(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_BINIT_RLIST_F8(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_BINIT_RLIST_B7(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_BINIT_RLIST_F7(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_BINIT_RLIST_B6(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_BINIT_RLIST_F6(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_BINIT_RLIST_B11(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_BINIT_RLIST_F11(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_BINIT_RLIST_B10(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_BINIT_RLIST_F10(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_BINIT_RLIST_B9(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_BINIT_RLIST_F9(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_BINIT_RLIST_B14(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_BINIT_RLIST_F14(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_BINIT_RLIST_B13(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_BINIT_RLIST_F13(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_BINIT_RLIST_B12(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_BINIT_RLIST_F12(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_PINIT_RLIST_F3(v)	(((v) << 25) & GENMASK(29, 25))
+#define G1_REG_PINIT_RLIST_F2(v)	(((v) << 20) & GENMASK(24, 20))
+#define G1_REG_PINIT_RLIST_F1(v)	(((v) << 15) & GENMASK(19, 15))
+#define G1_REG_PINIT_RLIST_F0(v)	(((v) << 10) & GENMASK(14, 10))
+#define G1_REG_BINIT_RLIST_B15(v)	(((v) << 5) & GENMASK(9, 5))
+#define G1_REG_BINIT_RLIST_F15(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define G1_REG_STARTMB_X(v)		(((v) << 23) & GENMASK(31, 23))
+#define G1_REG_STARTMB_Y(v)		(((v) << 15) & GENMASK(22, 15))
+
+#define G1_REG_PRED_BC_TAP_0_0(v)	(((v) << 22) & GENMASK(31, 22))
+#define G1_REG_PRED_BC_TAP_0_1(v)	(((v) << 12) & GENMASK(21, 12))
+#define G1_REG_PRED_BC_TAP_0_2(v)	(((v) << 2) & GENMASK(11, 2))
+
+#define G1_REG_REFBU_E(v)		((v) ? BIT(31) : 0)
+
+#define G1_REG_APF_THRESHOLD(v)		(((v) << 0) & GENMASK(13, 0))
+>>>>>>> b22734fb5e2c... Ymedia: hantro: Refactor G1 H264 code
 
-static void set_ref(struct hantro_ctx *ctx)
+void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
 {
+	struct hantro_dev *vpu = ctx->dev;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	const struct hantro_h264_dec_ctrls *ctrls;
 	const struct v4l2_ctrl_h264_decode_params *dec_param;
-	const struct v4l2_ctrl_h264_slice_params *slice;
-	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
+	const struct v4l2_ctrl_h264_slice_params *slices;
+	const struct v4l2_ctrl_h264_sps *sps;
+	const struct v4l2_ctrl_h264_pps *pps;
 	const u8 *b0_reflist, *b1_reflist, *p_reflist;
-	struct hantro_dev *vpu = ctx->dev;
-	int reg_num;
+	dma_addr_t addr;
 	u32 reg;
-	int i;
-
-	dec_param = ctx->h264_dec.ctrls.decode;
-	slice = ctx->h264_dec.ctrls.slices;
-
-	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_valid, G1_REG_VALID_REF);
-	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_longterm, G1_REG_LT_REF);
-
-	/*
-	 * Set up reference frame picture numbers.
-	 *
-	 * Each G1_REG_REF_PIC(x) register contains numbers of two
-	 * subsequential reference pictures.
-	 */
-	for (i = 0; i < HANTRO_H264_DPB_SIZE; i += 2) {
-		reg = 0;
-		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
-			reg |= G1_REG_REF_PIC_REFER0_NBR(dpb[i].pic_num);
-		else
-			reg |= G1_REG_REF_PIC_REFER0_NBR(dpb[i].frame_num);
-
-		if (dpb[i + 1].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
-			reg |= G1_REG_REF_PIC_REFER1_NBR(dpb[i + 1].pic_num);
-		else
-			reg |= G1_REG_REF_PIC_REFER1_NBR(dpb[i + 1].frame_num);
-
-		vdpu_write_relaxed(vpu, reg, G1_REG_REF_PIC(i / 2));
-	}
+	unsigned int offset = MV_OFFSET_420;
+
+	/* Prepare the H264 decoder context. */
+	if (hantro_h264_dec_prepare_run(ctx))
+		return;
+
+	src_buf = hantro_get_src_buf(ctx);
+	dst_buf = hantro_get_dst_buf(ctx);
+
+	ctrls = &ctx->h264_dec.ctrls;
+	dec_param = ctrls->decode;
+	slices = ctrls->slices;
+	sps = ctrls->sps;
+	pps = ctrls->pps;
 
 	b0_reflist = ctx->h264_dec.reflists.b0;
 	b1_reflist = ctx->h264_dec.reflists.b1;
 	p_reflist = ctx->h264_dec.reflists.p;
 
-	/*
-	 * Each G1_REG_BD_REF_PIC(x) register contains three entries
-	 * of each forward and backward picture list.
-	 */
-	reg_num = 0;
-	for (i = 0; i < 15; i += 3) {
-		reg = G1_REG_BD_REF_PIC_BINIT_RLIST_F0(b0_reflist[i]) |
-		      G1_REG_BD_REF_PIC_BINIT_RLIST_F1(b0_reflist[i + 1]) |
-		      G1_REG_BD_REF_PIC_BINIT_RLIST_F2(b0_reflist[i + 2]) |
-		      G1_REG_BD_REF_PIC_BINIT_RLIST_B0(b1_reflist[i]) |
-		      G1_REG_BD_REF_PIC_BINIT_RLIST_B1(b1_reflist[i + 1]) |
-		      G1_REG_BD_REF_PIC_BINIT_RLIST_B2(b1_reflist[i + 2]);
-		vdpu_write_relaxed(vpu, reg, G1_REG_BD_REF_PIC(reg_num++));
-	}
-
-	/*
-	 * G1_REG_BD_P_REF_PIC register contains last entries (index 15)
-	 * of forward and backward reference picture lists and first 4 entries
-	 * of P forward picture list.
-	 */
-	reg = G1_REG_BD_P_REF_PIC_BINIT_RLIST_F15(b0_reflist[15]) |
-	      G1_REG_BD_P_REF_PIC_BINIT_RLIST_B15(b1_reflist[15]) |
-	      G1_REG_BD_P_REF_PIC_PINIT_RLIST_F0(p_reflist[0]) |
-	      G1_REG_BD_P_REF_PIC_PINIT_RLIST_F1(p_reflist[1]) |
-	      G1_REG_BD_P_REF_PIC_PINIT_RLIST_F2(p_reflist[2]) |
-	      G1_REG_BD_P_REF_PIC_PINIT_RLIST_F3(p_reflist[3]);
-	vdpu_write_relaxed(vpu, reg, G1_REG_BD_P_REF_PIC);
-
-	/*
-	 * Each G1_REG_FWD_PIC(x) register contains six consecutive
-	 * entries of P forward picture list, starting from index 4.
-	 */
-	reg_num = 0;
-	for (i = 4; i < HANTRO_H264_DPB_SIZE; i += 6) {
-		reg = G1_REG_FWD_PIC_PINIT_RLIST_F0(p_reflist[i]) |
-		      G1_REG_FWD_PIC_PINIT_RLIST_F1(p_reflist[i + 1]) |
-		      G1_REG_FWD_PIC_PINIT_RLIST_F2(p_reflist[i + 2]) |
-		      G1_REG_FWD_PIC_PINIT_RLIST_F3(p_reflist[i + 3]) |
-		      G1_REG_FWD_PIC_PINIT_RLIST_F4(p_reflist[i + 4]) |
-		      G1_REG_FWD_PIC_PINIT_RLIST_F5(p_reflist[i + 5]);
-		vdpu_write_relaxed(vpu, reg, G1_REG_FWD_PIC(reg_num++));
-	}
-
-	/* Set up addresses of DPB buffers. */
-	for (i = 0; i < HANTRO_H264_DPB_SIZE; i++) {
-		dma_addr_t addr = hantro_h264_get_ref_dma_addr(ctx, i);
-		vdpu_write_relaxed(vpu, addr, G1_REG_ADDR_REF(i));
-	}
-}
+	reg = G1_REG_DEC_AXI_RD_ID(0xff) |
+	      G1_REG_DEC_TIMEOUT_E(1) |
+	      G1_REG_DEC_STRSWAP32_E(1) |
+	      G1_REG_DEC_STRENDIAN_E(1) |
+	      G1_REG_DEC_INSWAP32_E(1) |
+	      G1_REG_DEC_OUTSWAP32_E(1) |
+	      G1_REG_DEC_DATA_DISC_E(0) |
+	      G1_REG_DEC_LATENCY(0) |
+	      G1_REG_DEC_CLK_GATE_E(1) |
+	      G1_REG_DEC_IN_ENDIAN(0) |
+	      G1_REG_DEC_OUT_ENDIAN(1) |
+	      G1_REG_DEC_ADV_PRE_DIS(0) |
+	      G1_REG_DEC_SCMD_DIS(0) |
+	      G1_REG_DEC_MAX_BURST(16);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(2));
+
+	reg = G1_REG_DEC_MODE(0) |
+	      G1_REG_RLC_MODE_E(0) |
+	      G1_REG_PIC_INTERLACE_E(!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY) && (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD || slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)) |
+	      G1_REG_PIC_FIELDMODE_E(slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) |
+	      G1_REG_PIC_TOPFIELD_E(!(slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)) |
+	      G1_REG_FILTERING_DIS(0) |
+	      G1_REG_PIC_FIXED_QUANT(0) |
+	      G1_REG_WRITE_MVS_E(dec_param->nal_ref_idc) |
+	      G1_REG_SEQ_MBAFF_E(sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD) |
+	      G1_REG_PICORD_COUNT_E(1) |
+	      G1_REG_DEC_AXI_WR_ID(0);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(3));
+
+	reg = G1_REG_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
+	      G1_REG_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
+	      G1_REG_REF_FRAMES(sps->max_num_ref_frames);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(4));
+
+	reg = G1_REG_STRM_START_BIT(0) |
+	      G1_REG_TYPE1_QUANT_E(1) |
+	      G1_REG_CH_QP_OFFSET(pps->chroma_qp_index_offset) |
+	      G1_REG_CH_QP_OFFSET2(pps->second_chroma_qp_index_offset) |
+	      G1_REG_FIELDPIC_FLAG_E(!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(5));
+
+	reg = G1_REG_START_CODE_E(1) |
+	      G1_REG_INIT_QP(pps->pic_init_qp_minus26 + 26) |
+	      G1_REG_CH_8PIX_ILEAV_E(0) |
+	      G1_REG_STREAM_LEN(vb2_get_plane_payload(&src_buf->vb2_buf, 0));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(6));
+
+	reg = G1_REG_CABAC_E(pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE) |
+	      G1_REG_BLACKWHITE_E(sps->profile_idc >= 100 && sps->chroma_format_idc == 0) |
+	      G1_REG_DIR_8X8_INFER_E(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE) |
+	      G1_REG_WEIGHT_PRED_E(pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) |
+	      G1_REG_WEIGHT_BIPR_IDC(pps->weighted_bipred_idc) |
+	      G1_REG_FRAMENUM_LEN(sps->log2_max_frame_num_minus4 + 4) |
+	      G1_REG_FRAMENUM(slices[0].frame_num);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(7));
+
+	reg = G1_REG_CONST_INTRA_E(pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED) |
+	      G1_REG_FILT_CTRL_PRES(pps->flags & V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT) |
+	      G1_REG_RDPIC_CNT_PRES(pps->flags & V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT) |
+	      G1_REG_8X8TRANS_FLAG_E(pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE) |
+	      G1_REG_REFPIC_MK_LEN(slices[0].dec_ref_pic_marking_bit_size) |
+	      G1_REG_IDR_PIC_E(dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC) |
+	      G1_REG_IDR_PIC_ID(slices[0].idr_pic_id);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(8));
+
+	reg = G1_REG_PPS_ID(slices[0].pic_parameter_set_id) |
+	      G1_REG_REFIDX1_ACTIVE(pps->num_ref_idx_l1_default_active_minus1 + 1) |
+	      G1_REG_REFIDX0_ACTIVE(pps->num_ref_idx_l0_default_active_minus1 + 1) |
+	      G1_REG_POC_LENGTH(slices[0].pic_order_cnt_bit_size);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(9));
+
+	reg = G1_REG_PINIT_RLIST_F9(p_reflist[9]) |
+	      G1_REG_PINIT_RLIST_F8(p_reflist[8]) |
+	      G1_REG_PINIT_RLIST_F7(p_reflist[7]) |
+	      G1_REG_PINIT_RLIST_F6(p_reflist[6]) |
+	      G1_REG_PINIT_RLIST_F5(p_reflist[5]) |
+	      G1_REG_PINIT_RLIST_F4(p_reflist[4]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(10));
+
+	reg = G1_REG_PINIT_RLIST_F15(p_reflist[15]) |
+	      G1_REG_PINIT_RLIST_F14(p_reflist[14]) |
+	      G1_REG_PINIT_RLIST_F13(p_reflist[13]) |
+	      G1_REG_PINIT_RLIST_F12(p_reflist[12]) |
+	      G1_REG_PINIT_RLIST_F11(p_reflist[11]) |
+	      G1_REG_PINIT_RLIST_F10(p_reflist[10]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(11));
+
+	reg = G1_REG_REFER1_NBR(hantro_h264_get_ref_nbr(ctx, 1)) |
+	      G1_REG_REFER0_NBR(hantro_h264_get_ref_nbr(ctx, 0));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(30));
+
+	reg = G1_REG_REFER3_NBR(hantro_h264_get_ref_nbr(ctx, 3)) |
+	      G1_REG_REFER2_NBR(hantro_h264_get_ref_nbr(ctx, 2));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(31));
+
+	reg = G1_REG_REFER5_NBR(hantro_h264_get_ref_nbr(ctx, 5)) |
+	      G1_REG_REFER4_NBR(hantro_h264_get_ref_nbr(ctx, 4));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(32));
+
+	reg = G1_REG_REFER7_NBR(hantro_h264_get_ref_nbr(ctx, 7)) |
+	      G1_REG_REFER6_NBR(hantro_h264_get_ref_nbr(ctx, 6));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(33));
+
+	reg = G1_REG_REFER9_NBR(hantro_h264_get_ref_nbr(ctx, 9)) |
+	      G1_REG_REFER8_NBR(hantro_h264_get_ref_nbr(ctx, 8));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(34));
+
+	reg = G1_REG_REFER11_NBR(hantro_h264_get_ref_nbr(ctx, 11)) |
+	      G1_REG_REFER10_NBR(hantro_h264_get_ref_nbr(ctx, 10));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(35));
+
+	reg = G1_REG_REFER13_NBR(hantro_h264_get_ref_nbr(ctx, 13)) |
+	      G1_REG_REFER12_NBR(hantro_h264_get_ref_nbr(ctx, 12));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(36));
+
+	reg = G1_REG_REFER15_NBR(hantro_h264_get_ref_nbr(ctx, 15)) |
+	      G1_REG_REFER14_NBR(hantro_h264_get_ref_nbr(ctx, 14));
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(37));
+
+	reg = G1_REG_REFER_LTERM_E(ctx->h264_dec.dpb_longterm);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(38));
+
+	reg = G1_REG_REFER_VALID_E(ctx->h264_dec.dpb_valid);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(39));
+
+	reg = G1_REG_BINIT_RLIST_B2(b1_reflist[2]) |
+	      G1_REG_BINIT_RLIST_F2(b0_reflist[2]) |
+	      G1_REG_BINIT_RLIST_B1(b1_reflist[1]) |
+	      G1_REG_BINIT_RLIST_F1(b0_reflist[1]) |
+	      G1_REG_BINIT_RLIST_B0(b1_reflist[0]) |
+	      G1_REG_BINIT_RLIST_F0(b0_reflist[0]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(42));
+
+	reg = G1_REG_BINIT_RLIST_B5(b1_reflist[5]) |
+	      G1_REG_BINIT_RLIST_F5(b0_reflist[5]) |
+	      G1_REG_BINIT_RLIST_B4(b1_reflist[4]) |
+	      G1_REG_BINIT_RLIST_F4(b0_reflist[4]) |
+	      G1_REG_BINIT_RLIST_B3(b1_reflist[3]) |
+	      G1_REG_BINIT_RLIST_F3(b0_reflist[3]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(43));
+
+	reg = G1_REG_BINIT_RLIST_B8(b1_reflist[8]) |
+	      G1_REG_BINIT_RLIST_F8(b0_reflist[8]) |
+	      G1_REG_BINIT_RLIST_B7(b1_reflist[7]) |
+	      G1_REG_BINIT_RLIST_F7(b0_reflist[7]) |
+	      G1_REG_BINIT_RLIST_B6(b1_reflist[6]) |
+	      G1_REG_BINIT_RLIST_F6(b0_reflist[6]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(44));
+
+	reg = G1_REG_BINIT_RLIST_B11(b1_reflist[11]) |
+	      G1_REG_BINIT_RLIST_F11(b0_reflist[11]) |
+	      G1_REG_BINIT_RLIST_B10(b1_reflist[10]) |
+	      G1_REG_BINIT_RLIST_F10(b0_reflist[10]) |
+	      G1_REG_BINIT_RLIST_B9(b1_reflist[9]) |
+	      G1_REG_BINIT_RLIST_F9(b0_reflist[9]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(45));
+
+	reg = G1_REG_BINIT_RLIST_B14(b1_reflist[14]) |
+	      G1_REG_BINIT_RLIST_F14(b0_reflist[14]) |
+	      G1_REG_BINIT_RLIST_B13(b1_reflist[13]) |
+	      G1_REG_BINIT_RLIST_F13(b0_reflist[13]) |
+	      G1_REG_BINIT_RLIST_B12(b1_reflist[12]) |
+	      G1_REG_BINIT_RLIST_F12(b0_reflist[12]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(46));
+
+	reg = G1_REG_PINIT_RLIST_F3(p_reflist[3]) |
+	      G1_REG_PINIT_RLIST_F2(p_reflist[2]) |
+	      G1_REG_PINIT_RLIST_F1(p_reflist[1]) |
+	      G1_REG_PINIT_RLIST_F0(p_reflist[0]) |
+	      G1_REG_BINIT_RLIST_B15(b1_reflist[15]) |
+	      G1_REG_BINIT_RLIST_F15(b0_reflist[15]);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(47));
+
+	reg = G1_REG_STARTMB_X(0) |
+	      G1_REG_STARTMB_Y(0);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(48));
+
+	reg = G1_REG_PRED_BC_TAP_0_0(1) |
+	      G1_REG_PRED_BC_TAP_0_1((u32)-5) |
+	      G1_REG_PRED_BC_TAP_0_2(20);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(49));
+
+	reg = G1_REG_REFBU_E(0);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(51));
+
+	reg = G1_REG_APF_THRESHOLD(8);
+	vdpu_write_relaxed(vpu, reg, G1_SWREG(55));
 
-static void set_buffers(struct hantro_ctx *ctx)
-{
-	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
-	struct vb2_v4l2_buffer *src_buf, *dst_buf;
-	struct hantro_dev *vpu = ctx->dev;
-	dma_addr_t src_dma, dst_dma;
-	unsigned int offset = MV_OFFSET_420;
-
-	src_buf = hantro_get_src_buf(ctx);
-	dst_buf = hantro_get_dst_buf(ctx);
+	/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
+	vdpu_write_relaxed(vpu, ctx->h264_dec.priv.dma, G1_REG_QTABLE_BASE);
 
 	/* Source (stream) buffer. */
-	src_dma = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
-	vdpu_write_relaxed(vpu, src_dma, G1_REG_ADDR_STR);
+	addr = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+	vdpu_write_relaxed(vpu, addr, G1_REG_RLC_VLC_BASE);
 
 	/* Destination (decoded frame) buffer. */
-	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	addr = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
 	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
-		dst_dma += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
-	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
+		addr += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
+	vdpu_write_relaxed(vpu, addr, G1_REG_DEC_OUT_BASE);
 
 	/* Motion vector buffer is located after the decoded frame. */
-	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
-	if (ctrls->sps->profile_idc >= 100 && ctrls->sps->chroma_format_idc == 0)
+	addr = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
 		offset = MV_OFFSET_400;
-	dst_dma += offset * H264_MB_WIDTH(ctx->dst_fmt.width) *
+	addr += offset * H264_MB_WIDTH(ctx->dst_fmt.width) *
 		   H264_MB_HEIGHT(ctx->dst_fmt.height);
 	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
-		dst_dma += 32 * H264_MB_WIDTH(ctx->dst_fmt.width) *
+		addr += 32 * H264_MB_WIDTH(ctx->dst_fmt.width) *
 			   H264_MB_HEIGHT(ctx->dst_fmt.height);
-	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DIR_MV);
-
-	/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
-	vdpu_write_relaxed(vpu, ctx->h264_dec.priv.dma, G1_REG_ADDR_QTABLE);
-}
-
-void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
-{
-	struct hantro_dev *vpu = ctx->dev;
-
-	/* Prepare the H264 decoder context. */
-	if (hantro_h264_dec_prepare_run(ctx))
-		return;
-
-	/* Configure hardware registers. */
-	set_params(ctx);
-	set_ref(ctx);
-	set_buffers(ctx);
+	vdpu_write_relaxed(vpu, addr, G1_REG_DIR_MV_BASE);
+
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 0), G1_REG_REFER0_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 1), G1_REG_REFER1_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 2), G1_REG_REFER2_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 3), G1_REG_REFER3_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 4), G1_REG_REFER4_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 5), G1_REG_REFER5_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 6), G1_REG_REFER6_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 7), G1_REG_REFER7_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 8), G1_REG_REFER8_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 9), G1_REG_REFER9_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 10), G1_REG_REFER10_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 11), G1_REG_REFER11_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 12), G1_REG_REFER12_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 13), G1_REG_REFER13_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 14), G1_REG_REFER14_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 15), G1_REG_REFER15_BASE);
 
 	hantro_finish_run(ctx);
 
 	/* Start decoding! */
-	vdpu_write_relaxed(vpu,
-			   G1_REG_CONFIG_DEC_AXI_RD_ID(0xffu) |
-			   G1_REG_CONFIG_DEC_TIMEOUT_E |
-			   G1_REG_CONFIG_DEC_OUT_ENDIAN |
-			   G1_REG_CONFIG_DEC_STRENDIAN_E |
-			   G1_REG_CONFIG_DEC_MAX_BURST(16) |
-			   G1_REG_CONFIG_DEC_OUTSWAP32_E |
-			   G1_REG_CONFIG_DEC_INSWAP32_E |
-			   G1_REG_CONFIG_DEC_STRSWAP32_E |
-			   G1_REG_CONFIG_DEC_CLK_GATE_E,
-			   G1_REG_CONFIG);
-	vdpu_write(vpu, G1_REG_INTERRUPT_DEC_E, G1_REG_INTERRUPT);
+	reg = G1_REG_DEC_E(1);
+	vdpu_write(vpu, reg, G1_SWREG(1));
 }
diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index 85c86d728b1a..03b37e00b6fd 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -591,6 +591,20 @@ dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
 	return vb2_dma_contig_plane_dma_addr(buf, 0) | flags;
 }
 
+u16 hantro_h264_get_ref_nbr(struct hantro_ctx *ctx,
+			    unsigned int dpb_idx)
+{
+	const struct v4l2_h264_dpb_entry *dpb = &ctx->h264_dec.dpb[dpb_idx];
+
+	if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
+		return 0;
+
+	if (dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
+		return dpb->pic_num;
+
+	return dpb->frame_num;
+}
+
 int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx)
 {
 	struct hantro_h264_dec_hw_ctx *h264_ctx = &ctx->h264_dec;
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index d58f2a36ca40..51de2fee2233 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -161,6 +161,8 @@ struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
 					   unsigned int dpb_idx);
 dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
 					unsigned int dpb_idx);
+u16 hantro_h264_get_ref_nbr(struct hantro_ctx *ctx,
+			    unsigned int dpb_idx);
 int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx);
 void hantro_g1_h264_dec_run(struct hantro_ctx *ctx);
 int hantro_h264_dec_init(struct hantro_ctx *ctx);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 11/12] media: hantro: Enable H264 decoding on RK3399
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (8 preceding siblings ...)
  2019-09-01 12:45   ` [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399 Jonas Karlman
@ 2019-09-01 12:45   ` " Jonas Karlman
  2019-09-01 12:45   ` [RFC 12/12] media: hantro: Enable H264 decoding on RK3328 Jonas Karlman
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

The RK3399 SoC has two VPU blocks capable of H264 decoding, VPU2 and RKVDEC,
this enables support for H264 decoding using the VPU2 block.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/rk3399_vpu_hw.c | 21 +++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/media/hantro/rk3399_vpu_hw.c b/drivers/staging/media/hantro/rk3399_vpu_hw.c
index 14d14bc6b12b..47ca51b75a0d 100644
--- a/drivers/staging/media/hantro/rk3399_vpu_hw.c
+++ b/drivers/staging/media/hantro/rk3399_vpu_hw.c
@@ -60,6 +60,19 @@ static const struct hantro_fmt rk3399_vpu_dec_fmts[] = {
 		.fourcc = V4L2_PIX_FMT_NV12,
 		.codec_mode = HANTRO_MODE_NONE,
 	},
+	{
+		.fourcc = V4L2_PIX_FMT_H264_SLICE,
+		.codec_mode = HANTRO_MODE_H264_DEC,
+		.max_depth = 2,
+		.frmsize = {
+			.min_width = 48,
+			.max_width = 4096,
+			.step_width = H264_MB_DIM,
+			.min_height = 48,
+			.max_height = 2304,
+			.step_height = H264_MB_DIM,
+		},
+	},
 	{
 		.fourcc = V4L2_PIX_FMT_MPEG2_SLICE,
 		.codec_mode = HANTRO_MODE_MPEG2_DEC,
@@ -161,6 +174,12 @@ static const struct hantro_codec_ops rk3399_vpu_codec_ops[] = {
 		.init = hantro_jpeg_enc_init,
 		.exit = hantro_jpeg_enc_exit,
 	},
+	[HANTRO_MODE_H264_DEC] = {
+		.run = rk3399_vpu_h264_dec_run,
+		.reset = rk3399_vpu_dec_reset,
+		.init = hantro_h264_dec_init,
+		.exit = hantro_h264_dec_exit,
+	},
 	[HANTRO_MODE_MPEG2_DEC] = {
 		.run = rk3399_vpu_mpeg2_dec_run,
 		.reset = rk3399_vpu_dec_reset,
@@ -196,7 +215,7 @@ const struct hantro_variant rk3399_vpu_variant = {
 	.dec_fmts = rk3399_vpu_dec_fmts,
 	.num_dec_fmts = ARRAY_SIZE(rk3399_vpu_dec_fmts),
 	.codec = HANTRO_JPEG_ENCODER | HANTRO_MPEG2_DECODER |
-		 HANTRO_VP8_DECODER,
+		 HANTRO_VP8_DECODER | HANTRO_H264_DECODER,
 	.codec_ops = rk3399_vpu_codec_ops,
 	.irqs = rk3399_irqs,
 	.num_irqs = ARRAY_SIZE(rk3399_irqs),
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (7 preceding siblings ...)
  2019-09-01 12:45   ` [RFC 09/12] media: hantro: Refactor G1 H264 code Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  2019-09-02 11:46     ` Hans Verkuil
  2019-09-01 12:45   ` [RFC 11/12] media: hantro: Enable " Jonas Karlman
  2019-09-01 12:45   ` [RFC 12/12] media: hantro: Enable H264 decoding on RK3328 Jonas Karlman
  10 siblings, 1 reply; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

Rockchip RK3399 SoC has the same Hantro G1 IP block
as RK3288, but the registers are entirely different.

In a similar fashion as MPEG-2 and VP8 decoding,
it's simpler to just add a separate implementation.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/Makefile         |   1 +
 .../staging/media/hantro/hantro_g1_h264_dec.c |   1 -
 drivers/staging/media/hantro/hantro_hw.h      |   1 +
 .../media/hantro/rk3399_vpu_hw_h264_dec.c     | 486 ++++++++++++++++++
 4 files changed, 488 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c

diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
index 5d6b0383d280..8d33b0e8aa6c 100644
--- a/drivers/staging/media/hantro/Makefile
+++ b/drivers/staging/media/hantro/Makefile
@@ -8,6 +8,7 @@ hantro-vpu-y += \
 		hantro_g1_mpeg2_dec.o \
 		hantro_g1_vp8_dec.o \
 		rk3399_vpu_hw_jpeg_enc.o \
+		rk3399_vpu_hw_h264_dec.o \
 		rk3399_vpu_hw_mpeg2_dec.o \
 		rk3399_vpu_hw_vp8_dec.o \
 		hantro_jpeg.o \
diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 4b82b9fd5252..ec2736fb473d 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -202,7 +202,6 @@
 #define G1_REG_REFBU_E(v)		((v) ? BIT(31) : 0)
 
 #define G1_REG_APF_THRESHOLD(v)		(((v) << 0) & GENMASK(13, 0))
->>>>>>> b22734fb5e2c... Ymedia: hantro: Refactor G1 H264 code
 
 void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
 {
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index 51de2fee2233..00161a4f22ac 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -165,6 +165,7 @@ u16 hantro_h264_get_ref_nbr(struct hantro_ctx *ctx,
 			    unsigned int dpb_idx);
 int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx);
 void hantro_g1_h264_dec_run(struct hantro_ctx *ctx);
+void rk3399_vpu_h264_dec_run(struct hantro_ctx *ctx);
 int hantro_h264_dec_init(struct hantro_ctx *ctx);
 void hantro_h264_dec_exit(struct hantro_ctx *ctx);
 
diff --git a/drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c b/drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c
new file mode 100644
index 000000000000..8e480a68ca3d
--- /dev/null
+++ b/drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hantro VPU codec driver
+ *
+ * Copyright (c) 2014 Rockchip Electronics Co., Ltd.
+ *	Hertz Wong <hertz.wong@rock-chips.com>
+ *	Herman Chen <herman.chen@rock-chips.com>
+ *
+ * Copyright (C) 2014 Google, Inc.
+ *	Tomasz Figa <tfiga@chromium.org>
+ */
+
+#include <linux/types.h>
+#include <linux/sort.h>
+
+#include <media/v4l2-mem2mem.h>
+
+#include "hantro_hw.h"
+#include "hantro_v4l2.h"
+
+#define MV_OFFSET_420	384
+#define MV_OFFSET_400	256
+
+#define VDPU_SWREG(nr)			((nr) * 4)
+
+#define VDPU_REG_DEC_OUT_BASE		VDPU_SWREG(63)
+#define VDPU_REG_RLC_VLC_BASE		VDPU_SWREG(64)
+#define VDPU_REG_QTABLE_BASE		VDPU_SWREG(61)
+#define VDPU_REG_DIR_MV_BASE		VDPU_SWREG(62)
+#define VDPU_REG_REFER0_BASE		VDPU_SWREG(84)
+#define VDPU_REG_REFER1_BASE		VDPU_SWREG(85)
+#define VDPU_REG_REFER2_BASE		VDPU_SWREG(86)
+#define VDPU_REG_REFER3_BASE		VDPU_SWREG(87)
+#define VDPU_REG_REFER4_BASE		VDPU_SWREG(88)
+#define VDPU_REG_REFER5_BASE		VDPU_SWREG(89)
+#define VDPU_REG_REFER6_BASE		VDPU_SWREG(90)
+#define VDPU_REG_REFER7_BASE		VDPU_SWREG(91)
+#define VDPU_REG_REFER8_BASE		VDPU_SWREG(92)
+#define VDPU_REG_REFER9_BASE		VDPU_SWREG(93)
+#define VDPU_REG_REFER10_BASE		VDPU_SWREG(94)
+#define VDPU_REG_REFER11_BASE		VDPU_SWREG(95)
+#define VDPU_REG_REFER12_BASE		VDPU_SWREG(96)
+#define VDPU_REG_REFER13_BASE		VDPU_SWREG(97)
+#define VDPU_REG_REFER14_BASE		VDPU_SWREG(98)
+#define VDPU_REG_REFER15_BASE		VDPU_SWREG(99)
+#define VDPU_REG_DEC_E(v)		((v) ? BIT(0) : 0)
+
+#define VDPU_REG_DEC_ADV_PRE_DIS(v)	((v) ? BIT(11) : 0)
+#define VDPU_REG_DEC_SCMD_DIS(v)	((v) ? BIT(10) : 0)
+#define VDPU_REG_FILTERING_DIS(v)	((v) ? BIT(8) : 0)
+#define VDPU_REG_PIC_FIXED_QUANT(v)	((v) ? BIT(7) : 0)
+#define VDPU_REG_DEC_LATENCY(v)		(((v) << 1) & GENMASK(6, 1))
+
+#define VDPU_REG_INIT_QP(v)		(((v) << 25) & GENMASK(30, 25))
+#define VDPU_REG_STREAM_LEN(v)		(((v) << 0) & GENMASK(23, 0))
+
+#define VDPU_REG_APF_THRESHOLD(v)	(((v) << 17) & GENMASK(30, 17))
+#define VDPU_REG_STARTMB_X(v)		(((v) << 8) & GENMASK(16, 8))
+#define VDPU_REG_STARTMB_Y(v)		(((v) << 0) & GENMASK(7, 0))
+
+#define VDPU_REG_DEC_MODE(v)		(((v) << 0) & GENMASK(3, 0))
+
+#define VDPU_REG_DEC_STRENDIAN_E(v)	((v) ? BIT(5) : 0)
+#define VDPU_REG_DEC_STRSWAP32_E(v)	((v) ? BIT(4) : 0)
+#define VDPU_REG_DEC_OUTSWAP32_E(v)	((v) ? BIT(3) : 0)
+#define VDPU_REG_DEC_INSWAP32_E(v)	((v) ? BIT(2) : 0)
+#define VDPU_REG_DEC_OUT_ENDIAN(v)	((v) ? BIT(1) : 0)
+#define VDPU_REG_DEC_IN_ENDIAN(v)	((v) ? BIT(0) : 0)
+
+#define VDPU_REG_DEC_DATA_DISC_E(v)	((v) ? BIT(22) : 0)
+#define VDPU_REG_DEC_MAX_BURST(v)	(((v) << 16) & GENMASK(20, 16))
+#define VDPU_REG_DEC_AXI_WR_ID(v)	(((v) << 8) & GENMASK(15, 8))
+#define VDPU_REG_DEC_AXI_RD_ID(v)	(((v) << 0) & GENMASK(7, 0))
+
+#define VDPU_REG_START_CODE_E(v)	((v) ? BIT(22) : 0)
+#define VDPU_REG_CH_8PIX_ILEAV_E(v)	((v) ? BIT(21) : 0)
+#define VDPU_REG_RLC_MODE_E(v)		((v) ? BIT(20) : 0)
+#define VDPU_REG_PIC_INTERLACE_E(v)	((v) ? BIT(17) : 0)
+#define VDPU_REG_PIC_FIELDMODE_E(v)	((v) ? BIT(16) : 0)
+#define VDPU_REG_PIC_TOPFIELD_E(v)	((v) ? BIT(13) : 0)
+#define VDPU_REG_WRITE_MVS_E(v)		((v) ? BIT(10) : 0)
+#define VDPU_REG_SEQ_MBAFF_E(v)		((v) ? BIT(7) : 0)
+#define VDPU_REG_PICORD_COUNT_E(v)	((v) ? BIT(6) : 0)
+#define VDPU_REG_DEC_TIMEOUT_E(v)	((v) ? BIT(5) : 0)
+#define VDPU_REG_DEC_CLK_GATE_E(v)	((v) ? BIT(4) : 0)
+
+#define VDPU_REG_PRED_BC_TAP_0_0(v)	(((v) << 22) & GENMASK(31, 22))
+#define VDPU_REG_PRED_BC_TAP_0_1(v)	(((v) << 12) & GENMASK(21, 12))
+#define VDPU_REG_PRED_BC_TAP_0_2(v)	(((v) << 2) & GENMASK(11, 2))
+
+#define VDPU_REG_REFBU_E(v)		((v) ? BIT(31) : 0)
+
+#define VDPU_REG_PINIT_RLIST_F9(v)	(((v) << 25) & GENMASK(29, 25))
+#define VDPU_REG_PINIT_RLIST_F8(v)	(((v) << 20) & GENMASK(24, 20))
+#define VDPU_REG_PINIT_RLIST_F7(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_PINIT_RLIST_F6(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_PINIT_RLIST_F5(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_PINIT_RLIST_F4(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_PINIT_RLIST_F15(v)	(((v) << 25) & GENMASK(29, 25))
+#define VDPU_REG_PINIT_RLIST_F14(v)	(((v) << 20) & GENMASK(24, 20))
+#define VDPU_REG_PINIT_RLIST_F13(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_PINIT_RLIST_F12(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_PINIT_RLIST_F11(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_PINIT_RLIST_F10(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_REFER1_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER0_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER3_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER2_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER5_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER4_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER7_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER6_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER9_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER8_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER11_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER10_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER13_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER12_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFER15_NBR(v)		(((v) << 16) & GENMASK(31, 16))
+#define VDPU_REG_REFER14_NBR(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_BINIT_RLIST_F5(v)	(((v) << 25) & GENMASK(29, 25))
+#define VDPU_REG_BINIT_RLIST_F4(v)	(((v) << 20) & GENMASK(24, 20))
+#define VDPU_REG_BINIT_RLIST_F3(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_BINIT_RLIST_F2(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_BINIT_RLIST_F1(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_BINIT_RLIST_F0(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_BINIT_RLIST_F11(v)	(((v) << 25) & GENMASK(29, 25))
+#define VDPU_REG_BINIT_RLIST_F10(v)	(((v) << 20) & GENMASK(24, 20))
+#define VDPU_REG_BINIT_RLIST_F9(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_BINIT_RLIST_F8(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_BINIT_RLIST_F7(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_BINIT_RLIST_F6(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_BINIT_RLIST_F15(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_BINIT_RLIST_F14(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_BINIT_RLIST_F13(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_BINIT_RLIST_F12(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_BINIT_RLIST_B5(v)	(((v) << 25) & GENMASK(29, 25))
+#define VDPU_REG_BINIT_RLIST_B4(v)	(((v) << 20) & GENMASK(24, 20))
+#define VDPU_REG_BINIT_RLIST_B3(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_BINIT_RLIST_B2(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_BINIT_RLIST_B1(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_BINIT_RLIST_B0(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_BINIT_RLIST_B11(v)	(((v) << 25) & GENMASK(29, 25))
+#define VDPU_REG_BINIT_RLIST_B10(v)	(((v) << 20) & GENMASK(24, 20))
+#define VDPU_REG_BINIT_RLIST_B9(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_BINIT_RLIST_B8(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_BINIT_RLIST_B7(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_BINIT_RLIST_B6(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_BINIT_RLIST_B15(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_BINIT_RLIST_B14(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_BINIT_RLIST_B13(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_BINIT_RLIST_B12(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_PINIT_RLIST_F3(v)	(((v) << 15) & GENMASK(19, 15))
+#define VDPU_REG_PINIT_RLIST_F2(v)	(((v) << 10) & GENMASK(14, 10))
+#define VDPU_REG_PINIT_RLIST_F1(v)	(((v) << 5) & GENMASK(9, 5))
+#define VDPU_REG_PINIT_RLIST_F0(v)	(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_REFER_LTERM_E(v)	(((v) << 0) & GENMASK(31, 0))
+
+#define VDPU_REG_REFER_VALID_E(v)	(((v) << 0) & GENMASK(31, 0))
+
+#define VDPU_REG_STRM_START_BIT(v)	(((v) << 0) & GENMASK(5, 0))
+
+#define VDPU_REG_CH_QP_OFFSET2(v)	(((v) << 22) & GENMASK(26, 22))
+#define VDPU_REG_CH_QP_OFFSET(v)	(((v) << 17) & GENMASK(21, 17))
+#define VDPU_REG_PIC_MB_HEIGHT_P(v)	(((v) << 9) & GENMASK(16, 9))
+#define VDPU_REG_PIC_MB_WIDTH(v)	(((v) << 0) & GENMASK(8, 0))
+
+#define VDPU_REG_WEIGHT_BIPR_IDC(v)	(((v) << 16) & GENMASK(17, 16))
+#define VDPU_REG_REF_FRAMES(v)		(((v) << 0) & GENMASK(4, 0))
+
+#define VDPU_REG_FILT_CTRL_PRES(v)	((v) ? BIT(31) : 0)
+#define VDPU_REG_RDPIC_CNT_PRES(v)	((v) ? BIT(30) : 0)
+#define VDPU_REG_FRAMENUM_LEN(v)	(((v) << 16) & GENMASK(20, 16))
+#define VDPU_REG_FRAMENUM(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_REFPIC_MK_LEN(v)	(((v) << 16) & GENMASK(26, 16))
+#define VDPU_REG_IDR_PIC_ID(v)		(((v) << 0) & GENMASK(15, 0))
+
+#define VDPU_REG_PPS_ID(v)		(((v) << 24) & GENMASK(31, 24))
+#define VDPU_REG_REFIDX1_ACTIVE(v)	(((v) << 19) & GENMASK(23, 19))
+#define VDPU_REG_REFIDX0_ACTIVE(v)	(((v) << 14) & GENMASK(18, 14))
+#define VDPU_REG_POC_LENGTH(v)		(((v) << 0) & GENMASK(7, 0))
+
+#define VDPU_REG_IDR_PIC_E(v)		((v) ? BIT(8) : 0)
+#define VDPU_REG_DIR_8X8_INFER_E(v)	((v) ? BIT(7) : 0)
+#define VDPU_REG_BLACKWHITE_E(v)	((v) ? BIT(6) : 0)
+#define VDPU_REG_CABAC_E(v)		((v) ? BIT(5) : 0)
+#define VDPU_REG_WEIGHT_PRED_E(v)	((v) ? BIT(4) : 0)
+#define VDPU_REG_CONST_INTRA_E(v)	((v) ? BIT(3) : 0)
+#define VDPU_REG_8X8TRANS_FLAG_E(v)	((v) ? BIT(2) : 0)
+#define VDPU_REG_TYPE1_QUANT_E(v)	((v) ? BIT(1) : 0)
+#define VDPU_REG_FIELDPIC_FLAG_E(v)	((v) ? BIT(0) : 0)
+
+void rk3399_vpu_h264_dec_run(struct hantro_ctx *ctx)
+{
+	struct hantro_dev *vpu = ctx->dev;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	const struct hantro_h264_dec_ctrls *ctrls;
+	const struct v4l2_ctrl_h264_decode_params *dec_param;
+	const struct v4l2_ctrl_h264_slice_params *slices;
+	const struct v4l2_ctrl_h264_sps *sps;
+	const struct v4l2_ctrl_h264_pps *pps;
+	const u8 *b0_reflist, *b1_reflist, *p_reflist;
+	dma_addr_t addr;
+	u32 reg;
+	unsigned int offset = MV_OFFSET_420;
+
+	/* Prepare the H264 decoder context. */
+	if (hantro_h264_dec_prepare_run(ctx))
+		return;
+
+	src_buf = hantro_get_src_buf(ctx);
+	dst_buf = hantro_get_dst_buf(ctx);
+
+	ctrls = &ctx->h264_dec.ctrls;
+	dec_param = ctrls->decode;
+	slices = ctrls->slices;
+	sps = ctrls->sps;
+	pps = ctrls->pps;
+
+	b0_reflist = ctx->h264_dec.reflists.b0;
+	b1_reflist = ctx->h264_dec.reflists.b1;
+	p_reflist = ctx->h264_dec.reflists.p;
+
+	reg = VDPU_REG_DEC_ADV_PRE_DIS(0) |
+	      VDPU_REG_DEC_SCMD_DIS(0) |
+	      VDPU_REG_FILTERING_DIS(0) |
+	      VDPU_REG_PIC_FIXED_QUANT(0) |
+	      VDPU_REG_DEC_LATENCY(0);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(50));
+
+	reg = VDPU_REG_INIT_QP(pps->pic_init_qp_minus26 + 26) |
+	      VDPU_REG_STREAM_LEN(vb2_get_plane_payload(&src_buf->vb2_buf, 0));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(51));
+
+	reg = VDPU_REG_APF_THRESHOLD(8) |
+	      VDPU_REG_STARTMB_X(0) |
+	      VDPU_REG_STARTMB_Y(0);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(52));
+
+	reg = VDPU_REG_DEC_MODE(0);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(53));
+
+	reg = VDPU_REG_DEC_STRENDIAN_E(1) |
+	      VDPU_REG_DEC_STRSWAP32_E(1) |
+	      VDPU_REG_DEC_OUTSWAP32_E(1) |
+	      VDPU_REG_DEC_INSWAP32_E(1) |
+	      VDPU_REG_DEC_OUT_ENDIAN(1) |
+	      VDPU_REG_DEC_IN_ENDIAN(0);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(54));
+
+	reg = VDPU_REG_DEC_DATA_DISC_E(0) |
+	      VDPU_REG_DEC_MAX_BURST(16) |
+	      VDPU_REG_DEC_AXI_WR_ID(0) |
+	      VDPU_REG_DEC_AXI_RD_ID(0xff);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(56));
+
+	reg = VDPU_REG_START_CODE_E(1) |
+	      VDPU_REG_CH_8PIX_ILEAV_E(0) |
+	      VDPU_REG_RLC_MODE_E(0) |
+	      VDPU_REG_PIC_INTERLACE_E(!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY) && (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD || slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)) |
+	      VDPU_REG_PIC_FIELDMODE_E(slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) |
+	      VDPU_REG_PIC_TOPFIELD_E(!(slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)) |
+	      VDPU_REG_WRITE_MVS_E(dec_param->nal_ref_idc) |
+	      VDPU_REG_SEQ_MBAFF_E(sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD) |
+	      VDPU_REG_PICORD_COUNT_E(1) |
+	      VDPU_REG_DEC_TIMEOUT_E(1) |
+	      VDPU_REG_DEC_CLK_GATE_E(1);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(57));
+
+	reg = VDPU_REG_PRED_BC_TAP_0_0(1) |
+	      VDPU_REG_PRED_BC_TAP_0_1((u32)-5) |
+	      VDPU_REG_PRED_BC_TAP_0_2(20);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(59));
+
+	reg = VDPU_REG_REFBU_E(0);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(65));
+
+	reg = VDPU_REG_PINIT_RLIST_F9(p_reflist[9]) |
+	      VDPU_REG_PINIT_RLIST_F8(p_reflist[8]) |
+	      VDPU_REG_PINIT_RLIST_F7(p_reflist[7]) |
+	      VDPU_REG_PINIT_RLIST_F6(p_reflist[6]) |
+	      VDPU_REG_PINIT_RLIST_F5(p_reflist[5]) |
+	      VDPU_REG_PINIT_RLIST_F4(p_reflist[4]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(74));
+
+	reg = VDPU_REG_PINIT_RLIST_F15(p_reflist[15]) |
+	      VDPU_REG_PINIT_RLIST_F14(p_reflist[14]) |
+	      VDPU_REG_PINIT_RLIST_F13(p_reflist[13]) |
+	      VDPU_REG_PINIT_RLIST_F12(p_reflist[12]) |
+	      VDPU_REG_PINIT_RLIST_F11(p_reflist[11]) |
+	      VDPU_REG_PINIT_RLIST_F10(p_reflist[10]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(75));
+
+	reg = VDPU_REG_REFER1_NBR(hantro_h264_get_ref_nbr(ctx, 1)) |
+	      VDPU_REG_REFER0_NBR(hantro_h264_get_ref_nbr(ctx, 0));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(76));
+
+	reg = VDPU_REG_REFER3_NBR(hantro_h264_get_ref_nbr(ctx, 3)) |
+	      VDPU_REG_REFER2_NBR(hantro_h264_get_ref_nbr(ctx, 2));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(77));
+
+	reg = VDPU_REG_REFER5_NBR(hantro_h264_get_ref_nbr(ctx, 5)) |
+	      VDPU_REG_REFER4_NBR(hantro_h264_get_ref_nbr(ctx, 4));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(78));
+
+	reg = VDPU_REG_REFER7_NBR(hantro_h264_get_ref_nbr(ctx, 7)) |
+	      VDPU_REG_REFER6_NBR(hantro_h264_get_ref_nbr(ctx, 6));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(79));
+
+	reg = VDPU_REG_REFER9_NBR(hantro_h264_get_ref_nbr(ctx, 9)) |
+	      VDPU_REG_REFER8_NBR(hantro_h264_get_ref_nbr(ctx, 8));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(80));
+
+	reg = VDPU_REG_REFER11_NBR(hantro_h264_get_ref_nbr(ctx, 11)) |
+	      VDPU_REG_REFER10_NBR(hantro_h264_get_ref_nbr(ctx, 10));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(81));
+
+	reg = VDPU_REG_REFER13_NBR(hantro_h264_get_ref_nbr(ctx, 13)) |
+	      VDPU_REG_REFER12_NBR(hantro_h264_get_ref_nbr(ctx, 12));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(82));
+
+	reg = VDPU_REG_REFER15_NBR(hantro_h264_get_ref_nbr(ctx, 15)) |
+	      VDPU_REG_REFER14_NBR(hantro_h264_get_ref_nbr(ctx, 14));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(83));
+
+	reg = VDPU_REG_BINIT_RLIST_F5(b0_reflist[5]) |
+	      VDPU_REG_BINIT_RLIST_F4(b0_reflist[4]) |
+	      VDPU_REG_BINIT_RLIST_F3(b0_reflist[3]) |
+	      VDPU_REG_BINIT_RLIST_F2(b0_reflist[2]) |
+	      VDPU_REG_BINIT_RLIST_F1(b0_reflist[1]) |
+	      VDPU_REG_BINIT_RLIST_F0(b0_reflist[0]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(100));
+
+	reg = VDPU_REG_BINIT_RLIST_F11(b0_reflist[11]) |
+	      VDPU_REG_BINIT_RLIST_F10(b0_reflist[10]) |
+	      VDPU_REG_BINIT_RLIST_F9(b0_reflist[9]) |
+	      VDPU_REG_BINIT_RLIST_F8(b0_reflist[8]) |
+	      VDPU_REG_BINIT_RLIST_F7(b0_reflist[7]) |
+	      VDPU_REG_BINIT_RLIST_F6(b0_reflist[6]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(101));
+
+	reg = VDPU_REG_BINIT_RLIST_F15(b0_reflist[15]) |
+	      VDPU_REG_BINIT_RLIST_F14(b0_reflist[14]) |
+	      VDPU_REG_BINIT_RLIST_F13(b0_reflist[13]) |
+	      VDPU_REG_BINIT_RLIST_F12(b0_reflist[12]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(102));
+
+	reg = VDPU_REG_BINIT_RLIST_B5(b1_reflist[5]) |
+	      VDPU_REG_BINIT_RLIST_B4(b1_reflist[4]) |
+	      VDPU_REG_BINIT_RLIST_B3(b1_reflist[3]) |
+	      VDPU_REG_BINIT_RLIST_B2(b1_reflist[2]) |
+	      VDPU_REG_BINIT_RLIST_B1(b1_reflist[1]) |
+	      VDPU_REG_BINIT_RLIST_B0(b1_reflist[0]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(103));
+
+	reg = VDPU_REG_BINIT_RLIST_B11(b1_reflist[11]) |
+	      VDPU_REG_BINIT_RLIST_B10(b1_reflist[10]) |
+	      VDPU_REG_BINIT_RLIST_B9(b1_reflist[9]) |
+	      VDPU_REG_BINIT_RLIST_B8(b1_reflist[8]) |
+	      VDPU_REG_BINIT_RLIST_B7(b1_reflist[7]) |
+	      VDPU_REG_BINIT_RLIST_B6(b1_reflist[6]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(104));
+
+	reg = VDPU_REG_BINIT_RLIST_B15(b1_reflist[15]) |
+	      VDPU_REG_BINIT_RLIST_B14(b1_reflist[14]) |
+	      VDPU_REG_BINIT_RLIST_B13(b1_reflist[13]) |
+	      VDPU_REG_BINIT_RLIST_B12(b1_reflist[12]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(105));
+
+	reg = VDPU_REG_PINIT_RLIST_F3(p_reflist[3]) |
+	      VDPU_REG_PINIT_RLIST_F2(p_reflist[2]) |
+	      VDPU_REG_PINIT_RLIST_F1(p_reflist[1]) |
+	      VDPU_REG_PINIT_RLIST_F0(p_reflist[0]);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(106));
+
+	reg = VDPU_REG_REFER_LTERM_E(ctx->h264_dec.dpb_longterm);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(107));
+
+	reg = VDPU_REG_REFER_VALID_E(ctx->h264_dec.dpb_valid);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(108));
+
+	reg = VDPU_REG_STRM_START_BIT(0);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(109));
+
+	reg = VDPU_REG_CH_QP_OFFSET2(pps->second_chroma_qp_index_offset) |
+	      VDPU_REG_CH_QP_OFFSET(pps->chroma_qp_index_offset) |
+	      VDPU_REG_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
+	      VDPU_REG_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(110));
+
+	reg = VDPU_REG_WEIGHT_BIPR_IDC(pps->weighted_bipred_idc) |
+	      VDPU_REG_REF_FRAMES(sps->max_num_ref_frames);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(111));
+
+	reg = VDPU_REG_FILT_CTRL_PRES(pps->flags & V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT) |
+	      VDPU_REG_RDPIC_CNT_PRES(pps->flags & V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT) |
+	      VDPU_REG_FRAMENUM_LEN(sps->log2_max_frame_num_minus4 + 4) |
+	      VDPU_REG_FRAMENUM(slices[0].frame_num);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(112));
+
+	reg = VDPU_REG_REFPIC_MK_LEN(slices[0].dec_ref_pic_marking_bit_size) |
+	      VDPU_REG_IDR_PIC_ID(slices[0].idr_pic_id);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(113));
+
+	reg = VDPU_REG_PPS_ID(slices[0].pic_parameter_set_id) |
+	      VDPU_REG_REFIDX1_ACTIVE(pps->num_ref_idx_l1_default_active_minus1 + 1) |
+	      VDPU_REG_REFIDX0_ACTIVE(pps->num_ref_idx_l0_default_active_minus1 + 1) |
+	      VDPU_REG_POC_LENGTH(slices[0].pic_order_cnt_bit_size);
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(114));
+
+	reg = VDPU_REG_IDR_PIC_E(dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC) |
+	      VDPU_REG_DIR_8X8_INFER_E(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE) |
+	      VDPU_REG_BLACKWHITE_E(sps->profile_idc >= 100 && sps->chroma_format_idc == 0) |
+	      VDPU_REG_CABAC_E(pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE) |
+	      VDPU_REG_WEIGHT_PRED_E(pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) |
+	      VDPU_REG_CONST_INTRA_E(pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED) |
+	      VDPU_REG_8X8TRANS_FLAG_E(pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE) |
+	      VDPU_REG_TYPE1_QUANT_E(1) |
+	      VDPU_REG_FIELDPIC_FLAG_E(!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY));
+	vdpu_write_relaxed(vpu, reg, VDPU_SWREG(115));
+
+	/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
+	vdpu_write_relaxed(vpu, ctx->h264_dec.priv.dma, VDPU_REG_QTABLE_BASE);
+
+	/* Source (stream) buffer. */
+	addr = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+	vdpu_write_relaxed(vpu, addr, VDPU_REG_RLC_VLC_BASE);
+
+	/* Destination (decoded frame) buffer. */
+	addr = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+		addr += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
+	vdpu_write_relaxed(vpu, addr, VDPU_REG_DEC_OUT_BASE);
+
+	/* Motion vector buffer is located after the decoded frame. */
+	addr = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
+		offset = MV_OFFSET_400;
+	addr += offset * H264_MB_WIDTH(ctx->dst_fmt.width) *
+		   H264_MB_HEIGHT(ctx->dst_fmt.height);
+	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+		addr += 32 * H264_MB_WIDTH(ctx->dst_fmt.width) *
+			   H264_MB_HEIGHT(ctx->dst_fmt.height);
+	vdpu_write_relaxed(vpu, addr, VDPU_REG_DIR_MV_BASE);
+
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 0), VDPU_REG_REFER0_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 1), VDPU_REG_REFER1_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 2), VDPU_REG_REFER2_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 3), VDPU_REG_REFER3_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 4), VDPU_REG_REFER4_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 5), VDPU_REG_REFER5_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 6), VDPU_REG_REFER6_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 7), VDPU_REG_REFER7_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 8), VDPU_REG_REFER8_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 9), VDPU_REG_REFER9_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 10), VDPU_REG_REFER10_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 11), VDPU_REG_REFER11_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 12), VDPU_REG_REFER12_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 13), VDPU_REG_REFER13_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 14), VDPU_REG_REFER14_BASE);
+	vdpu_write_relaxed(vpu, hantro_h264_get_ref_dma_addr(ctx, 15), VDPU_REG_REFER15_BASE);
+
+	hantro_finish_run(ctx);
+
+	/* Start decoding! */
+	reg = vdpu_read(vpu, VDPU_SWREG(57)) | VDPU_REG_DEC_E(1);
+	vdpu_write(vpu, reg, VDPU_SWREG(57));
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 12/12] media: hantro: Enable H264 decoding on RK3328
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
                     ` (9 preceding siblings ...)
  2019-09-01 12:45   ` [RFC 11/12] media: hantro: Enable " Jonas Karlman
@ 2019-09-01 12:45   ` Jonas Karlman
  10 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-01 12:45 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel, Jonas Karlman

RK3328 SoC has the same decoder IP block as RK3399,
lets enable H264 decoding on RK3328.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 drivers/staging/media/hantro/rk3399_vpu_hw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/media/hantro/rk3399_vpu_hw.c b/drivers/staging/media/hantro/rk3399_vpu_hw.c
index 47ca51b75a0d..08b965129377 100644
--- a/drivers/staging/media/hantro/rk3399_vpu_hw.c
+++ b/drivers/staging/media/hantro/rk3399_vpu_hw.c
@@ -232,7 +232,8 @@ const struct hantro_variant rk3328_vpu_variant = {
 	.dec_offset = 0x400,
 	.dec_fmts = rk3399_vpu_dec_fmts,
 	.num_dec_fmts = ARRAY_SIZE(rk3399_vpu_dec_fmts),
-	.codec = HANTRO_MPEG2_DECODER | HANTRO_VP8_DECODER,
+	.codec = HANTRO_MPEG2_DECODER | HANTRO_VP8_DECODER |
+		 HANTRO_H264_DECODER,
 	.codec_ops = rk3399_vpu_codec_ops,
 	.irqs = rk3328_irqs,
 	.num_irqs = ARRAY_SIZE(rk3328_irqs),
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399
  2019-09-01 12:45   ` [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399 Jonas Karlman
@ 2019-09-02 11:46     ` Hans Verkuil
  2019-09-02 15:25       ` Jonas Karlman
  0 siblings, 1 reply; 33+ messages in thread
From: Hans Verkuil @ 2019-09-02 11:46 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Boris Brezillon, Philipp Zabel,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On 9/1/19 2:45 PM, Jonas Karlman wrote:
> Rockchip RK3399 SoC has the same Hantro G1 IP block
> as RK3288, but the registers are entirely different.
> 
> In a similar fashion as MPEG-2 and VP8 decoding,
> it's simpler to just add a separate implementation.
> 
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
>  drivers/staging/media/hantro/Makefile         |   1 +
>  .../staging/media/hantro/hantro_g1_h264_dec.c |   1 -
>  drivers/staging/media/hantro/hantro_hw.h      |   1 +
>  .../media/hantro/rk3399_vpu_hw_h264_dec.c     | 486 ++++++++++++++++++
>  4 files changed, 488 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c
> 
> diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
> index 5d6b0383d280..8d33b0e8aa6c 100644
> --- a/drivers/staging/media/hantro/Makefile
> +++ b/drivers/staging/media/hantro/Makefile
> @@ -8,6 +8,7 @@ hantro-vpu-y += \
>  		hantro_g1_mpeg2_dec.o \
>  		hantro_g1_vp8_dec.o \
>  		rk3399_vpu_hw_jpeg_enc.o \
> +		rk3399_vpu_hw_h264_dec.o \
>  		rk3399_vpu_hw_mpeg2_dec.o \
>  		rk3399_vpu_hw_vp8_dec.o \
>  		hantro_jpeg.o \
> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> index 4b82b9fd5252..ec2736fb473d 100644
> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> @@ -202,7 +202,6 @@
>  #define G1_REG_REFBU_E(v)		((v) ? BIT(31) : 0)
>  
>  #define G1_REG_APF_THRESHOLD(v)		(((v) << 0) & GENMASK(13, 0))
> ->>>>>>> b22734fb5e2c... Ymedia: hantro: Refactor G1 H264 code

^^^^^^^^^^^^^^^

Seems to be a left-over from patch 9?

Regards,

	Hans

>  
>  void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
>  {

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 00/12] media: hantro: H264 fixes and improvements
  2019-09-01 12:42 [PATCH RFC 00/12] media: hantro: H264 fixes and improvements Jonas Karlman
  2019-09-01 12:45 ` [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288 Jonas Karlman
       [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
@ 2019-09-02 13:02 ` Ezequiel Garcia
  2019-09-02 16:28   ` Jonas Karlman
  2 siblings, 1 reply; 33+ messages in thread
From: Ezequiel Garcia @ 2019-09-02 13:02 UTC (permalink / raw)
  To: Jonas Karlman
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel

Hi Jonas,

Thanks for the series, I'll be reviewing this shortly.

On Sun, 2019-09-01 at 12:42 +0000, Jonas Karlman wrote:
> This series contains fixes and improvements for the hantro H264 decoder.
> 
> Patch 1-6 fixes issues and limitations observed when preparing support
> for field encoded content.
> 
> Patch 7 introduce new DPB entry flags that is used to signal how a reference
> frame is referenced. This information is needed to correctly build a
> reference list for field encoded content.
> 
> Patch 8 adds bits to handle field encoded content, this is a rough patch
> and should be reworked with proper code style and formatting.
> Please get back with feedback on how to improve this.
> 
> The following samples from [1] are now playable with patch 1-8
> - H264_1080i-25-interlace_Kaesescheibchen.mkv
> - H264_10_1080i_50_AC3-Astra19.2_ProSieben_HD.ts
> - big_buck_bunny_1080p_H264_AAC_25fps_7200K.mp4
> - h264_tivo_sample.ts
> 
> The rest of the patches refactors G1 H264 code to more closely match
> the code generated by my rockchip-vpu-regtool at [2] and then adds
> support for H264 decoding on RK3399/RK3328 using the VPU2 block.
> This code is early work and needs proper code style and formatting,
> I just wanted to share the early work and get some initial feedback.
> 
> This series has been tested using ffmpeg v4l2 request hwaccel at [3] [4]
> 

What boards have you tested this on?

Thanks,
Ezequiel


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
  2019-09-01 12:45   ` [PATCH 02/12] media: hantro: Do not reorder H264 scaling list Jonas Karlman
@ 2019-09-02 14:00     ` Philipp Zabel
  2019-09-02 16:18       ` Jonas Karlman
  0 siblings, 1 reply; 33+ messages in thread
From: Philipp Zabel @ 2019-09-02 14:00 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

Hi Jonas,

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> Scaling list supplied from userspace using ffmpeg and libva-v4l2-request
> is already in matrix order and can be used without applying the inverse
> scanning process.

"in matrix order" is equivalent to "in raster scan order"?

Could you add this requirement to the
V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX documentation?

> The HW also only support 8x8 scaling list for the Y component, indices 0
> and 3 in the scaling list supplied from userspace.
> 
> Remove reordering and write the scaling matrix in an order expected by
> the VPU, also only allocate memory for the two 8x8 lists used.
> 
> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
>  drivers/staging/media/hantro/hantro_h264.c | 64 +++++++---------------
>  1 file changed, 20 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
> index 0d758e0c0f99..e2d01145ac4f 100644
> --- a/drivers/staging/media/hantro/hantro_h264.c
> +++ b/drivers/staging/media/hantro/hantro_h264.c
> @@ -20,7 +20,7 @@
>  /* Size with u32 units. */
>  #define CABAC_INIT_BUFFER_SIZE		(460 * 2)
>  #define POC_BUFFER_SIZE			34
> -#define SCALING_LIST_SIZE		(6 * 16 + 6 * 64)
> +#define SCALING_LIST_SIZE		(6 * 16 + 2 * 64)

This changes the size of struct hantro_h264_dec_priv_tbl. Did this
describe the auxiliary buffer format incorrectly before?

>  #define POC_CMP(p0, p1) ((p0) < (p1) ? -1 : 1)
>  
> @@ -194,57 +194,33 @@ static const u32 h264_cabac_table[] = {
>  	0x1f0c2517, 0x1f261440
>  };
>  
> -/*
> - * NOTE: The scaling lists are in zig-zag order, apply inverse scanning process
> - * to get the values in matrix order. In addition, the hardware requires bytes
> - * swapped within each subsequent 4 bytes. Both arrays below include both
> - * transformations.
> - */
> -static const u32 zig_zag_4x4[] = {
> -	3, 2, 7, 11, 6, 1, 0, 5, 10, 15, 14, 9, 4, 8, 13, 12
> -};
> -
> -static const u32 zig_zag_8x8[] = {
> -	3, 2, 11, 19, 10, 1, 0, 9, 18, 27, 35, 26, 17, 8, 7, 6,
> -	15, 16, 25, 34, 43, 51, 42, 33, 24, 23, 14, 5, 4, 13, 22, 31,
> -	32, 41, 50, 59, 58, 49, 40, 39, 30, 21, 12, 20, 29, 38, 47, 48,
> -	57, 56, 55, 46, 37, 28, 36, 45, 54, 63, 62, 53, 44, 52, 61, 60
> -};
> -
>  static void
>  reorder_scaling_list(struct hantro_ctx *ctx)
>  {
>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>  	const struct v4l2_ctrl_h264_scaling_matrix *scaling = ctrls->scaling;
> -	const size_t num_list_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4);
> -	const size_t list_len_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4[0]);
> -	const size_t num_list_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8);
> -	const size_t list_len_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8[0]);
>  	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
> -	u8 *dst = tbl->scaling_list;
> -	const u8 *src;
> -	int i, j;
> -
> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_4x4) != list_len_4x4);
> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_8x8) != list_len_8x8);
> -	BUILD_BUG_ON(ARRAY_SIZE(tbl->scaling_list) !=
> -		     num_list_4x4 * list_len_4x4 +
> -		     num_list_8x8 * list_len_8x8);
> -
> -	src = &scaling->scaling_list_4x4[0][0];
> -	for (i = 0; i < num_list_4x4; ++i) {
> -		for (j = 0; j < list_len_4x4; ++j)
> -			dst[zig_zag_4x4[j]] = src[j];
> -		src += list_len_4x4;
> -		dst += list_len_4x4;
> +	u32 *dst = (u32 *)tbl->scaling_list;
> +	u32 i, j, tmp;
> +
> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_4x4); i++) {
> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_4x4[0]) / 4; j++) {
> +			tmp = (scaling->scaling_list_4x4[i][4 * j + 0] << 24) |
> +			      (scaling->scaling_list_4x4[i][4 * j + 1] << 16) |
> +			      (scaling->scaling_list_4x4[i][4 * j + 2] << 8) |
> +			      (scaling->scaling_list_4x4[i][4 * j + 3]);
> +			*dst++ = tmp;
> +		}

This looks like it could use swab32().

>  	}
>  
> -	src = &scaling->scaling_list_8x8[0][0];
> -	for (i = 0; i < num_list_8x8; ++i) {
> -		for (j = 0; j < list_len_8x8; ++j)
> -			dst[zig_zag_8x8[j]] = src[j];
> -		src += list_len_8x8;
> -		dst += list_len_8x8;
> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_8x8); i += 3) {
> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_8x8[0]) / 4; j++) {
> +			tmp = (scaling->scaling_list_8x8[i][4 * j + 0] << 24) |
> +			      (scaling->scaling_list_8x8[i][4 * j + 1] << 16) |
> +			      (scaling->scaling_list_8x8[i][4 * j + 2] << 8) |
> +			      (scaling->scaling_list_8x8[i][4 * j + 3]);
> +			*dst++ = tmp;
> +		}

After this change, the second 8x8 scaling list has moved to a different
offset. Is this where the hardware has always been looking for it, or is
there a change missing in another place?

regards
Philipp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399
  2019-09-02 11:46     ` Hans Verkuil
@ 2019-09-02 15:25       ` Jonas Karlman
  0 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-02 15:25 UTC (permalink / raw)
  To: Hans Verkuil, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Boris Brezillon, Philipp Zabel,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On 2019-09-02 13:46, Hans Verkuil wrote:
> On 9/1/19 2:45 PM, Jonas Karlman wrote:
>> Rockchip RK3399 SoC has the same Hantro G1 IP block
>> as RK3288, but the registers are entirely different.
>>
>> In a similar fashion as MPEG-2 and VP8 decoding,
>> it's simpler to just add a separate implementation.
>>
>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>> ---
>>  drivers/staging/media/hantro/Makefile         |   1 +
>>  .../staging/media/hantro/hantro_g1_h264_dec.c |   1 -
>>  drivers/staging/media/hantro/hantro_hw.h      |   1 +
>>  .../media/hantro/rk3399_vpu_hw_h264_dec.c     | 486 ++++++++++++++++++
>>  4 files changed, 488 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/staging/media/hantro/rk3399_vpu_hw_h264_dec.c
>>
>> diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
>> index 5d6b0383d280..8d33b0e8aa6c 100644
>> --- a/drivers/staging/media/hantro/Makefile
>> +++ b/drivers/staging/media/hantro/Makefile
>> @@ -8,6 +8,7 @@ hantro-vpu-y += \
>>  		hantro_g1_mpeg2_dec.o \
>>  		hantro_g1_vp8_dec.o \
>>  		rk3399_vpu_hw_jpeg_enc.o \
>> +		rk3399_vpu_hw_h264_dec.o \
>>  		rk3399_vpu_hw_mpeg2_dec.o \
>>  		rk3399_vpu_hw_vp8_dec.o \
>>  		hantro_jpeg.o \
>> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> index 4b82b9fd5252..ec2736fb473d 100644
>> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> @@ -202,7 +202,6 @@
>>  #define G1_REG_REFBU_E(v)		((v) ? BIT(31) : 0)
>>  
>>  #define G1_REG_APF_THRESHOLD(v)		(((v) << 0) & GENMASK(13, 0))
>> ->>>>>>> b22734fb5e2c... Ymedia: hantro: Refactor G1 H264 code
> ^^^^^^^^^^^^^^^
>
> Seems to be a left-over from patch 9?

Yes, thanks for noticing, looks like there was a small rebase issue, the line got added
in one patch and removed in the next. Will fix in next spin.

Regards,
Jonas

>
> Regards,
>
> 	Hans
>
>>  
>>  void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
>>  {


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
  2019-09-02 14:00     ` Philipp Zabel
@ 2019-09-02 16:18       ` Jonas Karlman
  2019-09-03  7:54         ` Jonas Karlman
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-02 16:18 UTC (permalink / raw)
  To: Philipp Zabel, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On 2019-09-02 16:00, Philipp Zabel wrote:
> Hi Jonas,
>
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>> Scaling list supplied from userspace using ffmpeg and libva-v4l2-request
>> is already in matrix order and can be used without applying the inverse
>> scanning process.
> "in matrix order" is equivalent to "in raster scan order"?

The values supplied by ffmpeg and libva-v4l2-request is in the order after the
inverse scanning process has been applied (scaling list has been transformed
into a scaling matrix). Not sure what this is called, "matrix order" seemed
close enough.

Since there is two scan orders, zig-zag and field, and cedrus already expecting
the values in "matrix" order, it seems more logical to let userspace handle the
inverse scanning process.

>
> Could you add this requirement to the
> V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX documentation?

Sure, I will update documentation in v2.

>
>> The HW also only support 8x8 scaling list for the Y component, indices 0
>> and 3 in the scaling list supplied from userspace.
>>
>> Remove reordering and write the scaling matrix in an order expected by
>> the VPU, also only allocate memory for the two 8x8 lists used.
>>
>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>> ---
>>  drivers/staging/media/hantro/hantro_h264.c | 64 +++++++---------------
>>  1 file changed, 20 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
>> index 0d758e0c0f99..e2d01145ac4f 100644
>> --- a/drivers/staging/media/hantro/hantro_h264.c
>> +++ b/drivers/staging/media/hantro/hantro_h264.c
>> @@ -20,7 +20,7 @@
>>  /* Size with u32 units. */
>>  #define CABAC_INIT_BUFFER_SIZE		(460 * 2)
>>  #define POC_BUFFER_SIZE			34
>> -#define SCALING_LIST_SIZE		(6 * 16 + 6 * 64)
>> +#define SCALING_LIST_SIZE		(6 * 16 + 2 * 64)
> This changes the size of struct hantro_h264_dec_priv_tbl. Did this
> describe the auxiliary buffer format incorrectly before?

Based on RKMPP and Hantro SDK the HW expects the 8x8 inter/intra list for
Y-component to be located at indices 0 and 1, lists for Cr/Cb is only used for
4:4:4 and HW only supports 4:0:0/4:2:0 if I am not mistaken. So the unused
extra 4 lists at the end of the auxiliary buffer seemed like a waste,
also RKMPP and Hantro SDK only seemed to allocate space for 2 lists.

>
>>  #define POC_CMP(p0, p1) ((p0) < (p1) ? -1 : 1)
>>  
>> @@ -194,57 +194,33 @@ static const u32 h264_cabac_table[] = {
>>  	0x1f0c2517, 0x1f261440
>>  };
>>  
>> -/*
>> - * NOTE: The scaling lists are in zig-zag order, apply inverse scanning process
>> - * to get the values in matrix order. In addition, the hardware requires bytes
>> - * swapped within each subsequent 4 bytes. Both arrays below include both
>> - * transformations.
>> - */
>> -static const u32 zig_zag_4x4[] = {
>> -	3, 2, 7, 11, 6, 1, 0, 5, 10, 15, 14, 9, 4, 8, 13, 12
>> -};
>> -
>> -static const u32 zig_zag_8x8[] = {
>> -	3, 2, 11, 19, 10, 1, 0, 9, 18, 27, 35, 26, 17, 8, 7, 6,
>> -	15, 16, 25, 34, 43, 51, 42, 33, 24, 23, 14, 5, 4, 13, 22, 31,
>> -	32, 41, 50, 59, 58, 49, 40, 39, 30, 21, 12, 20, 29, 38, 47, 48,
>> -	57, 56, 55, 46, 37, 28, 36, 45, 54, 63, 62, 53, 44, 52, 61, 60
>> -};
>> -
>>  static void
>>  reorder_scaling_list(struct hantro_ctx *ctx)
>>  {
>>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>>  	const struct v4l2_ctrl_h264_scaling_matrix *scaling = ctrls->scaling;
>> -	const size_t num_list_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4);
>> -	const size_t list_len_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4[0]);
>> -	const size_t num_list_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8);
>> -	const size_t list_len_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8[0]);
>>  	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
>> -	u8 *dst = tbl->scaling_list;
>> -	const u8 *src;
>> -	int i, j;
>> -
>> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_4x4) != list_len_4x4);
>> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_8x8) != list_len_8x8);
>> -	BUILD_BUG_ON(ARRAY_SIZE(tbl->scaling_list) !=
>> -		     num_list_4x4 * list_len_4x4 +
>> -		     num_list_8x8 * list_len_8x8);
>> -
>> -	src = &scaling->scaling_list_4x4[0][0];
>> -	for (i = 0; i < num_list_4x4; ++i) {
>> -		for (j = 0; j < list_len_4x4; ++j)
>> -			dst[zig_zag_4x4[j]] = src[j];
>> -		src += list_len_4x4;
>> -		dst += list_len_4x4;
>> +	u32 *dst = (u32 *)tbl->scaling_list;
>> +	u32 i, j, tmp;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_4x4); i++) {
>> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_4x4[0]) / 4; j++) {
>> +			tmp = (scaling->scaling_list_4x4[i][4 * j + 0] << 24) |
>> +			      (scaling->scaling_list_4x4[i][4 * j + 1] << 16) |
>> +			      (scaling->scaling_list_4x4[i][4 * j + 2] << 8) |
>> +			      (scaling->scaling_list_4x4[i][4 * j + 3]);
>> +			*dst++ = tmp;
>> +		}
> This looks like it could use swab32().

Thanks for the tip, will look into and change in v2.

>
>>  	}
>>  
>> -	src = &scaling->scaling_list_8x8[0][0];
>> -	for (i = 0; i < num_list_8x8; ++i) {
>> -		for (j = 0; j < list_len_8x8; ++j)
>> -			dst[zig_zag_8x8[j]] = src[j];
>> -		src += list_len_8x8;
>> -		dst += list_len_8x8;
>> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_8x8); i += 3) {
>> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_8x8[0]) / 4; j++) {
>> +			tmp = (scaling->scaling_list_8x8[i][4 * j + 0] << 24) |
>> +			      (scaling->scaling_list_8x8[i][4 * j + 1] << 16) |
>> +			      (scaling->scaling_list_8x8[i][4 * j + 2] << 8) |
>> +			      (scaling->scaling_list_8x8[i][4 * j + 3]);
>> +			*dst++ = tmp;
>> +		}
> After this change, the second 8x8 scaling list has moved to a different
> offset. Is this where the hardware has always been looking for it, or is
> there a change missing in another place?

As mentioned above HW only looks at indices 0 and 1, and ffmpeg will store the
inter/intra Y list at indices 0 and 3 as seen at [1], in similar way cedrus only
use indices 0 and 3 at [2].
FFmpeg memcpy entire scaling_matrix8 to scaling_list_8x8 for v4l2-request-api
and memcpy scaling_matrix8[0] and scaling_matrix8[3] for vaapi.

You can see the effect of this patch using the h264_tivo_sample.ts sample from
cover letter, patch 3-8 must be applied. With this patch applied the green
football field will stay green, without the patch the field will shift in colors.

[1] https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/h264_ps.c#L299-L308
[2] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/sunxi/cedrus/cedrus_h264.c#n231

Regards,
Jonas

>
> regards
> Philipp


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 00/12] media: hantro: H264 fixes and improvements
  2019-09-02 13:02 ` [PATCH RFC 00/12] media: hantro: H264 fixes and improvements Ezequiel Garcia
@ 2019-09-02 16:28   ` Jonas Karlman
  0 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-02 16:28 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel

On 2019-09-02 15:02, Ezequiel Garcia wrote:
> Hi Jonas,
>
> Thanks for the series, I'll be reviewing this shortly.
>
> On Sun, 2019-09-01 at 12:42 +0000, Jonas Karlman wrote:
>> This series contains fixes and improvements for the hantro H264 decoder.
>>
>> Patch 1-6 fixes issues and limitations observed when preparing support
>> for field encoded content.
>>
>> Patch 7 introduce new DPB entry flags that is used to signal how a reference
>> frame is referenced. This information is needed to correctly build a
>> reference list for field encoded content.
>>
>> Patch 8 adds bits to handle field encoded content, this is a rough patch
>> and should be reworked with proper code style and formatting.
>> Please get back with feedback on how to improve this.
>>
>> The following samples from [1] are now playable with patch 1-8
>> - H264_1080i-25-interlace_Kaesescheibchen.mkv
>> - H264_10_1080i_50_AC3-Astra19.2_ProSieben_HD.ts
>> - big_buck_bunny_1080p_H264_AAC_25fps_7200K.mp4
>> - h264_tivo_sample.ts
>>
>> The rest of the patches refactors G1 H264 code to more closely match
>> the code generated by my rockchip-vpu-regtool at [2] and then adds
>> support for H264 decoding on RK3399/RK3328 using the VPU2 block.
>> This code is early work and needs proper code style and formatting,
>> I just wanted to share the early work and get some initial feedback.
>>
>> This series has been tested using ffmpeg v4l2 request hwaccel at [3] [4]
>>
> What boards have you tested this on?

Main testing has been done on a Tinker Board S (RK3288) and a Rock64 (RK3328) device.
Very limited testing on a Rock Pi 4 (RK3399) using earlier version of the patchset,
I will redo some RK3399 testing to make sure it is not only VPU2 on RK3328 that works.

Regards,
Jonas

>
> Thanks,
> Ezequiel
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
  2019-09-02 16:18       ` Jonas Karlman
@ 2019-09-03  7:54         ` Jonas Karlman
  2019-09-03 12:53           ` Philipp Zabel
  2019-09-03  9:56         ` Philipp Zabel
  2019-09-10 10:14         ` Ezequiel Garcia
  2 siblings, 1 reply; 33+ messages in thread
From: Jonas Karlman @ 2019-09-03  7:54 UTC (permalink / raw)
  To: Philipp Zabel, Ezequiel Garcia
  Cc: Paul Kocialkowski, linux-kernel, Hans Verkuil, linux-rockchip,
	Boris Brezillon, Mauro Carvalho Chehab, linux-media

On 2019-09-02 18:18, Jonas Karlman wrote:
> On 2019-09-02 16:00, Philipp Zabel wrote:
>> Hi Jonas,
>>
>> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>>> Scaling list supplied from userspace using ffmpeg and libva-v4l2-request
>>> is already in matrix order and can be used without applying the inverse
>>> scanning process.
>> "in matrix order" is equivalent to "in raster scan order"?
> The values supplied by ffmpeg and libva-v4l2-request is in the order after the
> inverse scanning process has been applied (scaling list has been transformed
> into a scaling matrix). Not sure what this is called, "matrix order" seemed
> close enough.
>
> Since there is two scan orders, zig-zag and field, and cedrus already expecting
> the values in "matrix" order, it seems more logical to let userspace handle the
> inverse scanning process.

After a closer look both ffmpeg and rkmpp only apply zig-zag scan and not field scan,
ffmpeg will memcpy the scaling_matrix4/8 as is for vaapi, vdpau and nvdec,
for dxva2 there is a workaround flag that controls if zig-zag should be applied or not.

I suggest a clarification of the expect order of values and use of the same value order as vaapi, vdpau and nvdec.
i.e. have the scaling list values in "matrix order"/"raster order", after zig-zag scan has been applied,
as is currently expected by cedrus and hantro after this patch.

I would also suggest a change to the expected order of the 8x8 scaling lists to follow the H264 standard,
instead of the ffmpeg order like this patch and cedrus driver currently expects.

Expected scaling list order would then be,
for 4x4: Intra Y, Intra Cb, Intra Cr, Inter Y, Inter Cb, Inter Cr,
for 8x8: Intra Y, Inter Y, Intra Cb, Inter Cb, Intra Cr, Inter Cr.

Regards,
Jonas

>
>> Could you add this requirement to the
>> V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX documentation?
> Sure, I will update documentation in v2.
>
>>> The HW also only support 8x8 scaling list for the Y component, indices 0
>>> and 3 in the scaling list supplied from userspace.
>>>
>>> Remove reordering and write the scaling matrix in an order expected by
>>> the VPU, also only allocate memory for the two 8x8 lists used.
>>>
>>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>>> ---
>>>  drivers/staging/media/hantro/hantro_h264.c | 64 +++++++---------------
>>>  1 file changed, 20 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
>>> index 0d758e0c0f99..e2d01145ac4f 100644
>>> --- a/drivers/staging/media/hantro/hantro_h264.c
>>> +++ b/drivers/staging/media/hantro/hantro_h264.c
>>> @@ -20,7 +20,7 @@
>>>  /* Size with u32 units. */
>>>  #define CABAC_INIT_BUFFER_SIZE		(460 * 2)
>>>  #define POC_BUFFER_SIZE			34
>>> -#define SCALING_LIST_SIZE		(6 * 16 + 6 * 64)
>>> +#define SCALING_LIST_SIZE		(6 * 16 + 2 * 64)
>> This changes the size of struct hantro_h264_dec_priv_tbl. Did this
>> describe the auxiliary buffer format incorrectly before?
> Based on RKMPP and Hantro SDK the HW expects the 8x8 inter/intra list for
> Y-component to be located at indices 0 and 1, lists for Cr/Cb is only used for
> 4:4:4 and HW only supports 4:0:0/4:2:0 if I am not mistaken. So the unused
> extra 4 lists at the end of the auxiliary buffer seemed like a waste,
> also RKMPP and Hantro SDK only seemed to allocate space for 2 lists.
>
>>>  #define POC_CMP(p0, p1) ((p0) < (p1) ? -1 : 1)
>>>  
>>> @@ -194,57 +194,33 @@ static const u32 h264_cabac_table[] = {
>>>  	0x1f0c2517, 0x1f261440
>>>  };
>>>  
>>> -/*
>>> - * NOTE: The scaling lists are in zig-zag order, apply inverse scanning process
>>> - * to get the values in matrix order. In addition, the hardware requires bytes
>>> - * swapped within each subsequent 4 bytes. Both arrays below include both
>>> - * transformations.
>>> - */
>>> -static const u32 zig_zag_4x4[] = {
>>> -	3, 2, 7, 11, 6, 1, 0, 5, 10, 15, 14, 9, 4, 8, 13, 12
>>> -};
>>> -
>>> -static const u32 zig_zag_8x8[] = {
>>> -	3, 2, 11, 19, 10, 1, 0, 9, 18, 27, 35, 26, 17, 8, 7, 6,
>>> -	15, 16, 25, 34, 43, 51, 42, 33, 24, 23, 14, 5, 4, 13, 22, 31,
>>> -	32, 41, 50, 59, 58, 49, 40, 39, 30, 21, 12, 20, 29, 38, 47, 48,
>>> -	57, 56, 55, 46, 37, 28, 36, 45, 54, 63, 62, 53, 44, 52, 61, 60
>>> -};
>>> -
>>>  static void
>>>  reorder_scaling_list(struct hantro_ctx *ctx)
>>>  {
>>>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>>>  	const struct v4l2_ctrl_h264_scaling_matrix *scaling = ctrls->scaling;
>>> -	const size_t num_list_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4);
>>> -	const size_t list_len_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4[0]);
>>> -	const size_t num_list_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8);
>>> -	const size_t list_len_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8[0]);
>>>  	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
>>> -	u8 *dst = tbl->scaling_list;
>>> -	const u8 *src;
>>> -	int i, j;
>>> -
>>> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_4x4) != list_len_4x4);
>>> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_8x8) != list_len_8x8);
>>> -	BUILD_BUG_ON(ARRAY_SIZE(tbl->scaling_list) !=
>>> -		     num_list_4x4 * list_len_4x4 +
>>> -		     num_list_8x8 * list_len_8x8);
>>> -
>>> -	src = &scaling->scaling_list_4x4[0][0];
>>> -	for (i = 0; i < num_list_4x4; ++i) {
>>> -		for (j = 0; j < list_len_4x4; ++j)
>>> -			dst[zig_zag_4x4[j]] = src[j];
>>> -		src += list_len_4x4;
>>> -		dst += list_len_4x4;
>>> +	u32 *dst = (u32 *)tbl->scaling_list;
>>> +	u32 i, j, tmp;
>>> +
>>> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_4x4); i++) {
>>> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_4x4[0]) / 4; j++) {
>>> +			tmp = (scaling->scaling_list_4x4[i][4 * j + 0] << 24) |
>>> +			      (scaling->scaling_list_4x4[i][4 * j + 1] << 16) |
>>> +			      (scaling->scaling_list_4x4[i][4 * j + 2] << 8) |
>>> +			      (scaling->scaling_list_4x4[i][4 * j + 3]);
>>> +			*dst++ = tmp;
>>> +		}
>> This looks like it could use swab32().
> Thanks for the tip, will look into and change in v2.
>
>>>  	}
>>>  
>>> -	src = &scaling->scaling_list_8x8[0][0];
>>> -	for (i = 0; i < num_list_8x8; ++i) {
>>> -		for (j = 0; j < list_len_8x8; ++j)
>>> -			dst[zig_zag_8x8[j]] = src[j];
>>> -		src += list_len_8x8;
>>> -		dst += list_len_8x8;
>>> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_8x8); i += 3) {
>>> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_8x8[0]) / 4; j++) {
>>> +			tmp = (scaling->scaling_list_8x8[i][4 * j + 0] << 24) |
>>> +			      (scaling->scaling_list_8x8[i][4 * j + 1] << 16) |
>>> +			      (scaling->scaling_list_8x8[i][4 * j + 2] << 8) |
>>> +			      (scaling->scaling_list_8x8[i][4 * j + 3]);
>>> +			*dst++ = tmp;
>>> +		}
>> After this change, the second 8x8 scaling list has moved to a different
>> offset. Is this where the hardware has always been looking for it, or is
>> there a change missing in another place?
> As mentioned above HW only looks at indices 0 and 1, and ffmpeg will store the
> inter/intra Y list at indices 0 and 3 as seen at [1], in similar way cedrus only
> use indices 0 and 3 at [2].
> FFmpeg memcpy entire scaling_matrix8 to scaling_list_8x8 for v4l2-request-api
> and memcpy scaling_matrix8[0] and scaling_matrix8[3] for vaapi.
>
> You can see the effect of this patch using the h264_tivo_sample.ts sample from
> cover letter, patch 3-8 must be applied. With this patch applied the green
> football field will stay green, without the patch the field will shift in colors.
>
> [1] https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/h264_ps.c#L299-L308
> [2] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/sunxi/cedrus/cedrus_h264.c#n231
>
> Regards,
> Jonas
>
>> regards
>> Philipp


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
  2019-09-02 16:18       ` Jonas Karlman
  2019-09-03  7:54         ` Jonas Karlman
@ 2019-09-03  9:56         ` Philipp Zabel
  2019-09-10 10:14         ` Ezequiel Garcia
  2 siblings, 0 replies; 33+ messages in thread
From: Philipp Zabel @ 2019-09-03  9:56 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On Mon, 2019-09-02 at 16:18 +0000, Jonas Karlman wrote:
> On 2019-09-02 16:00, Philipp Zabel wrote:
> > Hi Jonas,
> > 
> > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> > > Scaling list supplied from userspace using ffmpeg and libva-v4l2-request
> > > is already in matrix order and can be used without applying the inverse
> > > scanning process.
> > 
> > "in matrix order" is equivalent to "in raster scan order"?
> 
> The values supplied by ffmpeg and libva-v4l2-request is in the order after the
> inverse scanning process has been applied (scaling list has been transformed
> into a scaling matrix). Not sure what this is called, "matrix order" seemed
> close enough.

Ok, after reading chapters

8.5.6 Inverse scanning process for 4x4 transform coefficients and scaling lists
8.5.7 Inverse scanning process for 8x8 transform coefficients and scaling lists

of ITU-T Rec. H.264, this seems clear enough. I just asked to make sure,
because libva documentation uses the term "raster scan" [1].

[1] http://intel.github.io/libva/structVAIQMatrixBufferH264.html

> Since there is two scan orders, zig-zag and field, and cedrus already expecting
> the values in "matrix" order, it seems more logical to let userspace handle the
> inverse scanning process.

I agree.

[...]
> > This changes the size of struct hantro_h264_dec_priv_tbl. Did this
> > describe the auxiliary buffer format incorrectly before?
> 
> Based on RKMPP and Hantro SDK the HW expects the 8x8 inter/intra list for
> Y-component to be located at indices 0 and 1, lists for Cr/Cb is only used for
> 4:4:4 and HW only supports 4:0:0/4:2:0 if I am not mistaken. So the unused
> extra 4 lists at the end of the auxiliary buffer seemed like a waste,
> also RKMPP and Hantro SDK only seemed to allocate space for 2 lists.

Ok.

> > After this change, the second 8x8 scaling list has moved to a different
> > offset. Is this where the hardware has always been looking for it, or is
> > there a change missing in another place?
> 
> As mentioned above HW only looks at indices 0 and 1, and ffmpeg will store the
> inter/intra Y list at indices 0 and 3 as seen at [1], in similar way cedrus only
> use indices 0 and 3 at [2].
> FFmpeg memcpy entire scaling_matrix8 to scaling_list_8x8 for v4l2-request-api
> and memcpy scaling_matrix8[0] and scaling_matrix8[3] for vaapi.
> 
> You can see the effect of this patch using the h264_tivo_sample.ts sample from
> cover letter, patch 3-8 must be applied. With this patch applied the green
> football field will stay green, without the patch the field will shift in colors.
> 
> [1] https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/h264_ps.c#L299-L308
> [2] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/sunxi/cedrus/cedrus_h264.c#n231

Thank you, I'll try this.

regards
Philipp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset
  2019-09-01 12:45   ` [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset Jonas Karlman
@ 2019-09-03 10:58     ` Philipp Zabel
  2019-09-03 20:13       ` Jonas Karlman
  2019-09-10 10:18     ` Ezequiel Garcia
  2019-09-10 11:34     ` Ezequiel Garcia
  2 siblings, 1 reply; 33+ messages in thread
From: Philipp Zabel @ 2019-09-03 10:58 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

Hi Jonas,

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
> and is laid out in memory as follow:

Do you mean "A decoded 8-bit 4:2:0 frame needs up to 448 bytes per
macroblock"?

A 1280x720 frame already consists of 3600 macroblocks (each 16x16 Y +
2x8x8 Cb,Cr).

> +-------------------+
> > Y-plane   256 MBs |

So that looks like it should be 256 bytes * number of macroblocks
instead, same for the following two.

> +-------------------+
> > UV-plane  128 MBs |
> +-------------------+
> > MV buffer  64 MBs |
> 
> +-------------------+
>
> The motion vector buffer offset is currently correct for 4:2:0 because
> the extra space for motion vectors is overallocated with an extra 64 MBs.
> 
> Wrong offset for both destination and motion vector buffer are used
> for the bottom field of field encoded content, wrong offset is
> also used for 4:0:0 (monochrome) content.
> 
> Fix this by always setting the motion vector address to the expected
> 384 MBs offset for 4:2:0 and 256 MBs offset for 4:0:0 content.

Expected by whom? For example, could these be placed in separate buffers
instead of appended to the VB2 allocated buffers?

> Also use correct destination and motion vector buffer offset
> for the bottom field of field encoded content.
> 
> While at it also extend the check for 4:0:0 (monochrome) to include an
> additional check for High Profile (100).
> 
> Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
>  .../staging/media/hantro/hantro_g1_h264_dec.c | 33 +++++++++++--------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> index 7ab534936843..159bd67e0a36 100644
> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> @@ -19,6 +19,9 @@
>  #include "hantro_hw.h"
>  #include "hantro_v4l2.h"
>  
> +#define MV_OFFSET_420	384
> +#define MV_OFFSET_400	256
> +
>  static void set_params(struct hantro_ctx *ctx)
>  {
>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
> @@ -49,8 +52,8 @@ static void set_params(struct hantro_ctx *ctx)
>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);
>  
>  	/* Decoder control register 1. */
> -	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(sps->pic_width_in_mbs_minus1 + 1) |
> -	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(sps->pic_height_in_map_units_minus1 + 1) |
> +	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
> +	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
>  	      G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
>  
> @@ -79,7 +82,7 @@ static void set_params(struct hantro_ctx *ctx)
>  		reg |= G1_REG_DEC_CTRL4_CABAC_E;
>  	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
>  		reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
> -	if (sps->chroma_format_idc == 0)
> +	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
>  		reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
>  	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
>  		reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
> @@ -233,6 +236,7 @@ static void set_buffers(struct hantro_ctx *ctx)
>  	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>  	struct hantro_dev *vpu = ctx->dev;
>  	dma_addr_t src_dma, dst_dma;
> +	unsigned int offset = MV_OFFSET_420;
>  
>  	src_buf = hantro_get_src_buf(ctx);
>  	dst_buf = hantro_get_dst_buf(ctx);
> @@ -243,19 +247,20 @@ static void set_buffers(struct hantro_ctx *ctx)
>  
>  	/* Destination (decoded frame) buffer. */
>  	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> +		dst_dma += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);

How does this work? Does userspace decode two fields into the same
capture buffer and the hardware writes each field with a stride of 2
lines? I suppose this corresponds to V4L2_FIELD_INTERLACED. Could this
also be made to support V4L2_FIELD_SEQ_TB output?

regards
Philipp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
  2019-09-03  7:54         ` Jonas Karlman
@ 2019-09-03 12:53           ` Philipp Zabel
  0 siblings, 0 replies; 33+ messages in thread
From: Philipp Zabel @ 2019-09-03 12:53 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Paul Kocialkowski, linux-kernel, Hans Verkuil, linux-rockchip,
	Boris Brezillon, Mauro Carvalho Chehab, linux-media

On Tue, 2019-09-03 at 07:54 +0000, Jonas Karlman wrote:
[...]
> After a closer look both ffmpeg and rkmpp only apply zig-zag scan and not field scan,
> ffmpeg will memcpy the scaling_matrix4/8 as is for vaapi, vdpau and nvdec,
> for dxva2 there is a workaround flag that controls if zig-zag should be applied or not.
> 
> I suggest a clarification of the expect order of values and use of the same value order as vaapi, vdpau and nvdec.
> i.e. have the scaling list values in "matrix order"/"raster order", after zig-zag scan has been applied,
> as is currently expected by cedrus and hantro after this patch.
> 
> I would also suggest a change to the expected order of the 8x8 scaling lists to follow the H264 standard,
> instead of the ffmpeg order like this patch and cedrus driver currently expects.
> 
> Expected scaling list order would then be,
> for 4x4: Intra Y, Intra Cb, Intra Cr, Inter Y, Inter Cb, Inter Cr,
> for 8x8: Intra Y, Inter Y, Intra Cb, Inter Cb, Intra Cr, Inter Cr.

I'm in favor of both, it seems unnecessary to reorder the lists in
userspace only to have the kernel reorder them back before passing them
to the hardware.

regards
Philipp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content
  2019-09-01 12:45   ` [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content Jonas Karlman
@ 2019-09-03 13:21     ` Philipp Zabel
  2019-09-03 14:02       ` Jonas Karlman
  0 siblings, 1 reply; 33+ messages in thread
From: Philipp Zabel @ 2019-09-03 13:21 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> This need code cleanup and formatting
> 
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>

The previous patches all work, but this patch breaks decoding of
progressive content for me (i.MX8MQ with FFmpeg based on Ezequiel's
branch).

regards
Philipp

> ---
>  .../staging/media/hantro/hantro_g1_h264_dec.c |  26 ++--
>  drivers/staging/media/hantro/hantro_h264.c    | 126 ++++++++++++------
>  drivers/staging/media/hantro/hantro_hw.h      |   4 +
>  3 files changed, 100 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> index 16f21d258f6a..bc628ef73b29 100644
> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> @@ -130,28 +130,20 @@ static void set_params(struct hantro_ctx *ctx)
>  
>  static void set_ref(struct hantro_ctx *ctx)
>  {
> +	const struct v4l2_ctrl_h264_decode_params *dec_param;
> +	const struct v4l2_ctrl_h264_slice_params *slice;
>  	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
>  	const u8 *b0_reflist, *b1_reflist, *p_reflist;
>  	struct hantro_dev *vpu = ctx->dev;
> -	u32 dpb_longterm = 0;
> -	u32 dpb_valid = 0;
>  	int reg_num;
>  	u32 reg;
>  	int i;
>  
> -	/*
> -	 * Set up bit maps of valid and long term DPBs.
> -	 * NOTE: The bits are reversed, i.e. MSb is DPB 0.
> -	 */
> -	for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
> -		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
> -			dpb_valid |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
> +	dec_param = ctx->h264_dec.ctrls.decode;
> +	slice = ctx->h264_dec.ctrls.slices;
>  
> -		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
> -			dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
> -	}
> -	vdpu_write_relaxed(vpu, dpb_valid << 16, G1_REG_VALID_REF);
> -	vdpu_write_relaxed(vpu, dpb_longterm << 16, G1_REG_LT_REF);
> +	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_valid, G1_REG_VALID_REF);
> +	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_longterm, G1_REG_LT_REF);
>  
>  	/*
>  	 * Set up reference frame picture numbers.
> @@ -223,10 +215,8 @@ static void set_ref(struct hantro_ctx *ctx)
>  
>  	/* Set up addresses of DPB buffers. */
>  	for (i = 0; i < HANTRO_H264_DPB_SIZE; i++) {
> -		struct vb2_buffer *buf =  hantro_h264_get_ref_buf(ctx, i);
> -
> -		vdpu_write_relaxed(vpu, vb2_dma_contig_plane_dma_addr(buf, 0),
> -				   G1_REG_ADDR_REF(i));
> +		dma_addr_t addr = hantro_h264_get_ref_dma_addr(ctx, i);
> +		vdpu_write_relaxed(vpu, addr, G1_REG_ADDR_REF(i));
>  	}
>  }
>  
> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
> index a77cc28e180a..85c86d728b1a 100644
> --- a/drivers/staging/media/hantro/hantro_h264.c
> +++ b/drivers/staging/media/hantro/hantro_h264.c
> @@ -228,17 +228,65 @@ static void prepare_table(struct hantro_ctx *ctx)
>  {
>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>  	const struct v4l2_ctrl_h264_decode_params *dec_param = ctrls->decode;
> +	const struct v4l2_ctrl_h264_slice_params *slices = ctrls->slices;
>  	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
>  	const struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
> +	u32 dpb_longterm = 0;
> +	u32 dpb_valid = 0;
>  	int i;
>  
> +	/*
> +	 * Set up bit maps of valid and long term DPBs.
> +	 * NOTE: The bits are reversed, i.e. MSb is DPB 0.
> +	 */
> +	if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) || (slices[0].flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)) {
> +		for (i = 0; i < HANTRO_H264_DPB_SIZE * 2; ++i) {
> +			// check for correct reference use
> +			u32 flag = (i & 0x1) ? V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM : V4L2_H264_DPB_ENTRY_FLAG_REF_TOP;
> +			if (dpb[i / 2].flags & flag)
> +				dpb_valid |= BIT(HANTRO_H264_DPB_SIZE * 2 - 1 - i);
> +
> +			if (dpb[i / 2].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
> +				dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE * 2 - 1 - i);
> +		}
> +
> +		ctx->h264_dec.dpb_valid = dpb_valid;
> +		ctx->h264_dec.dpb_longterm = dpb_longterm;
> +	} else {
> +		for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
> +			if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
> +				dpb_valid |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
> +
> +			if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
> +				dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
> +		}
> +
> +		ctx->h264_dec.dpb_valid = dpb_valid << 16;
> +		ctx->h264_dec.dpb_longterm = dpb_longterm << 16;
> +	}
> +
>  	for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
> -		tbl->poc[i * 2] = dpb[i].top_field_order_cnt;
> -		tbl->poc[i * 2 + 1] = dpb[i].bottom_field_order_cnt;
> +		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE) {
> +			tbl->poc[i * 2] = dpb[i].top_field_order_cnt;
> +			tbl->poc[i * 2 + 1] = dpb[i].bottom_field_order_cnt;
> +		} else {
> +			tbl->poc[i * 2] = 0;
> +			tbl->poc[i * 2 + 1] = 0;
> +		}
>  	}
>  
> -	tbl->poc[32] = dec_param->top_field_order_cnt;
> -	tbl->poc[33] = dec_param->bottom_field_order_cnt;
> +	if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) || !(slices[0].flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)) {
> +		if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC))
> +			tbl->poc[32] = (slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD) ?
> +					dec_param->bottom_field_order_cnt :
> +					dec_param->top_field_order_cnt;
> +		else
> +			tbl->poc[32] = min(dec_param->top_field_order_cnt, dec_param->bottom_field_order_cnt);
> +		tbl->poc[33] = 0;
> +	} else {
> +		tbl->poc[32] = dec_param->top_field_order_cnt;
> +		tbl->poc[33] = dec_param->bottom_field_order_cnt;
> +	}
>  
>  	reorder_scaling_list(ctx);
>  }
> @@ -251,51 +299,36 @@ struct hantro_h264_reflist_builder {
>  	u8 num_valid;
>  };
>  
> -static s32 get_poc(enum v4l2_field field, s32 top_field_order_cnt,
> -		   s32 bottom_field_order_cnt)
> -{
> -	switch (field) {
> -	case V4L2_FIELD_TOP:
> -		return top_field_order_cnt;
> -	case V4L2_FIELD_BOTTOM:
> -		return bottom_field_order_cnt;
> -	default:
> -		break;
> -	}
> -
> -	return min(top_field_order_cnt, bottom_field_order_cnt);
> -}
> -
>  static void
>  init_reflist_builder(struct hantro_ctx *ctx,
>  		     struct hantro_h264_reflist_builder *b)
>  {
>  	const struct v4l2_ctrl_h264_decode_params *dec_param;
> -	struct vb2_v4l2_buffer *buf = hantro_get_dst_buf(ctx);
> +	const struct v4l2_ctrl_h264_slice_params *slices;
>  	const struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
> -	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
>  	unsigned int i;
>  
>  	dec_param = ctx->h264_dec.ctrls.decode;
> +	slices = ctx->h264_dec.ctrls.slices;
>  
>  	memset(b, 0, sizeof(*b));
>  	b->dpb = dpb;
> -	b->curpoc = get_poc(buf->field, dec_param->top_field_order_cnt,
> -			    dec_param->bottom_field_order_cnt);
> +	b->curpoc = (slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD) ?
> +		    dec_param->bottom_field_order_cnt :
> +		    dec_param->top_field_order_cnt;
>  
>  	for (i = 0; i < ARRAY_SIZE(ctx->h264_dec.dpb); i++) {
> -		int buf_idx;
> -
> -		if (!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +		u32 ref_flag = dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME;
> +		if (!ref_flag)
>  			continue;
>  
> -		buf_idx = vb2_find_timestamp(cap_q, dpb[i].reference_ts, 0);
> -		if (buf_idx < 0)
> -			continue;
> +		if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME)
> +			b->pocs[i] = min(dpb[i].bottom_field_order_cnt, dpb[i].top_field_order_cnt);
> +		else if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM)
> +			b->pocs[i] = dpb[i].bottom_field_order_cnt;
> +		else if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_TOP)
> +			b->pocs[i] = dpb[i].top_field_order_cnt;
>  
> -		buf = to_vb2_v4l2_buffer(vb2_get_buffer(cap_q, buf_idx));
> -		b->pocs[i] = get_poc(buf->field, dpb[i].top_field_order_cnt,
> -				     dpb[i].bottom_field_order_cnt);
>  		b->unordered_reflist[b->num_valid] = i;
>  		b->num_valid++;
>  	}
> @@ -448,8 +481,7 @@ build_b_ref_lists(const struct hantro_h264_reflist_builder *builder,
>  static bool dpb_entry_match(const struct v4l2_h264_dpb_entry *a,
>  			    const struct v4l2_h264_dpb_entry *b)
>  {
> -	return a->top_field_order_cnt == b->top_field_order_cnt &&
> -	       a->bottom_field_order_cnt == b->bottom_field_order_cnt;
> +	return a->reference_ts == b->reference_ts;
>  }
>  
>  static void update_dpb(struct hantro_ctx *ctx)
> @@ -463,13 +495,13 @@ static void update_dpb(struct hantro_ctx *ctx)
>  
>  	/* Disable all entries by default. */
>  	for (i = 0; i < ARRAY_SIZE(ctx->h264_dec.dpb); i++)
> -		ctx->h264_dec.dpb[i].flags &= ~V4L2_H264_DPB_ENTRY_FLAG_ACTIVE;
> +		ctx->h264_dec.dpb[i].flags = 0;
>  
>  	/* Try to match new DPB entries with existing ones by their POCs. */
>  	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
>  		const struct v4l2_h264_dpb_entry *ndpb = &dec_param->dpb[i];
>  
> -		if (!(ndpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> +		if (!(ndpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
>  			continue;
>  
>  		/*
> @@ -480,8 +512,7 @@ static void update_dpb(struct hantro_ctx *ctx)
>  			struct v4l2_h264_dpb_entry *cdpb;
>  
>  			cdpb = &ctx->h264_dec.dpb[j];
> -			if (cdpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE ||
> -			    !dpb_entry_match(cdpb, ndpb))
> +			if (!dpb_entry_match(cdpb, ndpb))
>  				continue;
>  
>  			*cdpb = *ndpb;
> @@ -541,6 +572,25 @@ struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
>  	return buf;
>  }
>  
> +dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
> +					unsigned int dpb_idx)
> +{
> +	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
> +	const struct v4l2_ctrl_h264_decode_params *dec_param = ctx->h264_dec.ctrls.decode;
> +	const struct v4l2_ctrl_h264_slice_params *slices = ctx->h264_dec.ctrls.slices;
> +
> +	struct vb2_buffer *buf = hantro_h264_get_ref_buf(ctx, dpb_idx);
> +	s32 cur_poc = slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD ?
> +		      dec_param->bottom_field_order_cnt :
> +		      dec_param->top_field_order_cnt;
> +	u32 flags = dpb[dpb_idx].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE ? 0x2 : 0;
> +	flags |= abs(dpb[dpb_idx].top_field_order_cnt - cur_poc) <
> +		 abs(dpb[dpb_idx].bottom_field_order_cnt - cur_poc) ?
> +		 0x1 : 0;
> +
> +	return vb2_dma_contig_plane_dma_addr(buf, 0) | flags;
> +}
> +
>  int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx)
>  {
>  	struct hantro_h264_dec_hw_ctx *h264_ctx = &ctx->h264_dec;
> diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
> index 8adad8ac9b1d..d58f2a36ca40 100644
> --- a/drivers/staging/media/hantro/hantro_hw.h
> +++ b/drivers/staging/media/hantro/hantro_hw.h
> @@ -86,6 +86,8 @@ struct hantro_h264_dec_hw_ctx {
>  	struct v4l2_h264_dpb_entry dpb[HANTRO_H264_DPB_SIZE];
>  	struct hantro_h264_dec_reflists reflists;
>  	struct hantro_h264_dec_ctrls ctrls;
> +	u32 dpb_longterm;
> +	u32 dpb_valid;
>  };
>  
>  /**
> @@ -157,6 +159,8 @@ void hantro_jpeg_enc_exit(struct hantro_ctx *ctx);
>  
>  struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
>  					   unsigned int dpb_idx);
> +dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
> +					unsigned int dpb_idx);
>  int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx);
>  void hantro_g1_h264_dec_run(struct hantro_ctx *ctx);
>  int hantro_h264_dec_init(struct hantro_ctx *ctx);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content
  2019-09-03 13:21     ` Philipp Zabel
@ 2019-09-03 14:02       ` Jonas Karlman
  2019-09-03 15:01         ` Philipp Zabel
  0 siblings, 1 reply; 33+ messages in thread
From: Jonas Karlman @ 2019-09-03 14:02 UTC (permalink / raw)
  To: Philipp Zabel, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On 2019-09-03 15:21, Philipp Zabel wrote:
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>> This need code cleanup and formatting
>>
>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> The previous patches all work, but this patch breaks decoding of
> progressive content for me (i.MX8MQ with FFmpeg based on Ezequiel's
> branch).

Please try with ffmpeg based on my v4l2-request-hwaccel-4.0.4-hantro branch at [1],
up to and including the commit "HACK: add dpb flags for reference usage and field picture".
That commit adds code to set reference flags needed by the changes in this patch.

There is probably also some other minor difference between our two ffmpeg branches.
I have not observed any issues with progressive content with this patch and my ffmpeg branch (on a RK3288 device).
Some H264 reference samples do have visual issues after this patch, however all my real world samples does seem to work.

My branch use libudev to probe media/video devices and needs to be configured with:
--enable-v4l2-request --enable-libudev --enable-libdrm

[1] https://github.com/Kwiboo/FFmpeg/commits/v4l2-request-hwaccel-4.0.4-hantro

Regards,
Jonas

>
> regards
> Philipp
>
>> ---
>>  .../staging/media/hantro/hantro_g1_h264_dec.c |  26 ++--
>>  drivers/staging/media/hantro/hantro_h264.c    | 126 ++++++++++++------
>>  drivers/staging/media/hantro/hantro_hw.h      |   4 +
>>  3 files changed, 100 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> index 16f21d258f6a..bc628ef73b29 100644
>> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> @@ -130,28 +130,20 @@ static void set_params(struct hantro_ctx *ctx)
>>  
>>  static void set_ref(struct hantro_ctx *ctx)
>>  {
>> +	const struct v4l2_ctrl_h264_decode_params *dec_param;
>> +	const struct v4l2_ctrl_h264_slice_params *slice;
>>  	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
>>  	const u8 *b0_reflist, *b1_reflist, *p_reflist;
>>  	struct hantro_dev *vpu = ctx->dev;
>> -	u32 dpb_longterm = 0;
>> -	u32 dpb_valid = 0;
>>  	int reg_num;
>>  	u32 reg;
>>  	int i;
>>  
>> -	/*
>> -	 * Set up bit maps of valid and long term DPBs.
>> -	 * NOTE: The bits are reversed, i.e. MSb is DPB 0.
>> -	 */
>> -	for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
>> -		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
>> -			dpb_valid |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
>> +	dec_param = ctx->h264_dec.ctrls.decode;
>> +	slice = ctx->h264_dec.ctrls.slices;
>>  
>> -		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
>> -			dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
>> -	}
>> -	vdpu_write_relaxed(vpu, dpb_valid << 16, G1_REG_VALID_REF);
>> -	vdpu_write_relaxed(vpu, dpb_longterm << 16, G1_REG_LT_REF);
>> +	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_valid, G1_REG_VALID_REF);
>> +	vdpu_write_relaxed(vpu, ctx->h264_dec.dpb_longterm, G1_REG_LT_REF);
>>  
>>  	/*
>>  	 * Set up reference frame picture numbers.
>> @@ -223,10 +215,8 @@ static void set_ref(struct hantro_ctx *ctx)
>>  
>>  	/* Set up addresses of DPB buffers. */
>>  	for (i = 0; i < HANTRO_H264_DPB_SIZE; i++) {
>> -		struct vb2_buffer *buf =  hantro_h264_get_ref_buf(ctx, i);
>> -
>> -		vdpu_write_relaxed(vpu, vb2_dma_contig_plane_dma_addr(buf, 0),
>> -				   G1_REG_ADDR_REF(i));
>> +		dma_addr_t addr = hantro_h264_get_ref_dma_addr(ctx, i);
>> +		vdpu_write_relaxed(vpu, addr, G1_REG_ADDR_REF(i));
>>  	}
>>  }
>>  
>> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
>> index a77cc28e180a..85c86d728b1a 100644
>> --- a/drivers/staging/media/hantro/hantro_h264.c
>> +++ b/drivers/staging/media/hantro/hantro_h264.c
>> @@ -228,17 +228,65 @@ static void prepare_table(struct hantro_ctx *ctx)
>>  {
>>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>>  	const struct v4l2_ctrl_h264_decode_params *dec_param = ctrls->decode;
>> +	const struct v4l2_ctrl_h264_slice_params *slices = ctrls->slices;
>>  	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
>>  	const struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
>> +	u32 dpb_longterm = 0;
>> +	u32 dpb_valid = 0;
>>  	int i;
>>  
>> +	/*
>> +	 * Set up bit maps of valid and long term DPBs.
>> +	 * NOTE: The bits are reversed, i.e. MSb is DPB 0.
>> +	 */
>> +	if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) || (slices[0].flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)) {
>> +		for (i = 0; i < HANTRO_H264_DPB_SIZE * 2; ++i) {
>> +			// check for correct reference use
>> +			u32 flag = (i & 0x1) ? V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM : V4L2_H264_DPB_ENTRY_FLAG_REF_TOP;
>> +			if (dpb[i / 2].flags & flag)
>> +				dpb_valid |= BIT(HANTRO_H264_DPB_SIZE * 2 - 1 - i);
>> +
>> +			if (dpb[i / 2].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
>> +				dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE * 2 - 1 - i);
>> +		}
>> +
>> +		ctx->h264_dec.dpb_valid = dpb_valid;
>> +		ctx->h264_dec.dpb_longterm = dpb_longterm;
>> +	} else {
>> +		for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
>> +			if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
>> +				dpb_valid |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
>> +
>> +			if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
>> +				dpb_longterm |= BIT(HANTRO_H264_DPB_SIZE - 1 - i);
>> +		}
>> +
>> +		ctx->h264_dec.dpb_valid = dpb_valid << 16;
>> +		ctx->h264_dec.dpb_longterm = dpb_longterm << 16;
>> +	}
>> +
>>  	for (i = 0; i < HANTRO_H264_DPB_SIZE; ++i) {
>> -		tbl->poc[i * 2] = dpb[i].top_field_order_cnt;
>> -		tbl->poc[i * 2 + 1] = dpb[i].bottom_field_order_cnt;
>> +		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE) {
>> +			tbl->poc[i * 2] = dpb[i].top_field_order_cnt;
>> +			tbl->poc[i * 2 + 1] = dpb[i].bottom_field_order_cnt;
>> +		} else {
>> +			tbl->poc[i * 2] = 0;
>> +			tbl->poc[i * 2 + 1] = 0;
>> +		}
>>  	}
>>  
>> -	tbl->poc[32] = dec_param->top_field_order_cnt;
>> -	tbl->poc[33] = dec_param->bottom_field_order_cnt;
>> +	if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) || !(slices[0].flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)) {
>> +		if ((slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC))
>> +			tbl->poc[32] = (slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD) ?
>> +					dec_param->bottom_field_order_cnt :
>> +					dec_param->top_field_order_cnt;
>> +		else
>> +			tbl->poc[32] = min(dec_param->top_field_order_cnt, dec_param->bottom_field_order_cnt);
>> +		tbl->poc[33] = 0;
>> +	} else {
>> +		tbl->poc[32] = dec_param->top_field_order_cnt;
>> +		tbl->poc[33] = dec_param->bottom_field_order_cnt;
>> +	}
>>  
>>  	reorder_scaling_list(ctx);
>>  }
>> @@ -251,51 +299,36 @@ struct hantro_h264_reflist_builder {
>>  	u8 num_valid;
>>  };
>>  
>> -static s32 get_poc(enum v4l2_field field, s32 top_field_order_cnt,
>> -		   s32 bottom_field_order_cnt)
>> -{
>> -	switch (field) {
>> -	case V4L2_FIELD_TOP:
>> -		return top_field_order_cnt;
>> -	case V4L2_FIELD_BOTTOM:
>> -		return bottom_field_order_cnt;
>> -	default:
>> -		break;
>> -	}
>> -
>> -	return min(top_field_order_cnt, bottom_field_order_cnt);
>> -}
>> -
>>  static void
>>  init_reflist_builder(struct hantro_ctx *ctx,
>>  		     struct hantro_h264_reflist_builder *b)
>>  {
>>  	const struct v4l2_ctrl_h264_decode_params *dec_param;
>> -	struct vb2_v4l2_buffer *buf = hantro_get_dst_buf(ctx);
>> +	const struct v4l2_ctrl_h264_slice_params *slices;
>>  	const struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
>> -	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
>>  	unsigned int i;
>>  
>>  	dec_param = ctx->h264_dec.ctrls.decode;
>> +	slices = ctx->h264_dec.ctrls.slices;
>>  
>>  	memset(b, 0, sizeof(*b));
>>  	b->dpb = dpb;
>> -	b->curpoc = get_poc(buf->field, dec_param->top_field_order_cnt,
>> -			    dec_param->bottom_field_order_cnt);
>> +	b->curpoc = (slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD) ?
>> +		    dec_param->bottom_field_order_cnt :
>> +		    dec_param->top_field_order_cnt;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(ctx->h264_dec.dpb); i++) {
>> -		int buf_idx;
>> -
>> -		if (!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
>> +		u32 ref_flag = dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME;
>> +		if (!ref_flag)
>>  			continue;
>>  
>> -		buf_idx = vb2_find_timestamp(cap_q, dpb[i].reference_ts, 0);
>> -		if (buf_idx < 0)
>> -			continue;
>> +		if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME)
>> +			b->pocs[i] = min(dpb[i].bottom_field_order_cnt, dpb[i].top_field_order_cnt);
>> +		else if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM)
>> +			b->pocs[i] = dpb[i].bottom_field_order_cnt;
>> +		else if (ref_flag == V4L2_H264_DPB_ENTRY_FLAG_REF_TOP)
>> +			b->pocs[i] = dpb[i].top_field_order_cnt;
>>  
>> -		buf = to_vb2_v4l2_buffer(vb2_get_buffer(cap_q, buf_idx));
>> -		b->pocs[i] = get_poc(buf->field, dpb[i].top_field_order_cnt,
>> -				     dpb[i].bottom_field_order_cnt);
>>  		b->unordered_reflist[b->num_valid] = i;
>>  		b->num_valid++;
>>  	}
>> @@ -448,8 +481,7 @@ build_b_ref_lists(const struct hantro_h264_reflist_builder *builder,
>>  static bool dpb_entry_match(const struct v4l2_h264_dpb_entry *a,
>>  			    const struct v4l2_h264_dpb_entry *b)
>>  {
>> -	return a->top_field_order_cnt == b->top_field_order_cnt &&
>> -	       a->bottom_field_order_cnt == b->bottom_field_order_cnt;
>> +	return a->reference_ts == b->reference_ts;
>>  }
>>  
>>  static void update_dpb(struct hantro_ctx *ctx)
>> @@ -463,13 +495,13 @@ static void update_dpb(struct hantro_ctx *ctx)
>>  
>>  	/* Disable all entries by default. */
>>  	for (i = 0; i < ARRAY_SIZE(ctx->h264_dec.dpb); i++)
>> -		ctx->h264_dec.dpb[i].flags &= ~V4L2_H264_DPB_ENTRY_FLAG_ACTIVE;
>> +		ctx->h264_dec.dpb[i].flags = 0;
>>  
>>  	/* Try to match new DPB entries with existing ones by their POCs. */
>>  	for (i = 0; i < ARRAY_SIZE(dec_param->dpb); i++) {
>>  		const struct v4l2_h264_dpb_entry *ndpb = &dec_param->dpb[i];
>>  
>> -		if (!(ndpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
>> +		if (!(ndpb->flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
>>  			continue;
>>  
>>  		/*
>> @@ -480,8 +512,7 @@ static void update_dpb(struct hantro_ctx *ctx)
>>  			struct v4l2_h264_dpb_entry *cdpb;
>>  
>>  			cdpb = &ctx->h264_dec.dpb[j];
>> -			if (cdpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE ||
>> -			    !dpb_entry_match(cdpb, ndpb))
>> +			if (!dpb_entry_match(cdpb, ndpb))
>>  				continue;
>>  
>>  			*cdpb = *ndpb;
>> @@ -541,6 +572,25 @@ struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
>>  	return buf;
>>  }
>>  
>> +dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
>> +					unsigned int dpb_idx)
>> +{
>> +	struct v4l2_h264_dpb_entry *dpb = ctx->h264_dec.dpb;
>> +	const struct v4l2_ctrl_h264_decode_params *dec_param = ctx->h264_dec.ctrls.decode;
>> +	const struct v4l2_ctrl_h264_slice_params *slices = ctx->h264_dec.ctrls.slices;
>> +
>> +	struct vb2_buffer *buf = hantro_h264_get_ref_buf(ctx, dpb_idx);
>> +	s32 cur_poc = slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD ?
>> +		      dec_param->bottom_field_order_cnt :
>> +		      dec_param->top_field_order_cnt;
>> +	u32 flags = dpb[dpb_idx].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE ? 0x2 : 0;
>> +	flags |= abs(dpb[dpb_idx].top_field_order_cnt - cur_poc) <
>> +		 abs(dpb[dpb_idx].bottom_field_order_cnt - cur_poc) ?
>> +		 0x1 : 0;
>> +
>> +	return vb2_dma_contig_plane_dma_addr(buf, 0) | flags;
>> +}
>> +
>>  int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx)
>>  {
>>  	struct hantro_h264_dec_hw_ctx *h264_ctx = &ctx->h264_dec;
>> diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
>> index 8adad8ac9b1d..d58f2a36ca40 100644
>> --- a/drivers/staging/media/hantro/hantro_hw.h
>> +++ b/drivers/staging/media/hantro/hantro_hw.h
>> @@ -86,6 +86,8 @@ struct hantro_h264_dec_hw_ctx {
>>  	struct v4l2_h264_dpb_entry dpb[HANTRO_H264_DPB_SIZE];
>>  	struct hantro_h264_dec_reflists reflists;
>>  	struct hantro_h264_dec_ctrls ctrls;
>> +	u32 dpb_longterm;
>> +	u32 dpb_valid;
>>  };
>>  
>>  /**
>> @@ -157,6 +159,8 @@ void hantro_jpeg_enc_exit(struct hantro_ctx *ctx);
>>  
>>  struct vb2_buffer *hantro_h264_get_ref_buf(struct hantro_ctx *ctx,
>>  					   unsigned int dpb_idx);
>> +dma_addr_t hantro_h264_get_ref_dma_addr(struct hantro_ctx *ctx,
>> +					unsigned int dpb_idx);
>>  int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx);
>>  void hantro_g1_h264_dec_run(struct hantro_ctx *ctx);
>>  int hantro_h264_dec_init(struct hantro_ctx *ctx);


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content
  2019-09-03 14:02       ` Jonas Karlman
@ 2019-09-03 15:01         ` Philipp Zabel
  2019-09-03 19:47           ` Jonas Karlman
  0 siblings, 1 reply; 33+ messages in thread
From: Philipp Zabel @ 2019-09-03 15:01 UTC (permalink / raw)
  To: Jonas Karlman, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On Tue, 2019-09-03 at 14:02 +0000, Jonas Karlman wrote:
> On 2019-09-03 15:21, Philipp Zabel wrote:
> > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> > > This need code cleanup and formatting
> > > 
> > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > 
> > The previous patches all work, but this patch breaks decoding of
> > progressive content for me (i.MX8MQ with FFmpeg based on Ezequiel's
> > branch).
> 
> Please try with ffmpeg based on my v4l2-request-hwaccel-4.0.4-hantro branch at [1],
> up to and including the commit "HACK: add dpb flags for reference usage and field picture".
> That commit adds code to set reference flags needed by the changes in this patch.
> 
> There is probably also some other minor difference between our two ffmpeg branches.
> I have not observed any issues with progressive content with this patch and my ffmpeg branch (on a RK3288 device).
> Some H264 reference samples do have visual issues after this patch, however all my real world samples does seem to work.
> 
> My branch use libudev to probe media/video devices and needs to be configured with:
> --enable-v4l2-request --enable-libudev --enable-libdrm
> 
> [1] https://github.com/Kwiboo/FFmpeg/commits/v4l2-request-hwaccel-4.0.4-hantro

I hadn't realized that this is a backwards incompatible change. With
your branch rebased onto n4.2, and this patch applied, decoding seems to
work.

regards
Philipp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content
  2019-09-03 15:01         ` Philipp Zabel
@ 2019-09-03 19:47           ` Jonas Karlman
  0 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-03 19:47 UTC (permalink / raw)
  To: Philipp Zabel, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On 2019-09-03 17:01, Philipp Zabel wrote:
> On Tue, 2019-09-03 at 14:02 +0000, Jonas Karlman wrote:
>> On 2019-09-03 15:21, Philipp Zabel wrote:
>>> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>>>> This need code cleanup and formatting
>>>>
>>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>>> The previous patches all work, but this patch breaks decoding of
>>> progressive content for me (i.MX8MQ with FFmpeg based on Ezequiel's
>>> branch).
>> Please try with ffmpeg based on my v4l2-request-hwaccel-4.0.4-hantro branch at [1],
>> up to and including the commit "HACK: add dpb flags for reference usage and field picture".
>> That commit adds code to set reference flags needed by the changes in this patch.
>>
>> There is probably also some other minor difference between our two ffmpeg branches.
>> I have not observed any issues with progressive content with this patch and my ffmpeg branch (on a RK3288 device).
>> Some H264 reference samples do have visual issues after this patch, however all my real world samples does seem to work.
>>
>> My branch use libudev to probe media/video devices and needs to be configured with:
>> --enable-v4l2-request --enable-libudev --enable-libdrm
>>
>> [1] https://github.com/Kwiboo/FFmpeg/commits/v4l2-request-hwaccel-4.0.4-hantro
> I hadn't realized that this is a backwards incompatible change. With
> your branch rebased onto n4.2, and this patch applied, decoding seems to
> work.

Nor did I, thanks for testing and verifying, I will try to reduce breaking changes in a v2.

Regards,
Jonas

>
> regards
> Philipp


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset
  2019-09-03 10:58     ` Philipp Zabel
@ 2019-09-03 20:13       ` Jonas Karlman
  0 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-03 20:13 UTC (permalink / raw)
  To: Philipp Zabel, Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

On 2019-09-03 12:58, Philipp Zabel wrote:
> Hi Jonas,
>
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>> A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
>> and is laid out in memory as follow:
> Do you mean "A decoded 8-bit 4:2:0 frame needs up to 448 bytes per
> macroblock"?
>
> A 1280x720 frame already consists of 3600 macroblocks (each 16x16 Y +
> 2x8x8 Cb,Cr).

You are correct, thanks for pointing out, I will change in a v2.

>
>> +-------------------+
>>> Y-plane   256 MBs |
> So that looks like it should be 256 bytes * number of macroblocks
> instead, same for the following two.

Ack.

>
>> +-------------------+
>>> UV-plane  128 MBs |
>> +-------------------+
>>> MV buffer  64 MBs |
>> +-------------------+
>>
>> The motion vector buffer offset is currently correct for 4:2:0 because
>> the extra space for motion vectors is overallocated with an extra 64 MBs.
>>
>> Wrong offset for both destination and motion vector buffer are used
>> for the bottom field of field encoded content, wrong offset is
>> also used for 4:0:0 (monochrome) content.
>>
>> Fix this by always setting the motion vector address to the expected
>> 384 MBs offset for 4:2:0 and 256 MBs offset for 4:0:0 content.
> Expected by whom? For example, could these be placed in separate buffers
> instead of appended to the VB2 allocated buffers?

From what I understand main and high profile decoding have hw constraints in that
the direct mode motion vectors buffer must be located continuously after the YUV buffer.

I also observed instances where the current requirement for profile_idc > 66 caused issues
for some streams, e.g. big_buck_bunny_1080p_H264_AAC_25fps_7200K.mp4

Because of this it was just easier to always configure the motion vector buffer address.

>
>> Also use correct destination and motion vector buffer offset
>> for the bottom field of field encoded content.
>>
>> While at it also extend the check for 4:0:0 (monochrome) to include an
>> additional check for High Profile (100).
>>
>> Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>> ---
>>  .../staging/media/hantro/hantro_g1_h264_dec.c | 33 +++++++++++--------
>>  1 file changed, 19 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> index 7ab534936843..159bd67e0a36 100644
>> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> @@ -19,6 +19,9 @@
>>  #include "hantro_hw.h"
>>  #include "hantro_v4l2.h"
>>  
>> +#define MV_OFFSET_420	384
>> +#define MV_OFFSET_400	256
>> +
>>  static void set_params(struct hantro_ctx *ctx)
>>  {
>>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>> @@ -49,8 +52,8 @@ static void set_params(struct hantro_ctx *ctx)
>>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);
>>  
>>  	/* Decoder control register 1. */
>> -	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(sps->pic_width_in_mbs_minus1 + 1) |
>> -	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(sps->pic_height_in_map_units_minus1 + 1) |
>> +	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
>> +	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
>>  	      G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
>>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
>>  
>> @@ -79,7 +82,7 @@ static void set_params(struct hantro_ctx *ctx)
>>  		reg |= G1_REG_DEC_CTRL4_CABAC_E;
>>  	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
>>  		reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
>> -	if (sps->chroma_format_idc == 0)
>> +	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
>>  		reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
>>  	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
>>  		reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
>> @@ -233,6 +236,7 @@ static void set_buffers(struct hantro_ctx *ctx)
>>  	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>>  	struct hantro_dev *vpu = ctx->dev;
>>  	dma_addr_t src_dma, dst_dma;
>> +	unsigned int offset = MV_OFFSET_420;
>>  
>>  	src_buf = hantro_get_src_buf(ctx);
>>  	dst_buf = hantro_get_dst_buf(ctx);
>> @@ -243,19 +247,20 @@ static void set_buffers(struct hantro_ctx *ctx)
>>  
>>  	/* Destination (decoded frame) buffer. */
>>  	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
>> +	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
>> +		dst_dma += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
> How does this work? Does userspace decode two fields into the same
> capture buffer and the hardware writes each field with a stride of 2
> lines? I suppose this corresponds to V4L2_FIELD_INTERLACED. Could this
> also be made to support V4L2_FIELD_SEQ_TB output?

Yes, both fields are decoded into the same capture buffer, top field to odd numbered lines
and bottom field to even numbered lines, so I guess this corresponds to V4L2_FIELD_INTERLACED.
This is also how the cedrus driver handles field decoding with Jernej's h264 patches.

I do not know if it is possible to configure the hw to decode into a V4L2_FIELD_SEQ_TB type output.

Regards,
Jonas

>
> regards
> Philipp


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288
  2019-09-01 12:45 ` [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288 Jonas Karlman
@ 2019-09-04 13:07   ` Ezequiel Garcia
  2019-09-09 19:25     ` Jonas Karlman
  0 siblings, 1 reply; 33+ messages in thread
From: Ezequiel Garcia @ 2019-09-04 13:07 UTC (permalink / raw)
  To: Jonas Karlman
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel

Hello Jonas,

Thank you for the patch.

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> TRM specify supported image size 48x48 to 4096x2304 at step size 16 pixels,
> change frmsize max_width/max_height to match TRM.
> 

The RK3288 TRM v1.1 (2015-8-20) I have here mentions a maximum
of 3840x2160.

I must admit I haven't tested with actual content this size
to verify it, have you checked it?

Thanks,
Ezequiel
 
> Fixes: 760327930e10 ("media: hantro: Enable H264 decoding on rk3288")
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
>  drivers/staging/media/hantro/rk3288_vpu_hw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c b/drivers/staging/media/hantro/rk3288_vpu_hw.c
> index 6bfcc47d1e58..ebb017b8a334 100644
> --- a/drivers/staging/media/hantro/rk3288_vpu_hw.c
> +++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c
> @@ -67,10 +67,10 @@ static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
>  		.max_depth = 2,
>  		.frmsize = {
>  			.min_width = 48,
> -			.max_width = 3840,
> +			.max_width = 4096,
>  			.step_width = H264_MB_DIM,
>  			.min_height = 48,
> -			.max_height = 2160,
> +			.max_height = 2304,
>  			.step_height = H264_MB_DIM,
>  		},
>  	},



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288
  2019-09-04 13:07   ` Ezequiel Garcia
@ 2019-09-09 19:25     ` Jonas Karlman
  0 siblings, 0 replies; 33+ messages in thread
From: Jonas Karlman @ 2019-09-09 19:25 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel

On 2019-09-04 15:07, Ezequiel Garcia wrote:
> Hello Jonas,
>
> Thank you for the patch.
>
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>> TRM specify supported image size 48x48 to 4096x2304 at step size 16 pixels,
>> change frmsize max_width/max_height to match TRM.
>>
> The RK3288 TRM v1.1 (2015-8-20) I have here mentions a maximum
> of 3840x2160.
>
> I must admit I haven't tested with actual content this size
> to verify it, have you checked it?

I can confirm that one of my test samples (PUPPIES BATH IN 4K) is 4096x2304 and can be decoded after this patch.
However the decoding speed is not optimal at 400Mhz, if I recall correctly you need to set the VPU1 clock to 600Mhz for 4K decoding on RK3288.

The RK3288 TRM vcodec chapter from [1], unknown revision and date, lists 48x48 to 4096x2304 step size 16 pixels under 25.5.1 H.264 decoder.

[1] http://www.t-firefly.com/download/firefly-rk3288/docs/TRM/rk3288-chapter-25-video-encoder-decoder-unit-(vcodec).pdf

Regards,
Jonas

>
> Thanks,
> Ezequiel
>  
>> Fixes: 760327930e10 ("media: hantro: Enable H264 decoding on rk3288")
>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>> ---
>>  drivers/staging/media/hantro/rk3288_vpu_hw.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c b/drivers/staging/media/hantro/rk3288_vpu_hw.c
>> index 6bfcc47d1e58..ebb017b8a334 100644
>> --- a/drivers/staging/media/hantro/rk3288_vpu_hw.c
>> +++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c
>> @@ -67,10 +67,10 @@ static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
>>  		.max_depth = 2,
>>  		.frmsize = {
>>  			.min_width = 48,
>> -			.max_width = 3840,
>> +			.max_width = 4096,
>>  			.step_width = H264_MB_DIM,
>>  			.min_height = 48,
>> -			.max_height = 2160,
>> +			.max_height = 2304,
>>  			.step_height = H264_MB_DIM,
>>  		},
>>  	},
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
  2019-09-02 16:18       ` Jonas Karlman
  2019-09-03  7:54         ` Jonas Karlman
  2019-09-03  9:56         ` Philipp Zabel
@ 2019-09-10 10:14         ` Ezequiel Garcia
  2 siblings, 0 replies; 33+ messages in thread
From: Ezequiel Garcia @ 2019-09-10 10:14 UTC (permalink / raw)
  To: Jonas Karlman, Philipp Zabel
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Paul Kocialkowski, linux-media, linux-rockchip, linux-kernel

Hi Jonas,

Thanks for your patch.

On Mon, 2019-09-02 at 16:18 +0000, Jonas Karlman wrote:
[..]
> 
> > > diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
> > > index 0d758e0c0f99..e2d01145ac4f 100644
> > > --- a/drivers/staging/media/hantro/hantro_h264.c
> > > +++ b/drivers/staging/media/hantro/hantro_h264.c
> > > @@ -20,7 +20,7 @@
> > >  /* Size with u32 units. */
> > >  #define CABAC_INIT_BUFFER_SIZE		(460 * 2)
> > >  #define POC_BUFFER_SIZE			34
> > > -#define SCALING_LIST_SIZE		(6 * 16 + 6 * 64)
> > > +#define SCALING_LIST_SIZE		(6 * 16 + 2 * 64)
> > This changes the size of struct hantro_h264_dec_priv_tbl. Did this
> > describe the auxiliary buffer format incorrectly before?
> 
> Based on RKMPP and Hantro SDK the HW expects the 8x8 inter/intra list for
> Y-component to be located at indices 0 and 1, lists for Cr/Cb is only used for
> 4:4:4 and HW only supports 4:0:0/4:2:0 if I am not mistaken. So the unused
> extra 4 lists at the end of the auxiliary buffer seemed like a waste,
> also RKMPP and Hantro SDK only seemed to allocate space for 2 lists.
> 

I think it would make a lot of sense to document what the hardware
expects somewhere, perhaps as part of the struct hantro_h264_dec_priv_tbl
documentation?

Thanks,
Ezequiel


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset
  2019-09-01 12:45   ` [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset Jonas Karlman
  2019-09-03 10:58     ` Philipp Zabel
@ 2019-09-10 10:18     ` Ezequiel Garcia
  2019-09-10 11:34     ` Ezequiel Garcia
  2 siblings, 0 replies; 33+ messages in thread
From: Ezequiel Garcia @ 2019-09-10 10:18 UTC (permalink / raw)
  To: Jonas Karlman
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel

Hi Jonas,

Thanks for fixing this, I'm happy we are reducing the amount
of black magic here.

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
> and is laid out in memory as follow:
> 
> +-------------------+
> > Y-plane   256 MBs |
> +-------------------+
> > UV-plane  128 MBs |
> +-------------------+
> > MV buffer  64 MBs |
> +-------------------+
> 
> The motion vector buffer offset is currently correct for 4:2:0 because
> the extra space for motion vectors is overallocated with an extra 64 MBs.
> 
> Wrong offset for both destination and motion vector buffer are used
> for the bottom field of field encoded content, wrong offset is
> also used for 4:0:0 (monochrome) content.
> 
> Fix this by always setting the motion vector address to the expected
> 384 MBs offset for 4:2:0 and 256 MBs offset for 4:0:0 content.
> 
> Also use correct destination and motion vector buffer offset
> for the bottom field of field encoded content.
> 
> While at it also extend the check for 4:0:0 (monochrome) to include an
> additional check for High Profile (100).
> 

As with the scaling list, I believe it would make a lot of sense
to document this in the driver itself.

Thanks,
Ezequiel


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset
  2019-09-01 12:45   ` [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset Jonas Karlman
  2019-09-03 10:58     ` Philipp Zabel
  2019-09-10 10:18     ` Ezequiel Garcia
@ 2019-09-10 11:34     ` Ezequiel Garcia
  2 siblings, 0 replies; 33+ messages in thread
From: Ezequiel Garcia @ 2019-09-10 11:34 UTC (permalink / raw)
  To: Jonas Karlman
  Cc: Mauro Carvalho Chehab, Hans Verkuil, Boris Brezillon,
	Philipp Zabel, Paul Kocialkowski, linux-media, linux-rockchip,
	linux-kernel

A few more comments...

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
> and is laid out in memory as follow:
> 
> +-------------------+
> > Y-plane   256 MBs |
> +-------------------+
> > UV-plane  128 MBs |
> +-------------------+
> > MV buffer  64 MBs |
> +-------------------+
> 
> The motion vector buffer offset is currently correct for 4:2:0 because
> the extra space for motion vectors is overallocated with an extra 64 MBs.
> 
> Wrong offset for both destination and motion vector buffer are used
> for the bottom field of field encoded content, wrong offset is
> also used for 4:0:0 (monochrome) content.
> 
> Fix this by always setting the motion vector address to the expected
> 384 MBs offset for 4:2:0 and 256 MBs offset for 4:0:0 content.
> 
> Also use correct destination and motion vector buffer offset
> for the bottom field of field encoded content.
> 
> While at it also extend the check for 4:0:0 (monochrome) to include an
> additional check for High Profile (100).
> 
> Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
>  .../staging/media/hantro/hantro_g1_h264_dec.c | 33 +++++++++++--------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> index 7ab534936843..159bd67e0a36 100644
> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> @@ -19,6 +19,9 @@
>  #include "hantro_hw.h"
>  #include "hantro_v4l2.h"
>  
> +#define MV_OFFSET_420	384
> +#define MV_OFFSET_400	256
> +

Instead of introducing these macros, I'd just use the macroblock width
and height ones explicitly. This way it's more clear where is
the code coming from.

>  static void set_params(struct hantro_ctx *ctx)
>  {
>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
> @@ -49,8 +52,8 @@ static void set_params(struct hantro_ctx *ctx)
>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);
>  
>  	/* Decoder control register 1. */
> -	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(sps->pic_width_in_mbs_minus1 + 1) |
> -	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(sps->pic_height_in_map_units_minus1 + 1) |
> +	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
> +	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |

This is a nice fix, but unless I'm missing something it's unrelated to this patch.
 
>  	      G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
>  
> @@ -79,7 +82,7 @@ static void set_params(struct hantro_ctx *ctx)
>  		reg |= G1_REG_DEC_CTRL4_CABAC_E;
>  	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
>  		reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
> -	if (sps->chroma_format_idc == 0)
> +	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
>  		reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
>  	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
>  		reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
> @@ -233,6 +236,7 @@ static void set_buffers(struct hantro_ctx *ctx)
>  	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>  	struct hantro_dev *vpu = ctx->dev;
>  	dma_addr_t src_dma, dst_dma;
> +	unsigned int offset = MV_OFFSET_420;
>  
>  	src_buf = hantro_get_src_buf(ctx);
>  	dst_buf = hantro_get_dst_buf(ctx);
> @@ -243,19 +247,20 @@ static void set_buffers(struct hantro_ctx *ctx)
>  
>  	/* Destination (decoded frame) buffer. */
>  	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> +		dst_dma += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
>  	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
>  
> -	/* Higher profiles require DMV buffer appended to reference frames. */
> -	if (ctrls->sps->profile_idc > 66) {
> -		size_t pic_size = ctx->h264_dec.pic_size;
> -		size_t mv_offset = round_up(pic_size, 8);
> -
> -		if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> -			mv_offset += 32 * H264_MB_WIDTH(ctx->dst_fmt.width);
> -
> -		vdpu_write_relaxed(vpu, dst_dma + mv_offset,
> -				   G1_REG_ADDR_DIR_MV);
> -	}
> +	/* Motion vector buffer is located after the decoded frame. */
> +	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);

I would try to rework the code to avoid calling
vb2_dma_contig_plane_dma_addr() again.

> +	if (ctrls->sps->profile_idc >= 100 && ctrls->sps->chroma_format_idc == 0)
> +		offset = MV_OFFSET_400;
> +	dst_dma += offset * H264_MB_WIDTH(ctx->dst_fmt.width) *
> +		   H264_MB_HEIGHT(ctx->dst_fmt.height);

Perhaps rename 'offset' to something different? Maybe bytes_per_mb
or similar.

> +	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> +		dst_dma += 32 * H264_MB_WIDTH(ctx->dst_fmt.width) *
> +			   H264_MB_HEIGHT(ctx->dst_fmt.height);

While here, could you replace this 32 magic number with some
meaningful macro?

> +	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DIR_MV);
>  
>  	/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
>  	vdpu_write_relaxed(vpu, ctx->h264_dec.priv.dma, G1_REG_ADDR_QTABLE);

Thanks a lot,
Ezequiel


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, back to index

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-01 12:42 [PATCH RFC 00/12] media: hantro: H264 fixes and improvements Jonas Karlman
2019-09-01 12:45 ` [PATCH 01/12] media: hantro: Fix H264 max frmsize supported on RK3288 Jonas Karlman
2019-09-04 13:07   ` Ezequiel Garcia
2019-09-09 19:25     ` Jonas Karlman
     [not found] ` <20190901124531.23645-1-jonas@kwiboo.se>
2019-09-01 12:45   ` [PATCH 02/12] media: hantro: Do not reorder H264 scaling list Jonas Karlman
2019-09-02 14:00     ` Philipp Zabel
2019-09-02 16:18       ` Jonas Karlman
2019-09-03  7:54         ` Jonas Karlman
2019-09-03 12:53           ` Philipp Zabel
2019-09-03  9:56         ` Philipp Zabel
2019-09-10 10:14         ` Ezequiel Garcia
2019-09-01 12:45   ` [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset Jonas Karlman
2019-09-03 10:58     ` Philipp Zabel
2019-09-03 20:13       ` Jonas Karlman
2019-09-10 10:18     ` Ezequiel Garcia
2019-09-10 11:34     ` Ezequiel Garcia
2019-09-01 12:45   ` [PATCH 05/12] media: hantro: Remove now unused H264 pic_size Jonas Karlman
2019-09-01 12:45   ` [PATCH 04/12] media: hantro: Reduce H264 extra space for motion vectors Jonas Karlman
2019-09-01 12:45   ` [PATCH 06/12] media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly Jonas Karlman
2019-09-01 12:45   ` [RFC 08/12] media: hantro: Fix H264 decoding of field encoded content Jonas Karlman
2019-09-03 13:21     ` Philipp Zabel
2019-09-03 14:02       ` Jonas Karlman
2019-09-03 15:01         ` Philipp Zabel
2019-09-03 19:47           ` Jonas Karlman
2019-09-01 12:45   ` [RFC 07/12] media: uapi: h264: Add DPB entry field reference flags Jonas Karlman
2019-09-01 12:45   ` [RFC 09/12] media: hantro: Refactor G1 H264 code Jonas Karlman
2019-09-01 12:45   ` [RFC 10/12] media: hantro: Add support for H264 decoding on RK3399 Jonas Karlman
2019-09-02 11:46     ` Hans Verkuil
2019-09-02 15:25       ` Jonas Karlman
2019-09-01 12:45   ` [RFC 11/12] media: hantro: Enable " Jonas Karlman
2019-09-01 12:45   ` [RFC 12/12] media: hantro: Enable H264 decoding on RK3328 Jonas Karlman
2019-09-02 13:02 ` [PATCH RFC 00/12] media: hantro: H264 fixes and improvements Ezequiel Garcia
2019-09-02 16:28   ` Jonas Karlman

Linux-Media Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-media/0 linux-media/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-media linux-media/ https://lore.kernel.org/linux-media \
		linux-media@vger.kernel.org linux-media@archiver.kernel.org
	public-inbox-index linux-media

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-media


AGPL code for this site: git clone https://public-inbox.org/ public-inbox