From: Jonas Karlman <jonas@kwiboo.se>
To: Philipp Zabel <p.zabel@pengutronix.de>,
	Ezequiel Garcia <ezequiel@collabora.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>,
	Hans Verkuil <hverkuil@xs4all.nl>,
	Boris Brezillon <boris.brezillon@collabora.com>,
	Paul Kocialkowski <paul.kocialkowski@bootlin.com>,
	"linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
	"linux-rockchip@lists.infradead.org" 
	<linux-rockchip@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 02/12] media: hantro: Do not reorder H264 scaling list
Date: Mon, 2 Sep 2019 16:18:21 +0000
Message-ID: <HE1PR06MB4011A8F99D58E5ACFAE3CECAACBE0@HE1PR06MB4011.eurprd06.prod.outlook.com>
In-Reply-To: <1567432843.3666.6.camel@pengutronix.de>

On 2019-09-02 16:00, Philipp Zabel wrote:
> Hi Jonas,
>
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>> Scaling list supplied from userspace using ffmpeg and libva-v4l2-request
>> is already in matrix order and can be used without applying the inverse
>> scanning process.
> "in matrix order" is equivalent to "in raster scan order"?

The values supplied by ffmpeg and libva-v4l2-request are in the order that
results after the inverse scanning process has been applied (the scaling list
has been transformed into a scaling matrix). I am not sure what this is called;
"matrix order" seemed close enough.

Since there are two scan orders, zig-zag and field, and cedrus already expects
the values in "matrix" order, it seems more logical to let userspace handle the
inverse scanning process.
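
To make "matrix order" concrete, here is a rough userspace sketch (not code
from the patch) of the 4x4 inverse zig-zag scan. The names inverse_scan_4x4,
zigzag_4x4_raster and swap_within_word are made up for illustration. Composing
the zig-zag table with a byte swap inside each 4-byte group should reproduce
the zig_zag_4x4 table that the patch removes, which matches the removed comment
saying the table includes both transformations.

```c
#include <stdint.h>

/*
 * Zig-zag (frame) scan for a 4x4 block: entry k is the raster index
 * (4 * row + column) of the k-th value in scan order.
 */
static const uint8_t zigzag_4x4_raster[16] = {
	0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Inverse scan: turn a list in zig-zag order into a matrix in raster order. */
static void inverse_scan_4x4(const uint8_t list[16], uint8_t matrix[16])
{
	for (unsigned int k = 0; k < 16; k++)
		matrix[zigzag_4x4_raster[k]] = list[k];
}

/* Position of byte i after reversing the bytes inside each 4-byte group. */
static unsigned int swap_within_word(unsigned int i)
{
	return (i & ~3u) | (3u - (i & 3u));
}
```

With userspace doing the inverse scan, the driver only has to deal with the
byte order inside each 32-bit word.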

>
> Could you add this requirement to the
> V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX documentation?

Sure, I will update the documentation in v2.

>
>> The HW also only support 8x8 scaling list for the Y component, indices 0
>> and 3 in the scaling list supplied from userspace.
>>
>> Remove reordering and write the scaling matrix in an order expected by
>> the VPU, also only allocate memory for the two 8x8 lists used.
>>
>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>> ---
>>  drivers/staging/media/hantro/hantro_h264.c | 64 +++++++---------------
>>  1 file changed, 20 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
>> index 0d758e0c0f99..e2d01145ac4f 100644
>> --- a/drivers/staging/media/hantro/hantro_h264.c
>> +++ b/drivers/staging/media/hantro/hantro_h264.c
>> @@ -20,7 +20,7 @@
>>  /* Size with u32 units. */
>>  #define CABAC_INIT_BUFFER_SIZE		(460 * 2)
>>  #define POC_BUFFER_SIZE			34
>> -#define SCALING_LIST_SIZE		(6 * 16 + 6 * 64)
>> +#define SCALING_LIST_SIZE		(6 * 16 + 2 * 64)
> This changes the size of struct hantro_h264_dec_priv_tbl. Did this
> describe the auxiliary buffer format incorrectly before?

Based on RKMPP and the Hantro SDK, the HW expects the 8x8 inter/intra lists for
the Y component to be located at indices 0 and 1. The lists for Cr/Cb are only
used for 4:4:4, and the HW only supports 4:0:0/4:2:0 if I am not mistaken. So
the 4 unused extra lists at the end of the auxiliary buffer seemed like a waste;
RKMPP and the Hantro SDK also only seem to allocate space for 2 lists.
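
The size arithmetic can be spelled out as a small sketch. It assumes the
scaling list area is a plain byte array holding the six 4x4 lists followed by
the 8x8 lists; the enum names and list_8x8_offset are hypothetical and not
taken from the driver.

```c
#include <stddef.h>

enum {
	NUM_LIST_4X4   = 6,
	LIST_4X4_BYTES = 16,
	LIST_8X8_BYTES = 64,
	/* Before the patch: room for six 8x8 lists. */
	OLD_SCALING_LIST_SIZE = 6 * 16 + 6 * 64,
	/* After the patch: only the two 8x8 Y lists. */
	NEW_SCALING_LIST_SIZE = 6 * 16 + 2 * 64
};

/* Byte offset of 8x8 slot `slot` when the 8x8 lists follow the 4x4 lists. */
static size_t list_8x8_offset(unsigned int slot)
{
	return (size_t)NUM_LIST_4X4 * LIST_4X4_BYTES +
	       (size_t)slot * LIST_8X8_BYTES;
}
```

Under these assumptions the scaling list area shrinks from 480 to 224 bytes,
and the two lists the HW reads sit at byte offsets 96 and 160.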

>
>>  #define POC_CMP(p0, p1) ((p0) < (p1) ? -1 : 1)
>>  
>> @@ -194,57 +194,33 @@ static const u32 h264_cabac_table[] = {
>>  	0x1f0c2517, 0x1f261440
>>  };
>>  
>> -/*
>> - * NOTE: The scaling lists are in zig-zag order, apply inverse scanning process
>> - * to get the values in matrix order. In addition, the hardware requires bytes
>> - * swapped within each subsequent 4 bytes. Both arrays below include both
>> - * transformations.
>> - */
>> -static const u32 zig_zag_4x4[] = {
>> -	3, 2, 7, 11, 6, 1, 0, 5, 10, 15, 14, 9, 4, 8, 13, 12
>> -};
>> -
>> -static const u32 zig_zag_8x8[] = {
>> -	3, 2, 11, 19, 10, 1, 0, 9, 18, 27, 35, 26, 17, 8, 7, 6,
>> -	15, 16, 25, 34, 43, 51, 42, 33, 24, 23, 14, 5, 4, 13, 22, 31,
>> -	32, 41, 50, 59, 58, 49, 40, 39, 30, 21, 12, 20, 29, 38, 47, 48,
>> -	57, 56, 55, 46, 37, 28, 36, 45, 54, 63, 62, 53, 44, 52, 61, 60
>> -};
>> -
>>  static void
>>  reorder_scaling_list(struct hantro_ctx *ctx)
>>  {
>>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>>  	const struct v4l2_ctrl_h264_scaling_matrix *scaling = ctrls->scaling;
>> -	const size_t num_list_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4);
>> -	const size_t list_len_4x4 = ARRAY_SIZE(scaling->scaling_list_4x4[0]);
>> -	const size_t num_list_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8);
>> -	const size_t list_len_8x8 = ARRAY_SIZE(scaling->scaling_list_8x8[0]);
>>  	struct hantro_h264_dec_priv_tbl *tbl = ctx->h264_dec.priv.cpu;
>> -	u8 *dst = tbl->scaling_list;
>> -	const u8 *src;
>> -	int i, j;
>> -
>> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_4x4) != list_len_4x4);
>> -	BUILD_BUG_ON(ARRAY_SIZE(zig_zag_8x8) != list_len_8x8);
>> -	BUILD_BUG_ON(ARRAY_SIZE(tbl->scaling_list) !=
>> -		     num_list_4x4 * list_len_4x4 +
>> -		     num_list_8x8 * list_len_8x8);
>> -
>> -	src = &scaling->scaling_list_4x4[0][0];
>> -	for (i = 0; i < num_list_4x4; ++i) {
>> -		for (j = 0; j < list_len_4x4; ++j)
>> -			dst[zig_zag_4x4[j]] = src[j];
>> -		src += list_len_4x4;
>> -		dst += list_len_4x4;
>> +	u32 *dst = (u32 *)tbl->scaling_list;
>> +	u32 i, j, tmp;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_4x4); i++) {
>> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_4x4[0]) / 4; j++) {
>> +			tmp = (scaling->scaling_list_4x4[i][4 * j + 0] << 24) |
>> +			      (scaling->scaling_list_4x4[i][4 * j + 1] << 16) |
>> +			      (scaling->scaling_list_4x4[i][4 * j + 2] << 8) |
>> +			      (scaling->scaling_list_4x4[i][4 * j + 3]);
>> +			*dst++ = tmp;
>> +		}
> This looks like it could use swab32().

Thanks for the tip, I will look into it and change it in v2.
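
For reference, a userspace sketch of why a byte swap fits here. The shift-or
expression in the patch packs four consecutive bytes most significant first,
which gives the same value as byte-swapping a little-endian load of those
bytes. swab32_sketch is a stand-in written for this example, not the kernel
helper itself, and pack_msb_first and load_le32 are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for a 32-bit byte swap: reverses the four bytes of x. */
static uint32_t swab32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Same packing as the patch: b[0] ends up in the top byte. The casts make
 * the shifts happen in unsigned 32-bit arithmetic.
 */
static uint32_t pack_msb_first(const uint8_t *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Little-endian load written out by hand so the sketch is host independent. */
static uint32_t load_le32(const uint8_t *b)
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

On a little-endian CPU a native 32-bit load already behaves like load_le32, so
swapping that value would give the same word as the shift-or form.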

>
>>  	}
>>  
>> -	src = &scaling->scaling_list_8x8[0][0];
>> -	for (i = 0; i < num_list_8x8; ++i) {
>> -		for (j = 0; j < list_len_8x8; ++j)
>> -			dst[zig_zag_8x8[j]] = src[j];
>> -		src += list_len_8x8;
>> -		dst += list_len_8x8;
>> +	for (i = 0; i < ARRAY_SIZE(scaling->scaling_list_8x8); i += 3) {
>> +		for (j = 0; j < ARRAY_SIZE(scaling->scaling_list_8x8[0]) / 4; j++) {
>> +			tmp = (scaling->scaling_list_8x8[i][4 * j + 0] << 24) |
>> +			      (scaling->scaling_list_8x8[i][4 * j + 1] << 16) |
>> +			      (scaling->scaling_list_8x8[i][4 * j + 2] << 8) |
>> +			      (scaling->scaling_list_8x8[i][4 * j + 3]);
>> +			*dst++ = tmp;
>> +		}
> After this change, the second 8x8 scaling list has moved to a different
> offset. Is this where the hardware has always been looking for it, or is
> there a change missing in another place?

As mentioned above, the HW only looks at indices 0 and 1, and ffmpeg stores the
inter/intra Y lists at indices 0 and 3 as seen at [1]. In a similar way, cedrus
only uses indices 0 and 3 at [2].
FFmpeg copies the entire scaling_matrix8 to scaling_list_8x8 with memcpy for
v4l2-request-api, and copies only scaling_matrix8[0] and scaling_matrix8[3] for
vaapi.
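
A simplified userspace model of the resulting layout, assuming the control
carries six 4x4 and six 8x8 lists as in the quoted code. The struct
scaling_matrix_sketch and the functions pack_word and fill_aux are hypothetical
names for this example; only the loop structure follows the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the userspace control payload. */
struct scaling_matrix_sketch {
	uint8_t list_4x4[6][16];
	uint8_t list_8x8[6][64];
};

/* Number of 32-bit words in the auxiliary scaling list area after the patch. */
#define AUX_WORDS ((6 * 16 + 2 * 64) / 4)

/* Pack four consecutive bytes, first byte in the most significant position. */
static uint32_t pack_word(const uint8_t *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Write all six 4x4 lists, then only 8x8 lists 0 and 3. Returns words written. */
static size_t fill_aux(const struct scaling_matrix_sketch *m, uint32_t *dst)
{
	size_t n = 0;

	for (size_t i = 0; i < 6; i++)
		for (size_t j = 0; j < 16; j += 4)
			dst[n++] = pack_word(&m->list_4x4[i][j]);

	/* Step of 3 picks userspace indices 0 and 3, landing in HW slots 0 and 1. */
	for (size_t i = 0; i < 6; i += 3)
		for (size_t j = 0; j < 64; j += 4)
			dst[n++] = pack_word(&m->list_8x8[i][j]);

	return n;
}
```

In this model the list from userspace index 3 starts at word 40 of the output,
directly after the list from index 0.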

You can see the effect of this patch using the h264_tivo_sample.ts sample from
the cover letter; patches 3-8 must be applied. With this patch applied the green
football field stays green; without the patch the field shifts in color.

[1] https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/h264_ps.c#L299-L308
[2] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/sunxi/cedrus/cedrus_h264.c#n231

Regards,
Jonas

>
> regards
> Philipp


