From: Maxime Ripard <maxime.ripard@bootlin.com>
To: "Jernej Škrabec" <jernej.skrabec@gmail.com>
Cc: linux-sunxi@googlegroups.com, hans.verkuil@cisco.com,
	acourbot@chromium.org, sakari.ailus@linux.intel.com,
	Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
	tfiga@chromium.org, posciak@chromium.org,
	Paul Kocialkowski <paul.kocialkowski@bootlin.com>,
	Chen-Yu Tsai <wens@csie.org>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-media@vger.kernel.org, nicolas.dufresne@collabora.com,
	jenskuske@gmail.com,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Subject: Re: [linux-sunxi] [PATCH v2 2/2] media: cedrus: Add H264 decoding support
Date: Tue, 27 Nov 2018 16:50:28 +0100
Message-ID: <20181127155028.5ukw3g6zjbnvarbp@flea>
In-Reply-To: <2826880.kP3DS59ZBy@jernej-laptop>

Hi Jernej,

Thanks for your review!

On Sat, Nov 24, 2018 at 09:43:43PM +0100, Jernej Škrabec wrote:
> > +enum cedrus_h264_sram_off {
> > +	CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE	= 0x000,
> > +	CEDRUS_SRAM_H264_FRAMEBUFFER_LIST	= 0x100,
> > +	CEDRUS_SRAM_H264_REF_LIST_0		= 0x190,
> > +	CEDRUS_SRAM_H264_REF_LIST_1		= 0x199,
> > +	CEDRUS_SRAM_H264_SCALING_LIST_8x8	= 0x200,
> > +	CEDRUS_SRAM_H264_SCALING_LIST_4x4	= 0x218,
> 
> I triple checked above address and it should be 0x220. For easier 
> implementation later, you might want to add second scaling list address for 
> 8x8 at 0x210. Then you can do something like:
> 
> cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
> 			       scaling->scaling_list_8x8[0],
> 			       sizeof(scaling->scaling_list_8x8[0]));
> cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
> 			       scaling->scaling_list_8x8[3],
> 			       sizeof(scaling->scaling_list_8x8[0]));
> cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
> 			       scaling->scaling_list_4x4,
> 			       sizeof(scaling->scaling_list_4x4));
> 
> I know that it's not implemented here, just FYI.

Ack. I guess I can just leave it out entirely for now, since it's not
implemented.

> > +static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > +				struct cedrus_buffer *buf,
> > +				unsigned int top_field_order_cnt,
> > +				unsigned int bottom_field_order_cnt,
> > +				struct cedrus_h264_sram_ref_pic *pic)
> > +{
> > +	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> > +	unsigned int position = buf->codec.h264.position;
> > +
> > +	pic->top_field_order_cnt = top_field_order_cnt;
> > +	pic->bottom_field_order_cnt = bottom_field_order_cnt;
> > +	pic->frame_info = buf->codec.h264.pic_type << 8;
> > +
> > +	pic->luma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0) - PHYS_OFFSET;
> > +	pic->chroma_ptr = cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1) - PHYS_OFFSET;
> 
> I think subtracting PHYS_OFFSET breaks driver on H3 boards with 2 GiB of RAM. 
> Isn't that unnecessary anyway due to
> 
> dev->dev->dma_pfn_offset = PHYS_PFN_OFFSET;
> 
> in cedrus_hw.c?
> 
> This comment is meant for all PHYS_OFFSET subtracting in this patch.

PHYS_OFFSET was needed on some older SoCs, and the dma_pfn_offset
trick wasn't working, so I hacked it in and forgot about it. I'll try
to figure it out for the next version.
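
To make the double-offset concern concrete, here is a minimal standalone
sketch (not driver code). It assumes the DMA layer translates addresses
as phys - (dma_pfn_offset << PAGE_SHIFT) and that DRAM starts at
0x40000000; phys_to_dma() is a hypothetical helper and the constants
are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: assumed page size and assumed DRAM base. */
#define PAGE_SHIFT	12
#define PHYS_OFFSET	0x40000000ull
#define PHYS_PFN_OFFSET	(PHYS_OFFSET >> PAGE_SHIFT)

/*
 * Hypothetical model of the translation the DMA layer performs once
 * dma_pfn_offset is set: the offset is removed from the CPU physical
 * address to obtain the address the device sees.
 */
static uint64_t phys_to_dma(uint64_t phys, uint64_t dma_pfn_offset)
{
	assert(phys >= (dma_pfn_offset << PAGE_SHIFT));
	return phys - (dma_pfn_offset << PAGE_SHIFT);
}

/*
 * What the device would be handed if PHYS_OFFSET were subtracted a
 * second time and the result truncated to a 32-bit register value.
 */
static uint32_t double_subtracted(uint64_t dma)
{
	return (uint32_t)(dma - PHYS_OFFSET);
}
```

Under these assumptions, a buffer at physical 0x80000000 already maps to
device address 0x40000000, and subtracting PHYS_OFFSET again would
silently alias it to 0.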

> > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > +				   struct cedrus_run *run,
> > +				   const u8 *ref_list, u8 num_ref,
> > +				   enum cedrus_h264_sram_off sram)
> > +{
> > +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> > +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > +	struct cedrus_dev *dev = ctx->dev;
> > +	u32 sram_array[CEDRUS_MAX_REF_IDX / sizeof(u32)];
> > +	unsigned int size, i;
> > +
> > +	memset(sram_array, 0, sizeof(sram_array));
> > +
> > +	for (i = 0; i < num_ref; i += 4) {
> > +		unsigned int j;
> > +
> > +		for (j = 0; j < 4; j++) {
> 
> I don't think you have to complicate this with two loops here.
> cedrus_h264_write_sram() takes void* and it aligns to 4 anyway. So as long
> as the input buffer is a multiple of 4 (u8[CEDRUS_MAX_REF_IDX] qualifies for
> that), you can use a single for loop with "u8 sram_array[CEDRUS_MAX_REF_IDX]".
> This should make the code much more readable.

This wasn't really about alignment, but about computing the offsets
within each u32 and within the array more easily.

Flattening the two loops into one would make that computation less
easy on the eye, so I guess it's very subjective.
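
For readers following along, a minimal standalone sketch of the
two-loop packing being discussed: the outer loop walks the references
four at a time, the inner loop picks the byte lane inside one 32-bit
word. MAX_REF_IDX and pack_ref_list() are hypothetical stand-ins, and
indexing the word array with i / 4 is an assumption of this sketch
rather than something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CEDRUS_MAX_REF_IDX. */
#define MAX_REF_IDX	32

/*
 * Pack one byte per reference into 32-bit words, four references per
 * word. The "+ 1" mirrors the "position << (j * 8 + 1)" shift in the
 * quoted loop, leaving bit 0 of each byte lane free.
 */
static void pack_ref_list(const uint8_t *positions, unsigned int num_ref,
			  uint32_t words[MAX_REF_IDX / 4])
{
	unsigned int i, j;

	assert(num_ref <= MAX_REF_IDX);
	memset(words, 0, (MAX_REF_IDX / 4) * sizeof(uint32_t));

	for (i = 0; i < num_ref; i += 4) {
		for (j = 0; j < 4 && i + j < num_ref; j++) {
			/* Keep the value within its 7 available bits. */
			uint32_t position = positions[i + j] & 0x7f;

			/* i / 4 selects the word, j the byte lane in it. */
			words[i / 4] |= position << (j * 8 + 1);
		}
	}
}
```

With five references at positions 1 to 5, the first word holds four
lanes and the second word holds the remaining one.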

> > +			const struct v4l2_h264_dpb_entry *dpb;
> > +			const struct cedrus_buffer *cedrus_buf;
> > +			const struct vb2_v4l2_buffer *ref_buf;
> > +			unsigned int position;
> > +			int buf_idx;
> > +			u8 ref_idx = i + j;
> > +			u8 dpb_idx;
> > +
> > +			if (ref_idx >= num_ref)
> > +				break;
> > +
> > +			dpb_idx = ref_list[ref_idx];
> > +			dpb = &decode->dpb[dpb_idx];
> > +
> > +			if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > +				continue;
> > +
> > +			buf_idx = vb2_find_tag(cap_q, dpb->tag, 0);
> > +			if (buf_idx < 0)
> > +				continue;
> > +
> > +			ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > +			cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > +			position = cedrus_buf->codec.h264.position;
> > +
> > +			sram_array[i] |= position << (j * 8 + 1);
> > +			if (ref_buf->field == V4L2_FIELD_BOTTOM)
> 
> You never set the above flag on the buffer, so this will always be false.

As far as I know, the field is supposed to be set by userspace.

> > +	// sequence parameters
> > +	reg = BIT(19);
> 
> This one can be inferred from sps->chroma_format_idc.

I'll look into this.

> > +	reg |= (sps->pic_width_in_mbs_minus1 & 0xff) << 8;
> > +	reg |= sps->pic_height_in_map_units_minus1 & 0xff;
> > +	if (sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY)
> > +		reg |= BIT(18);
> > +	if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> > +		reg |= BIT(17);
> > +	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> > +		reg |= BIT(16);
> > +	cedrus_write(dev, VE_H264_FRAME_SIZE, reg);
> > +
> > +	// slice parameters
> > +	reg = 0;
> > +	/*
> > +	 * FIXME: This bit marks all the frames as references. This
> > +	 * should probably be set based on nal_ref_idc, but the libva
> > +	 * doesn't pass that information along, so this is not always
> > +	 * available. We should find something else, maybe change the
> > +	 * kernel UAPI somehow?
> > +	 */
> > +	reg |= BIT(12);
> 
> I really think you should use nal_ref_idc here as it is in specification.  You 
> can still fake the data from libva backend. I don't think that any driver 
> needs this for anything else than check if it is 0 or not.

Yeah, Tomasz suggested the same thing in his reply to the cover
letter. I'll change that in the next version.

> > +	reg |= (slice->slice_type & 0xf) << 8;
> > +	reg |= slice->cabac_init_idc & 0x3;
> > +	reg |= BIT(5);
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > +		reg |= BIT(4);
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> > +		reg |= BIT(3);
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
> > +		reg |= BIT(2);
> > +	cedrus_write(dev, VE_H264_SLICE_HDR, reg);
> > +
> > +	reg = 0;
> 
> You might want to set bit 12 here, which enables active reference picture 
> override. However, I'm not completely sure about that.

Did you find some videos that were broken because of this?

Thanks!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
