Linux-Media Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/7] media: cedrus: Improvements/cleanup
@ 2019-05-30 21:15 Jernej Skrabec
  2019-05-30 21:15 ` [PATCH 1/7] media: cedrus: Disable engine after each slice decoding Jernej Skrabec
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

Here is first batch of random Cedrus improvements/cleanups. Only patch 2
has a change which raises a question about H264 controls.

Changes were tested on H3 SoC using modified ffmpeg and Kodi.

Please take a look.

Best regards,
Jernej

Jernej Skrabec (7):
  media: cedrus: Disable engine after each slice decoding
  media: cedrus: Fix H264 default reference index count
  media: cedrus: Fix decoding for some H264 videos
  media: cedrus: Remove dst_bufs from context
  media: cedrus: Don't set chroma size for scale & rotation
  media: cedrus: Add infra for extra buffers connected to capture
    buffers
  media: cedrus: Improve H264 memory efficiency

 drivers/staging/media/sunxi/cedrus/cedrus.h   |  12 +-
 .../staging/media/sunxi/cedrus/cedrus_h264.c  | 115 ++++++++----------
 .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +-
 .../staging/media/sunxi/cedrus/cedrus_video.c |  25 ++--
 4 files changed, 68 insertions(+), 88 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/7] media: cedrus: Disable engine after each slice decoding
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-03 11:38   ` Maxime Ripard
  2019-05-30 21:15 ` [PATCH 2/7] media: cedrus: Fix H264 default reference index count Jernej Skrabec
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

libvdpau-sunxi always disables engine after each decoded slice.
Do same in Cedrus driver.

Presumably this also lowers power consumption which is always nice.

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
 drivers/staging/media/sunxi/cedrus/cedrus_hw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
index c34aec7c6e40..9c5819def186 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
@@ -123,6 +123,7 @@ static irqreturn_t cedrus_irq(int irq, void *data)
 
 	dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
 	dev->dec_ops[ctx->current_codec]->irq_clear(ctx);
+	cedrus_engine_disable(dev);
 
 	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
 	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/7] media: cedrus: Fix H264 default reference index count
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
  2019-05-30 21:15 ` [PATCH 1/7] media: cedrus: Disable engine after each slice decoding Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-03 11:46   ` Maxime Ripard
  2019-05-30 21:15 ` [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos Jernej Skrabec
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel, nicolas, boris.brezillon, jonas

Reference index count in VE_H264_PPS reg should come from PPS control.
However, this is not really important because reference index count is
in our case always overridden by that from slice header.

Cc: nicolas@ndufresne.ca
Cc: boris.brezillon@collabora.com
Cc: jonas@kwiboo.se

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
We have to decide if we drop pps->num_ref_idx_l0_default_active_minus1
and pps->num_ref_idx_l1_default_active_minus1 fields or add
num_ref_idx_l0_active_override_flag and num_ref_idx_l0_active_override_flag
to slice control.

Current control doesn't have those two flags, so in Cedrus override flag is
always set and we rely on userspace to set slice->num_ref_idx_l0_active_minus1
and slice->num_ref_idx_l1_active_minus1 to correct values. This means that
values stored in PPS are not needed and always ignored by VPU.

If I understand correctly, algorithm is very simple:

ref_count = PPS->ref_count
if (override_flag)
	ref_count = slice->ref_count

It seems that VAAPI provides only final value. In my opinion we should do the
same - get rid of PPS default ref index count fields.

 drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index a30bb283f69f..cc8d17f211a1 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -340,12 +340,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 
 	// picture parameters
 	reg = 0;
-	/*
-	 * FIXME: the kernel headers are allowing the default value to
-	 * be passed, but the libva doesn't give us that.
-	 */
-	reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 10;
-	reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 5;
+	reg |= (pps->num_ref_idx_l0_default_active_minus1 & 0x1f) << 10;
+	reg |= (pps->num_ref_idx_l1_default_active_minus1 & 0x1f) << 5;
 	reg |= (pps->weighted_bipred_idc & 0x3) << 2;
 	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
 		reg |= VE_H264_PPS_ENTROPY_CODING_MODE;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
  2019-05-30 21:15 ` [PATCH 1/7] media: cedrus: Disable engine after each slice decoding Jernej Skrabec
  2019-05-30 21:15 ` [PATCH 2/7] media: cedrus: Fix H264 default reference index count Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-03 11:55   ` Maxime Ripard
  2019-05-30 21:15 ` [PATCH 4/7] media: cedrus: Remove dst_bufs from context Jernej Skrabec
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

It seems that for some H264 videos at least one bitstream parsing
trigger must be called in order to be decoded correctly. There is no
explanation why this helps, but it was observed that two sample videos
with this fix are now decoded correctly and there is no regression with
others.

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
I have two samples which are fixed by this:
http://jernej.libreelec.tv/videos/h264/test.mkv
http://jernej.libreelec.tv/videos/h264/Dredd%20%E2%80%93%20DTS%20Sound%20Check%20DTS-HD%20MA%207.1.m2ts

Although second one also needs support for multi-slice frames, which is not yet implemented here.

 .../staging/media/sunxi/cedrus/cedrus_h264.c  | 22 ++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index cc8d17f211a1..d0ee3f90ff46 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2018 Bootlin
  */
 
+#include <linux/delay.h>
 #include <linux/types.h>
 
 #include <media/videobuf2-dma-contig.h>
@@ -289,6 +290,20 @@ static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx,
 	}
 }
 
+static void cedrus_skip_bits(struct cedrus_dev *dev, int num)
+{
+	for (; num > 32; num -= 32) {
+		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (32 << 8));
+		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
+			udelay(1);
+	}
+	if (num > 0) {
+		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (num << 8));
+		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
+			udelay(1);
+	}
+}
+
 static void cedrus_set_params(struct cedrus_ctx *ctx,
 			      struct cedrus_run *run)
 {
@@ -299,12 +314,11 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 	struct vb2_buffer *src_buf = &run->src->vb2_buf;
 	struct cedrus_dev *dev = ctx->dev;
 	dma_addr_t src_buf_addr;
-	u32 offset = slice->header_bit_size;
-	u32 len = (slice->size * 8) - offset;
+	u32 len = slice->size * 8;
 	u32 reg;
 
 	cedrus_write(dev, VE_H264_VLD_LEN, len);
-	cedrus_write(dev, VE_H264_VLD_OFFSET, offset);
+	cedrus_write(dev, VE_H264_VLD_OFFSET, 0);
 
 	src_buf_addr = vb2_dma_contig_plane_dma_addr(src_buf, 0);
 	cedrus_write(dev, VE_H264_VLD_END,
@@ -323,6 +337,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 	cedrus_write(dev, VE_H264_TRIGGER_TYPE,
 		     VE_H264_TRIGGER_TYPE_INIT_SWDEC);
 
+	cedrus_skip_bits(dev, slice->header_bit_size);
+
 	if (((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
 	     (slice->slice_type == V4L2_H264_SLICE_TYPE_P ||
 	      slice->slice_type == V4L2_H264_SLICE_TYPE_SP)) ||
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/7] media: cedrus: Remove dst_bufs from context
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
                   ` (2 preceding siblings ...)
  2019-05-30 21:15 ` [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-03 12:13   ` Maxime Ripard
                     ` (2 more replies)
  2019-05-30 21:15 ` [PATCH 5/7] media: cedrus: Don't set chroma size for scale & rotation Jernej Skrabec
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

This array is just duplicated capture buffer queue. Remove it and adjust
code to look into capture buffer queue instead.

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
 drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 +---
 .../staging/media/sunxi/cedrus/cedrus_h264.c  |  4 ++--
 .../staging/media/sunxi/cedrus/cedrus_video.c | 22 -------------------
 3 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
index 3f476d0fd981..d8e6777e5e27 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
@@ -100,8 +100,6 @@ struct cedrus_ctx {
 	struct v4l2_ctrl_handler	hdl;
 	struct v4l2_ctrl		**ctrls;
 
-	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
-
 	union {
 		struct {
 			void		*mv_col_buf;
@@ -187,7 +185,7 @@ static inline dma_addr_t cedrus_dst_buf_addr(struct cedrus_ctx *ctx,
 	if (index < 0)
 		return 0;
 
-	buf = ctx->dst_bufs[index];
+	buf = ctx->fh.m2m_ctx->cap_q_ctx.q.bufs[index];
 	return buf ? cedrus_buf_addr(buf, &ctx->dst_fmt, plane) : 0;
 }
 
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index d0ee3f90ff46..b2290f98d81a 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -119,7 +119,7 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
 		if (buf_idx < 0)
 			continue;
 
-		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
+		cedrus_buf = vb2_to_cedrus_buffer(cap_q->bufs[buf_idx]);
 		position = cedrus_buf->codec.h264.position;
 		used_dpbs |= BIT(position);
 
@@ -194,7 +194,7 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
 		if (buf_idx < 0)
 			continue;
 
-		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
+		ref_buf = to_vb2_v4l2_buffer(cap_q->bufs[buf_idx]);
 		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
 		position = cedrus_buf->codec.h264.position;
 
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
index e2b530b1a956..681dfe3367a6 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
@@ -411,26 +411,6 @@ static void cedrus_queue_cleanup(struct vb2_queue *vq, u32 state)
 	}
 }
 
-static int cedrus_buf_init(struct vb2_buffer *vb)
-{
-	struct vb2_queue *vq = vb->vb2_queue;
-	struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
-
-	if (!V4L2_TYPE_IS_OUTPUT(vq->type))
-		ctx->dst_bufs[vb->index] = vb;
-
-	return 0;
-}
-
-static void cedrus_buf_cleanup(struct vb2_buffer *vb)
-{
-	struct vb2_queue *vq = vb->vb2_queue;
-	struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
-
-	if (!V4L2_TYPE_IS_OUTPUT(vq->type))
-		ctx->dst_bufs[vb->index] = NULL;
-}
-
 static int cedrus_buf_out_validate(struct vb2_buffer *vb)
 {
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
@@ -517,8 +497,6 @@ static void cedrus_buf_request_complete(struct vb2_buffer *vb)
 static struct vb2_ops cedrus_qops = {
 	.queue_setup		= cedrus_queue_setup,
 	.buf_prepare		= cedrus_buf_prepare,
-	.buf_init		= cedrus_buf_init,
-	.buf_cleanup		= cedrus_buf_cleanup,
 	.buf_queue		= cedrus_buf_queue,
 	.buf_out_validate	= cedrus_buf_out_validate,
 	.buf_request_complete	= cedrus_buf_request_complete,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 5/7] media: cedrus: Don't set chroma size for scale & rotation
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
                   ` (3 preceding siblings ...)
  2019-05-30 21:15 ` [PATCH 4/7] media: cedrus: Remove dst_bufs from context Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-05 21:08   ` Paul Kocialkowski
  2019-05-30 21:15 ` [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers Jernej Skrabec
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

Scale and rotation are currently not implemented, so it makes no sense to
set chroma size for it.

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
 drivers/staging/media/sunxi/cedrus/cedrus_hw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
index 9c5819def186..9de20ae47916 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
@@ -79,9 +79,6 @@ void cedrus_dst_format_set(struct cedrus_dev *dev,
 		reg = VE_PRIMARY_OUT_FMT_NV12;
 		cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
 
-		reg = VE_CHROMA_BUF_LEN_SDRT(chroma_size / 2);
-		cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
-
 		reg = chroma_size / 2;
 		cedrus_write(dev, VE_PRIMARY_CHROMA_BUF_LEN, reg);
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
                   ` (4 preceding siblings ...)
  2019-05-30 21:15 ` [PATCH 5/7] media: cedrus: Don't set chroma size for scale & rotation Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-03 12:18   ` Maxime Ripard
  2019-05-30 21:15 ` [PATCH 7/7] media: cedrus: Improve H264 memory efficiency Jernej Skrabec
  2019-08-12 12:12 ` [PATCH 0/7] media: cedrus: Improvements/cleanup Hans Verkuil
  7 siblings, 1 reply; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

H264 and HEVC engines need additional buffers for each capture buffer.
H264 engine has this currently solved by allocating fixed size pool,
which is not ideal. Most of the time pool size is much bigger than it
needs to be.

Ideally, extra buffer should be allocated at buffer initialization, but
that's not efficient. It's size in H264 depends on flags set in SPS, but
that information is not available in buffer init callback.

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
 drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 ++++
 .../staging/media/sunxi/cedrus/cedrus_video.c | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
index d8e6777e5e27..16c1bdfd243a 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
@@ -81,6 +81,10 @@ struct cedrus_run {
 struct cedrus_buffer {
 	struct v4l2_m2m_buffer          m2m_buf;
 
+	void		*extra_buf;
+	dma_addr_t	extra_buf_dma;
+	ssize_t		extra_buf_size;
+
 	union {
 		struct {
 			unsigned int			position;
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
index 681dfe3367a6..d756e0e69634 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
@@ -411,6 +411,24 @@ static void cedrus_queue_cleanup(struct vb2_queue *vq, u32 state)
 	}
 }
 
+static void cedrus_buf_cleanup(struct vb2_buffer *vb)
+{
+	struct vb2_queue *vq = vb->vb2_queue;
+
+	if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
+		struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
+		struct cedrus_buffer *cedrus_buf;
+
+		cedrus_buf = vb2_to_cedrus_buffer(vq->bufs[vb->index]);
+
+		if (cedrus_buf->extra_buf_size)
+			dma_free_coherent(ctx->dev->dev,
+					  cedrus_buf->extra_buf_size,
+					  cedrus_buf->extra_buf,
+					  cedrus_buf->extra_buf_dma);
+	}
+}
+
 static int cedrus_buf_out_validate(struct vb2_buffer *vb)
 {
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
@@ -497,6 +515,7 @@ static void cedrus_buf_request_complete(struct vb2_buffer *vb)
 static struct vb2_ops cedrus_qops = {
 	.queue_setup		= cedrus_queue_setup,
 	.buf_prepare		= cedrus_buf_prepare,
+	.buf_cleanup		= cedrus_buf_cleanup,
 	.buf_queue		= cedrus_buf_queue,
 	.buf_out_validate	= cedrus_buf_out_validate,
 	.buf_request_complete	= cedrus_buf_request_complete,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 7/7] media: cedrus: Improve H264 memory efficiency
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
                   ` (5 preceding siblings ...)
  2019-05-30 21:15 ` [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers Jernej Skrabec
@ 2019-05-30 21:15 ` Jernej Skrabec
  2019-06-03 12:23   ` Maxime Ripard
  2019-08-12 12:12 ` [PATCH 0/7] media: cedrus: Improvements/cleanup Hans Verkuil
  7 siblings, 1 reply; 30+ messages in thread
From: Jernej Skrabec @ 2019-05-30 21:15 UTC (permalink / raw)
  To: paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

H264 decoder driver preallocated pretty big worst case mv col buffer
pool. It turns out that pool is most of the time much bigger than it
needs to be.

Solution implemented here is to allocate memory only if capture buffer
is actually used and only as much as it is really necessary.

This is also preparation for 4K video decoding support, which will be
implemented later.

Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
---
 drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 -
 .../staging/media/sunxi/cedrus/cedrus_h264.c  | 81 +++++++------------
 2 files changed, 28 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
index 16c1bdfd243a..fcbbbef65494 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
@@ -106,10 +106,6 @@ struct cedrus_ctx {
 
 	union {
 		struct {
-			void		*mv_col_buf;
-			dma_addr_t	mv_col_buf_dma;
-			ssize_t		mv_col_buf_field_size;
-			ssize_t		mv_col_buf_size;
 			void		*pic_info_buf;
 			dma_addr_t	pic_info_buf_dma;
 			void		*neighbor_info_buf;
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index b2290f98d81a..758fd0049e8f 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -54,17 +54,14 @@ static void cedrus_h264_write_sram(struct cedrus_dev *dev,
 		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
 }
 
-static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
-					      unsigned int position,
+static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_buffer *buf,
 					      unsigned int field)
 {
-	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma;
-
-	/* Adjust for the position */
-	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
+	dma_addr_t addr = buf->extra_buf_dma;
 
 	/* Adjust for the field */
-	addr += field * ctx->codec.h264.mv_col_buf_field_size;
+	if (field)
+		addr += buf->extra_buf_size / 2;
 
 	return addr;
 }
@@ -76,7 +73,6 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
 				struct cedrus_h264_sram_ref_pic *pic)
 {
 	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
-	unsigned int position = buf->codec.h264.position;
 
 	pic->top_field_order_cnt = cpu_to_le32(top_field_order_cnt);
 	pic->bottom_field_order_cnt = cpu_to_le32(bottom_field_order_cnt);
@@ -84,10 +80,8 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
 
 	pic->luma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0));
 	pic->chroma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1));
-	pic->mv_col_top_ptr =
-		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 0));
-	pic->mv_col_bot_ptr =
-		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 1));
+	pic->mv_col_top_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 0));
+	pic->mv_col_bot_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 1));
 }
 
 static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
@@ -142,6 +136,28 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
 	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
 	output_buf->codec.h264.position = position;
 
+	if (!output_buf->extra_buf_size) {
+		const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
+		unsigned int field_size;
+
+		field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
+			DIV_ROUND_UP(ctx->src_fmt.height, 16) * 16;
+		if (!(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE))
+			field_size = field_size * 2;
+		if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
+			field_size = field_size * 2;
+
+		output_buf->extra_buf_size = field_size * 2;
+		output_buf->extra_buf =
+			dma_alloc_coherent(dev->dev,
+					   output_buf->extra_buf_size,
+					   &output_buf->extra_buf_dma,
+					   GFP_KERNEL);
+
+		if (!output_buf->extra_buf)
+			output_buf->extra_buf_size = 0;
+	}
+
 	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
 		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
 	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
@@ -476,8 +492,6 @@ static void cedrus_h264_setup(struct cedrus_ctx *ctx,
 static int cedrus_h264_start(struct cedrus_ctx *ctx)
 {
 	struct cedrus_dev *dev = ctx->dev;
-	unsigned int field_size;
-	unsigned int mv_col_size;
 	int ret;
 
 	/*
@@ -509,44 +523,8 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
 		goto err_pic_buf;
 	}
 
-	field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
-		DIV_ROUND_UP(ctx->src_fmt.height, 16) * 16;
-
-	/*
-	 * FIXME: This is actually conditional to
-	 * V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE not being set, we
-	 * might have to rework this if memory efficiency ever is
-	 * something we need to work on.
-	 */
-	field_size = field_size * 2;
-
-	/*
-	 * FIXME: This is actually conditional to
-	 * V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY not being set, we might
-	 * have to rework this if memory efficiency ever is something
-	 * we need to work on.
-	 */
-	field_size = field_size * 2;
-	ctx->codec.h264.mv_col_buf_field_size = field_size;
-
-	mv_col_size = field_size * 2 * CEDRUS_H264_FRAME_NUM;
-	ctx->codec.h264.mv_col_buf_size = mv_col_size;
-	ctx->codec.h264.mv_col_buf = dma_alloc_coherent(dev->dev,
-							ctx->codec.h264.mv_col_buf_size,
-							&ctx->codec.h264.mv_col_buf_dma,
-							GFP_KERNEL);
-	if (!ctx->codec.h264.mv_col_buf) {
-		ret = -ENOMEM;
-		goto err_neighbor_buf;
-	}
-
 	return 0;
 
-err_neighbor_buf:
-	dma_free_coherent(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
-			  ctx->codec.h264.neighbor_info_buf,
-			  ctx->codec.h264.neighbor_info_buf_dma);
-
 err_pic_buf:
 	dma_free_coherent(dev->dev, CEDRUS_PIC_INFO_BUF_SIZE,
 			  ctx->codec.h264.pic_info_buf,
@@ -558,9 +536,6 @@ static void cedrus_h264_stop(struct cedrus_ctx *ctx)
 {
 	struct cedrus_dev *dev = ctx->dev;
 
-	dma_free_coherent(dev->dev, ctx->codec.h264.mv_col_buf_size,
-			  ctx->codec.h264.mv_col_buf,
-			  ctx->codec.h264.mv_col_buf_dma);
 	dma_free_coherent(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
 			  ctx->codec.h264.neighbor_info_buf,
 			  ctx->codec.h264.neighbor_info_buf_dma);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] media: cedrus: Disable engine after each slice decoding
  2019-05-30 21:15 ` [PATCH 1/7] media: cedrus: Disable engine after each slice decoding Jernej Skrabec
@ 2019-06-03 11:38   ` Maxime Ripard
  2019-08-12 13:28     ` Ezequiel Garcia
  0 siblings, 1 reply; 30+ messages in thread
From: Maxime Ripard @ 2019-06-03 11:38 UTC (permalink / raw)
  To: Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

Hi,

On Thu, May 30, 2019 at 11:15:10PM +0200, Jernej Skrabec wrote:
> libvdpau-sunxi always disables engine after each decoded slice.
> Do same in Cedrus driver.
>
> Presumably this also lowers power consumption which is always nice.
>
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>

Is it fixing anything though?

I indeed saw that cedar did disable it everytime, but I couldn't find
a reason why.

Also, the power management improvement would need to be measured, it
can even create the opposite situation where the device will draw more
current from being woken up than if it had just remained disabled.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/7] media: cedrus: Fix H264 default reference index count
  2019-05-30 21:15 ` [PATCH 2/7] media: cedrus: Fix H264 default reference index count Jernej Skrabec
@ 2019-06-03 11:46   ` Maxime Ripard
  2019-06-03 15:34     ` Jernej Škrabec
  0 siblings, 1 reply; 30+ messages in thread
From: Maxime Ripard @ 2019-06-03 11:46 UTC (permalink / raw)
  To: Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel, nicolas, boris.brezillon, jonas

[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]

On Thu, May 30, 2019 at 11:15:11PM +0200, Jernej Skrabec wrote:
> Reference index count in VE_H264_PPS reg should come from PPS control.
> However, this is not really important because reference index count is
> in our case always overridden by that from slice header.
>
> Cc: nicolas@ndufresne.ca
> Cc: boris.brezillon@collabora.com
> Cc: jonas@kwiboo.se
>
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>

Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>

> ---
> We have to decide if we drop pps->num_ref_idx_l0_default_active_minus1
> and pps->num_ref_idx_l1_default_active_minus1 fields or add
> num_ref_idx_l0_active_override_flag and num_ref_idx_l0_active_override_flag
> to slice control.
>
> Current control doesn't have those two flags, so in Cedrus override flag is
> always set and we rely on userspace to set slice->num_ref_idx_l0_active_minus1
> and slice->num_ref_idx_l1_active_minus1 to correct values. This means that
> values stored in PPS are not needed and always ignored by VPU.
>
> If I understand correctly, algorithm is very simple:
>
> ref_count = PPS->ref_count
> if (override_flag)
> 	ref_count = slice->ref_count
>
> It seems that VAAPI provides only final value. In my opinion we should do the
> same - get rid of PPS default ref index count fields.

The rationale was to be as conservative as possible and just expose
everything that is in the bitstream in those controls to accomodate
for as many weird hardware as possible.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos
  2019-05-30 21:15 ` [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos Jernej Skrabec
@ 2019-06-03 11:55   ` Maxime Ripard
  2019-06-03 15:37     ` Jernej Škrabec
  0 siblings, 1 reply; 30+ messages in thread
From: Maxime Ripard @ 2019-06-03 11:55 UTC (permalink / raw)
  To: Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2310 bytes --]

Hi,

On Thu, May 30, 2019 at 11:15:12PM +0200, Jernej Skrabec wrote:
> It seems that for some H264 videos at least one bitstream parsing
> trigger must be called in order to be decoded correctly. There is no
> explanation why this helps, but it was observed that two sample videos
> with this fix are now decoded correctly and there is no regression with
> others.
>
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> ---
> I have two samples which are fixed by this:
> http://jernej.libreelec.tv/videos/h264/test.mkv
> http://jernej.libreelec.tv/videos/h264/Dredd%20%E2%80%93%20DTS%20Sound%20Check%20DTS-HD%20MA%207.1.m2ts
>
> Although second one also needs support for multi-slice frames, which is not yet implemented here.
>
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 22 ++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> index cc8d17f211a1..d0ee3f90ff46 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -6,6 +6,7 @@
>   * Copyright (c) 2018 Bootlin
>   */
>
> +#include <linux/delay.h>
>  #include <linux/types.h>
>
>  #include <media/videobuf2-dma-contig.h>
> @@ -289,6 +290,20 @@ static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx,
>  	}
>  }

We should have a comment here explaining why that is needed

> +static void cedrus_skip_bits(struct cedrus_dev *dev, int num)
> +{
> +	for (; num > 32; num -= 32) {
> +		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (32 << 8));

Using defines here would be great

> +		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
> +			udelay(1);
> +	}

A new line here would be great

> +	if (num > 0) {
> +		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (num << 8));
> +		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
> +			udelay(1);
> +	}

Can't we make that a bit simpler by not duplicating the loop?

Something like:

int current = 0;

while (current < num) {
    int tmp = min(num - current, 32);

    cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (current << 8))
    while (...)
       ...

    current += tmp;
}

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/7] media: cedrus: Remove dst_bufs from context
  2019-05-30 21:15 ` [PATCH 4/7] media: cedrus: Remove dst_bufs from context Jernej Skrabec
@ 2019-06-03 12:13   ` Maxime Ripard
  2019-06-05 21:07   ` Paul Kocialkowski
  2019-08-12 13:42   ` Ezequiel Garcia
  2 siblings, 0 replies; 30+ messages in thread
From: Maxime Ripard @ 2019-06-03 12:13 UTC (permalink / raw)
  To: Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 395 bytes --]

On Thu, May 30, 2019 at 11:15:13PM +0200, Jernej Skrabec wrote:
> This array is just duplicated capture buffer queue. Remove it and adjust
> code to look into capture buffer queue instead.
>
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>

Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers
  2019-05-30 21:15 ` [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers Jernej Skrabec
@ 2019-06-03 12:18   ` Maxime Ripard
  2019-06-03 15:48     ` Jernej Škrabec
  0 siblings, 1 reply; 30+ messages in thread
From: Maxime Ripard @ 2019-06-03 12:18 UTC (permalink / raw)
  To: Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2425 bytes --]

Hi,

On Thu, May 30, 2019 at 11:15:15PM +0200, Jernej Skrabec wrote:
> H264 and HEVC engines need additional buffers for each capture buffer.
> H264 engine has this currently solved by allocating fixed size pool,
> which is not ideal. Most of the time pool size is much bigger than it
> needs to be.
>
> Ideally, extra buffer should be allocated at buffer initialization, but
> that's not efficient. It's size in H264 depends on flags set in SPS, but
> that information is not available in buffer init callback.
>
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> ---
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 ++++
>  .../staging/media/sunxi/cedrus/cedrus_video.c | 19 +++++++++++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index d8e6777e5e27..16c1bdfd243a 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -81,6 +81,10 @@ struct cedrus_run {
>  struct cedrus_buffer {
>  	struct v4l2_m2m_buffer          m2m_buf;
>
> +	void		*extra_buf;
> +	dma_addr_t	extra_buf_dma;
> +	ssize_t		extra_buf_size;
> +
>  	union {
>  		struct {
>  			unsigned int			position;
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> index 681dfe3367a6..d756e0e69634 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> @@ -411,6 +411,24 @@ static void cedrus_queue_cleanup(struct vb2_queue *vq, u32 state)
>  	}
>  }
>
> +static void cedrus_buf_cleanup(struct vb2_buffer *vb)
> +{
> +	struct vb2_queue *vq = vb->vb2_queue;
> +
> +	if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
> +		struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> +		struct cedrus_buffer *cedrus_buf;
> +
> +		cedrus_buf = vb2_to_cedrus_buffer(vq->bufs[vb->index]);
> +
> +		if (cedrus_buf->extra_buf_size)
> +			dma_free_coherent(ctx->dev->dev,
> +					  cedrus_buf->extra_buf_size,
> +					  cedrus_buf->extra_buf,
> +					  cedrus_buf->extra_buf_dma);
> +	}
> +}
> +

I'm really not a fan of allocating something somewhere, and freeing it
somewhere else. Making sure you don't leak something is hard enough to
not have some non-trivial allocation scheme.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] media: cedrus: Improve H264 memory efficiency
  2019-05-30 21:15 ` [PATCH 7/7] media: cedrus: Improve H264 memory efficiency Jernej Skrabec
@ 2019-06-03 12:23   ` Maxime Ripard
  2019-06-03 16:37     ` Jernej Škrabec
  0 siblings, 1 reply; 30+ messages in thread
From: Maxime Ripard @ 2019-06-03 12:23 UTC (permalink / raw)
  To: Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4730 bytes --]

On Thu, May 30, 2019 at 11:15:16PM +0200, Jernej Skrabec wrote:
> H264 decoder driver preallocated pretty big worst case mv col buffer
> pool. It turns out that pool is most of the time much bigger than it
> needs to be.
>
> Solution implemented here is to allocate memory only if capture buffer
> is actually used and only as much as it is really necessary.
>
> This is also preparation for 4K video decoding support, which will be
> implemented later.

What is it doing exactly to prepare for 4k?

> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> ---
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 -
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 81 +++++++------------
>  2 files changed, 28 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 16c1bdfd243a..fcbbbef65494 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -106,10 +106,6 @@ struct cedrus_ctx {
>
>  	union {
>  		struct {
> -			void		*mv_col_buf;
> -			dma_addr_t	mv_col_buf_dma;
> -			ssize_t		mv_col_buf_field_size;
> -			ssize_t		mv_col_buf_size;
>  			void		*pic_info_buf;
>  			dma_addr_t	pic_info_buf_dma;
>  			void		*neighbor_info_buf;
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> index b2290f98d81a..758fd0049e8f 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -54,17 +54,14 @@ static void cedrus_h264_write_sram(struct cedrus_dev *dev,
>  		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
>  }
>
> -static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> -					      unsigned int position,
> +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_buffer *buf,
>  					      unsigned int field)
>  {
> -	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma;
> -
> -	/* Adjust for the position */
> -	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> +	dma_addr_t addr = buf->extra_buf_dma;
>
>  	/* Adjust for the field */
> -	addr += field * ctx->codec.h264.mv_col_buf_field_size;
> +	if (field)
> +		addr += buf->extra_buf_size / 2;
>
>  	return addr;
>  }
> @@ -76,7 +73,6 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
>  				struct cedrus_h264_sram_ref_pic *pic)
>  {
>  	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> -	unsigned int position = buf->codec.h264.position;
>
>  	pic->top_field_order_cnt = cpu_to_le32(top_field_order_cnt);
>  	pic->bottom_field_order_cnt = cpu_to_le32(bottom_field_order_cnt);
> @@ -84,10 +80,8 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
>
>  	pic->luma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0));
>  	pic->chroma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1));
> -	pic->mv_col_top_ptr =
> -		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 0));
> -	pic->mv_col_bot_ptr =
> -		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 1));
> +	pic->mv_col_top_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 0));
> +	pic->mv_col_bot_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 1));
>  }
>
>  static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> @@ -142,6 +136,28 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
>  	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
>  	output_buf->codec.h264.position = position;
>
> +	if (!output_buf->extra_buf_size) {
> +		const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> +		unsigned int field_size;
> +
> +		field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> +			DIV_ROUND_UP(ctx->src_fmt.height, 16) * 16;
> +		if (!(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE))
> +			field_size = field_size * 2;
> +		if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
> +			field_size = field_size * 2;
> +
> +		output_buf->extra_buf_size = field_size * 2;
> +		output_buf->extra_buf =
> +			dma_alloc_coherent(dev->dev,
> +					   output_buf->extra_buf_size,
> +					   &output_buf->extra_buf_dma,
> +					   GFP_KERNEL);
> +
> +		if (!output_buf->extra_buf)
> +			output_buf->extra_buf_size = 0;
> +	}
> +

That also means that instead of allocating that buffer exactly once,
you now allocate it for each output buffer?

I guess that it will cleaned up by your previous patch at
buffer_cleanup time, so after it's no longer a reference frame?

What is the average memory usage before, and after that change during
a playback, and what is the runtime cost of doing it multiple times
instead of once?

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/7] media: cedrus: Fix H264 default reference index count
  2019-06-03 11:46   ` Maxime Ripard
@ 2019-06-03 15:34     ` Jernej Škrabec
  0 siblings, 0 replies; 30+ messages in thread
From: Jernej Škrabec @ 2019-06-03 15:34 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel, nicolas, boris.brezillon, jonas

Dne ponedeljek, 03. junij 2019 ob 13:46:20 CEST je Maxime Ripard napisal(a):
> On Thu, May 30, 2019 at 11:15:11PM +0200, Jernej Skrabec wrote:
> > Reference index count in VE_H264_PPS reg should come from PPS control.
> > However, this is not really important because reference index count is
> > in our case always overridden by that from slice header.
> > 
> > Cc: nicolas@ndufresne.ca
> > Cc: boris.brezillon@collabora.com
> > Cc: jonas@kwiboo.se
> > 
> > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> 
> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
> 
> > ---
> > We have to decide if we drop pps->num_ref_idx_l0_default_active_minus1
> > and pps->num_ref_idx_l1_default_active_minus1 fields or add
> > num_ref_idx_l0_active_override_flag and
> > num_ref_idx_l0_active_override_flag
> > to slice control.

Actually only one flag is in bitstream valid for both l0 and l1 ref list.

> > 
> > Current control doesn't have those two flags, so in Cedrus override flag
> > is
> > always set and we rely on userspace to set
> > slice->num_ref_idx_l0_active_minus1 and
> > slice->num_ref_idx_l1_active_minus1 to correct values. This means that
> > values stored in PPS are not needed and always ignored by VPU.
> > 
> > If I understand correctly, algorithm is very simple:
> > 
> > ref_count = PPS->ref_count
> > if (override_flag)
> > 
> > 	ref_count = slice->ref_count
> > 
> > It seems that VAAPI provides only final value. In my opinion we should do
> > the same - get rid of PPS default ref index count fields.
> 
> The rationale was to be as conservative as possible and just expose
> everything that is in the bitstream in those controls to accomodate
> for as many weird hardware as possible.

Ok, so then we should add that override flag, which would align with h264 specs 
and you can still do same trick in VAAPI library which it's currently used in 
Cedrus driver - always set override flag and fill out only slice reflist count. 
At the end it shouldn't matter for proper decoding in any driver.

Best regards,
Jernej




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos
  2019-06-03 11:55   ` Maxime Ripard
@ 2019-06-03 15:37     ` Jernej Škrabec
  2019-06-06 12:45       ` Dan Carpenter
  0 siblings, 1 reply; 30+ messages in thread
From: Jernej Škrabec @ 2019-06-03 15:37 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

Dne ponedeljek, 03. junij 2019 ob 13:55:36 CEST je Maxime Ripard napisal(a):
> Hi,
> 
> On Thu, May 30, 2019 at 11:15:12PM +0200, Jernej Skrabec wrote:
> > It seems that for some H264 videos at least one bitstream parsing
> > trigger must be called in order to be decoded correctly. There is no
> > explanation why this helps, but it was observed that two sample videos
> > with this fix are now decoded correctly and there is no regression with
> > others.
> > 
> > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> > ---
> > I have two samples which are fixed by this:
> > http://jernej.libreelec.tv/videos/h264/test.mkv
> > http://jernej.libreelec.tv/videos/h264/Dredd%20%E2%80%93%20DTS%20Sound%20C
> > heck%20DTS-HD%20MA%207.1.m2ts
> > 
> > Although second one also needs support for multi-slice frames, which is
> > not yet implemented here.> 
> >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 22 ++++++++++++++++---
> >  1 file changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index
> > cc8d17f211a1..d0ee3f90ff46 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -6,6 +6,7 @@
> > 
> >   * Copyright (c) 2018 Bootlin
> >   */
> > 
> > +#include <linux/delay.h>
> > 
> >  #include <linux/types.h>
> >  
> >  #include <media/videobuf2-dma-contig.h>
> > 
> > @@ -289,6 +290,20 @@ static void cedrus_write_pred_weight_table(struct
> > cedrus_ctx *ctx,> 
> >  	}
> >  
> >  }
> 
> We should have a comment here explaining why that is needed

ok.

> 
> > +static void cedrus_skip_bits(struct cedrus_dev *dev, int num)
> > +{
> > +	for (; num > 32; num -= 32) {
> > +		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (32 << 
8));
> 
> Using defines here would be great

Yes, sorry about that.

> 
> > +		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
> > +			udelay(1);
> > +	}
> 
> A new line here would be great
> 
> > +	if (num > 0) {
> > +		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (num << 
8));
> > +		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
> > +			udelay(1);
> > +	}
> 
> Can't we make that a bit simpler by not duplicating the loop?
> 
> Something like:
> 
> int current = 0;
> 
> while (current < num) {
>     int tmp = min(num - current, 32);
> 
>     cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (current << 8))
>     while (...)
>        ...
> 
>     current += tmp;
> }

Yes, that looks better, I'll change it in next version.

Best regards,
Jernej




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers
  2019-06-03 12:18   ` Maxime Ripard
@ 2019-06-03 15:48     ` Jernej Škrabec
  2019-06-05 21:10       ` Paul Kocialkowski
  2019-06-06  8:33       ` Maxime Ripard
  0 siblings, 2 replies; 30+ messages in thread
From: Jernej Škrabec @ 2019-06-03 15:48 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

Dne ponedeljek, 03. junij 2019 ob 14:18:59 CEST je Maxime Ripard napisal(a):
> Hi,
> 
> On Thu, May 30, 2019 at 11:15:15PM +0200, Jernej Skrabec wrote:
> > H264 and HEVC engines need additional buffers for each capture buffer.
> > H264 engine has this currently solved by allocating fixed size pool,
> > which is not ideal. Most of the time pool size is much bigger than it
> > needs to be.
> > 
> > Ideally, extra buffer should be allocated at buffer initialization, but
> > that's not efficient. It's size in H264 depends on flags set in SPS, but
> > that information is not available in buffer init callback.
> > 
> > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> > ---
> > 
> >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 ++++
> >  .../staging/media/sunxi/cedrus/cedrus_video.c | 19 +++++++++++++++++++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > b/drivers/staging/media/sunxi/cedrus/cedrus.h index
> > d8e6777e5e27..16c1bdfd243a 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> > @@ -81,6 +81,10 @@ struct cedrus_run {
> > 
> >  struct cedrus_buffer {
> >  
> >  	struct v4l2_m2m_buffer          m2m_buf;
> > 
> > +	void		*extra_buf;
> > +	dma_addr_t	extra_buf_dma;
> > +	ssize_t		extra_buf_size;
> > +
> > 
> >  	union {
> >  	
> >  		struct {
> >  		
> >  			unsigned int			position;
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > b/drivers/staging/media/sunxi/cedrus/cedrus_video.c index
> > 681dfe3367a6..d756e0e69634 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > @@ -411,6 +411,24 @@ static void cedrus_queue_cleanup(struct vb2_queue
> > *vq, u32 state)> 
> >  	}
> >  
> >  }
> > 
> > +static void cedrus_buf_cleanup(struct vb2_buffer *vb)
> > +{
> > +	struct vb2_queue *vq = vb->vb2_queue;
> > +
> > +	if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
> > +		struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> > +		struct cedrus_buffer *cedrus_buf;
> > +
> > +		cedrus_buf = vb2_to_cedrus_buffer(vq->bufs[vb->index]);
> > +
> > +		if (cedrus_buf->extra_buf_size)
> > +			dma_free_coherent(ctx->dev->dev,
> > +					  cedrus_buf-
>extra_buf_size,
> > +					  cedrus_buf-
>extra_buf,
> > +					  cedrus_buf-
>extra_buf_dma);
> > +	}
> > +}
> > +
> 
> I'm really not a fan of allocating something somewhere, and freeing it
> somewhere else. Making sure you don't leak something is hard enough to
> not have some non-trivial allocation scheme.

Ok, what about introducing two new optional methods in engine callbacks, 
buffer_init and buffer_destroy, which would be called from cedrus_buf_init() and 
cedrus_buf_cleanup(), respectively. That way all (de)allocation logic stays 
within the same engine.

Best regards,
Jernej




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] media: cedrus: Improve H264 memory efficiency
  2019-06-03 12:23   ` Maxime Ripard
@ 2019-06-03 16:37     ` Jernej Škrabec
  2019-06-05 21:12       ` Paul Kocialkowski
  0 siblings, 1 reply; 30+ messages in thread
From: Jernej Škrabec @ 2019-06-03 16:37 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

Dne ponedeljek, 03. junij 2019 ob 14:23:28 CEST je Maxime Ripard napisal(a):
> On Thu, May 30, 2019 at 11:15:16PM +0200, Jernej Skrabec wrote:
> > H264 decoder driver preallocated pretty big worst case mv col buffer
> > pool. It turns out that pool is most of the time much bigger than it
> > needs to be.
> > 
> > Solution implemented here is to allocate memory only if capture buffer
> > is actually used and only as much as it is really necessary.
> > 
> > This is also preparation for 4K video decoding support, which will be
> > implemented later.
> 
> What is it doing exactly to prepare for 4k?

Well, with that change 4K videos can be actually watched with 256 MiB CMA 
pool, but I can drop this statement in next version.

I concentrated on 256 MiB CMA pool, because it's maximum memory size supported 
by older VPU versions, but I think they don't support 4K decoding. I don't 
have them, so I can't test that hypothesis.

> 
> > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> > ---
> > 
> >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 -
> >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 81 +++++++------------
> >  2 files changed, 28 insertions(+), 57 deletions(-)
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > b/drivers/staging/media/sunxi/cedrus/cedrus.h index
> > 16c1bdfd243a..fcbbbef65494 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> > @@ -106,10 +106,6 @@ struct cedrus_ctx {
> > 
> >  	union {
> >  	
> >  		struct {
> > 
> > -			void		*mv_col_buf;
> > -			dma_addr_t	mv_col_buf_dma;
> > -			ssize_t		mv_col_buf_field_size;
> > -			ssize_t		mv_col_buf_size;
> > 
> >  			void		*pic_info_buf;
> >  			dma_addr_t	pic_info_buf_dma;
> >  			void		*neighbor_info_buf;
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index
> > b2290f98d81a..758fd0049e8f 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -54,17 +54,14 @@ static void cedrus_h264_write_sram(struct cedrus_dev
> > *dev,> 
> >  		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> >  
> >  }
> > 
> > -static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> > -					      unsigned int 
position,
> > +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_buffer *buf,
> > 
> >  					      unsigned int 
field)
> >  
> >  {
> > 
> > -	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma;
> > -
> > -	/* Adjust for the position */
> > -	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> > +	dma_addr_t addr = buf->extra_buf_dma;
> > 
> >  	/* Adjust for the field */
> > 
> > -	addr += field * ctx->codec.h264.mv_col_buf_field_size;
> > +	if (field)
> > +		addr += buf->extra_buf_size / 2;
> > 
> >  	return addr;
> >  
> >  }
> > 
> > @@ -76,7 +73,6 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > 
> >  				struct cedrus_h264_sram_ref_pic 
*pic)
> >  
> >  {
> >  
> >  	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> > 
> > -	unsigned int position = buf->codec.h264.position;
> > 
> >  	pic->top_field_order_cnt = cpu_to_le32(top_field_order_cnt);
> >  	pic->bottom_field_order_cnt = cpu_to_le32(bottom_field_order_cnt);
> > 
> > @@ -84,10 +80,8 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > 
> >  	pic->luma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 
0));
> >  	pic->chroma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 
1));
> > 
> > -	pic->mv_col_top_ptr =
> > -		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 
0));
> > -	pic->mv_col_bot_ptr =
> > -		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 
1));
> > +	pic->mv_col_top_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 
0));
> > +	pic->mv_col_bot_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 
1));
> > 
> >  }
> >  
> >  static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> > 
> > @@ -142,6 +136,28 @@ static void cedrus_write_frame_list(struct cedrus_ctx
> > *ctx,> 
> >  	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> >  	output_buf->codec.h264.position = position;
> > 
> > +	if (!output_buf->extra_buf_size) {
> > +		const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> > +		unsigned int field_size;
> > +
> > +		field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> > +			DIV_ROUND_UP(ctx->src_fmt.height, 16) * 16;
> > +		if (!(sps->flags & 
V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE))
> > +			field_size = field_size * 2;
> > +		if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
> > +			field_size = field_size * 2;
> > +
> > +		output_buf->extra_buf_size = field_size * 2;
> > +		output_buf->extra_buf =
> > +			dma_alloc_coherent(dev->dev,
> > +					   output_buf-
>extra_buf_size,
> > +					   &output_buf-
>extra_buf_dma,
> > +					   GFP_KERNEL);
> > +
> > +		if (!output_buf->extra_buf)
> > +			output_buf->extra_buf_size = 0;
> > +	}
> > +
> 
> That also means that instead of allocating that buffer exactly once,
> you now allocate it for each output buffer?

It's not completely the same. I'm allocating multiple times, yes, but much 
smaller chunks and only if needed.

Still, this slight overhead happens only when buffer is used for the first time. 
When buffer is reused, this MV buffer is also reused.

> 
> I guess that it will cleaned up by your previous patch at
> buffer_cleanup time, so after it's no longer a reference frame?

Yes, it will be deallocated in buffer_cleanup, but only after capture buffers 
are freed, which usually happens when device file descriptor is closed.

Buffers which holds reference frames are later reused, together with it's MV 
buffer, so there's no overhead.

> 
> What is the average memory usage before, and after that change during
> a playback, and what is the runtime cost of doing it multiple times
> instead of once?

As I already said, overhead is present only when buffer is used for the first 
time, which is not ideal, but allows to calculate minimal buffer size needed 
and even doesn't allocate anything if capture buffer is not used at all.

I didn't collect any exact numbers, but with this change I can play H264 and 
HEVC (with similar modification) 4K video samples with 256 MiB CMA pool. 
Without this change, it's not really possible. You can argue "but what if 4K 
video use 16 reference frames", then yes, only solution is to increase CMA 
pool, but why reserve extra memory which will never be used?

I've been using this optimization for past ~6 month with no issues noticed. If 
you feel better, I can change this to be a bit conservative and allocate MV 
buffer when buffer_init is called. This will consume a bit more memory as SPS is 
not available at that time (worst case buffer size estimation), but it still 
won't allocate MV buffers for unallocated capture frames.

Best regards,
Jernej



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/7] media: cedrus: Remove dst_bufs from context
  2019-05-30 21:15 ` [PATCH 4/7] media: cedrus: Remove dst_bufs from context Jernej Skrabec
  2019-06-03 12:13   ` Maxime Ripard
@ 2019-06-05 21:07   ` Paul Kocialkowski
  2019-08-12 13:42   ` Ezequiel Garcia
  2 siblings, 0 replies; 30+ messages in thread
From: Paul Kocialkowski @ 2019-06-05 21:07 UTC (permalink / raw)
  To: Jernej Skrabec, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

Hi,

Le jeudi 30 mai 2019 à 23:15 +0200, Jernej Skrabec a écrit :
> This array is just duplicated capture buffer queue. Remove it and adjust
> code to look into capture buffer queue instead.
> 
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>

Acked-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>

Cheers and thanks,

Paul

> ---
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 +---
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  |  4 ++--
>  .../staging/media/sunxi/cedrus/cedrus_video.c | 22 -------------------
>  3 files changed, 3 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 3f476d0fd981..d8e6777e5e27 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -100,8 +100,6 @@ struct cedrus_ctx {
>  	struct v4l2_ctrl_handler	hdl;
>  	struct v4l2_ctrl		**ctrls;
>  
> -	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
> -
>  	union {
>  		struct {
>  			void		*mv_col_buf;
> @@ -187,7 +185,7 @@ static inline dma_addr_t cedrus_dst_buf_addr(struct cedrus_ctx *ctx,
>  	if (index < 0)
>  		return 0;
>  
> -	buf = ctx->dst_bufs[index];
> +	buf = ctx->fh.m2m_ctx->cap_q_ctx.q.bufs[index];
>  	return buf ? cedrus_buf_addr(buf, &ctx->dst_fmt, plane) : 0;
>  }
>  
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> index d0ee3f90ff46..b2290f98d81a 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -119,7 +119,7 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
>  		if (buf_idx < 0)
>  			continue;
>  
> -		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
> +		cedrus_buf = vb2_to_cedrus_buffer(cap_q->bufs[buf_idx]);
>  		position = cedrus_buf->codec.h264.position;
>  		used_dpbs |= BIT(position);
>  
> @@ -194,7 +194,7 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
>  		if (buf_idx < 0)
>  			continue;
>  
> -		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> +		ref_buf = to_vb2_v4l2_buffer(cap_q->bufs[buf_idx]);
>  		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
>  		position = cedrus_buf->codec.h264.position;
>  
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> index e2b530b1a956..681dfe3367a6 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> @@ -411,26 +411,6 @@ static void cedrus_queue_cleanup(struct vb2_queue *vq, u32 state)
>  	}
>  }
>  
> -static int cedrus_buf_init(struct vb2_buffer *vb)
> -{
> -	struct vb2_queue *vq = vb->vb2_queue;
> -	struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> -
> -	if (!V4L2_TYPE_IS_OUTPUT(vq->type))
> -		ctx->dst_bufs[vb->index] = vb;
> -
> -	return 0;
> -}
> -
> -static void cedrus_buf_cleanup(struct vb2_buffer *vb)
> -{
> -	struct vb2_queue *vq = vb->vb2_queue;
> -	struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> -
> -	if (!V4L2_TYPE_IS_OUTPUT(vq->type))
> -		ctx->dst_bufs[vb->index] = NULL;
> -}
> -
>  static int cedrus_buf_out_validate(struct vb2_buffer *vb)
>  {
>  	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> @@ -517,8 +497,6 @@ static void cedrus_buf_request_complete(struct vb2_buffer *vb)
>  static struct vb2_ops cedrus_qops = {
>  	.queue_setup		= cedrus_queue_setup,
>  	.buf_prepare		= cedrus_buf_prepare,
> -	.buf_init		= cedrus_buf_init,
> -	.buf_cleanup		= cedrus_buf_cleanup,
>  	.buf_queue		= cedrus_buf_queue,
>  	.buf_out_validate	= cedrus_buf_out_validate,
>  	.buf_request_complete	= cedrus_buf_request_complete,


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] media: cedrus: Don't set chroma size for scale & rotation
  2019-05-30 21:15 ` [PATCH 5/7] media: cedrus: Don't set chroma size for scale & rotation Jernej Skrabec
@ 2019-06-05 21:08   ` Paul Kocialkowski
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Kocialkowski @ 2019-06-05 21:08 UTC (permalink / raw)
  To: Jernej Skrabec, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

Hi,

Le jeudi 30 mai 2019 à 23:15 +0200, Jernej Skrabec a écrit :
> Scale and rotation are currently not implemented, so it makes no sense to
> set chroma size for it.
> 
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>

Acked-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Cheers,

Paul

> ---
>  drivers/staging/media/sunxi/cedrus/cedrus_hw.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> index 9c5819def186..9de20ae47916 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> @@ -79,9 +79,6 @@ void cedrus_dst_format_set(struct cedrus_dev *dev,
>  		reg = VE_PRIMARY_OUT_FMT_NV12;
>  		cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
>  
> -		reg = VE_CHROMA_BUF_LEN_SDRT(chroma_size / 2);
> -		cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> -
>  		reg = chroma_size / 2;
>  		cedrus_write(dev, VE_PRIMARY_CHROMA_BUF_LEN, reg);
>  


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers
  2019-06-03 15:48     ` Jernej Škrabec
@ 2019-06-05 21:10       ` Paul Kocialkowski
  2019-06-05 21:52         ` Jernej Škrabec
  2019-06-06  8:33       ` Maxime Ripard
  1 sibling, 1 reply; 30+ messages in thread
From: Paul Kocialkowski @ 2019-06-05 21:10 UTC (permalink / raw)
  To: Jernej Škrabec, Maxime Ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

Hi,

Le lundi 03 juin 2019 à 17:48 +0200, Jernej Škrabec a écrit :
> Dne ponedeljek, 03. junij 2019 ob 14:18:59 CEST je Maxime Ripard napisal(a):
> > Hi,
> > 
> > On Thu, May 30, 2019 at 11:15:15PM +0200, Jernej Skrabec wrote:
> > > H264 and HEVC engines need additional buffers for each capture buffer.
> > > H264 engine has this currently solved by allocating fixed size pool,
> > > which is not ideal. Most of the time pool size is much bigger than it
> > > needs to be.
> > > 
> > > Ideally, extra buffer should be allocated at buffer initialization, but
> > > that's not efficient. It's size in H264 depends on flags set in SPS, but
> > > that information is not available in buffer init callback.
> > > 
> > > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> > > ---
> > > 
> > >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 ++++
> > >  .../staging/media/sunxi/cedrus/cedrus_video.c | 19 +++++++++++++++++++
> > >  2 files changed, 23 insertions(+)
> > > 
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > b/drivers/staging/media/sunxi/cedrus/cedrus.h index
> > > d8e6777e5e27..16c1bdfd243a 100644
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > @@ -81,6 +81,10 @@ struct cedrus_run {
> > > 
> > >  struct cedrus_buffer {
> > >  
> > >  	struct v4l2_m2m_buffer          m2m_buf;
> > > 
> > > +	void		*extra_buf;
> > > +	dma_addr_t	extra_buf_dma;
> > > +	ssize_t		extra_buf_size;
> > > +
> > > 
> > >  	union {
> > >  	
> > >  		struct {
> > >  		
> > >  			unsigned int			position;
> > > 
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > > b/drivers/staging/media/sunxi/cedrus/cedrus_video.c index
> > > 681dfe3367a6..d756e0e69634 100644
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > > @@ -411,6 +411,24 @@ static void cedrus_queue_cleanup(struct vb2_queue
> > > *vq, u32 state)> 
> > >  	}
> > >  
> > >  }
> > > 
> > > +static void cedrus_buf_cleanup(struct vb2_buffer *vb)
> > > +{
> > > +	struct vb2_queue *vq = vb->vb2_queue;
> > > +
> > > +	if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
> > > +		struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> > > +		struct cedrus_buffer *cedrus_buf;
> > > +
> > > +		cedrus_buf = vb2_to_cedrus_buffer(vq->bufs[vb->index]);
> > > +
> > > +		if (cedrus_buf->extra_buf_size)
> > > +			dma_free_coherent(ctx->dev->dev,
> > > +					  cedrus_buf-
> > extra_buf_size,
> > > +					  cedrus_buf-
> > extra_buf,
> > > +					  cedrus_buf-
> > extra_buf_dma);
> > > +	}
> > > +}
> > > +
> > 
> > I'm really not a fan of allocating something somewhere, and freeing it
> > somewhere else. Making sure you don't leak something is hard enough to
> > not have some non-trivial allocation scheme.
> 
> Ok, what about introducing two new optional methods in engine callbacks, 
> buffer_init and buffer_destroy, which would be called from cedrus_buf_init() and 
> cedrus_buf_cleanup(), respectively. That way all (de)allocation logic stays 
> within the same engine.

I'm thinking that we should have v4l2-framework-level per-codec helpers
to provide ops for these kinds of things, since they tend be quite
common across decoders.

Cheers,

Paul


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] media: cedrus: Improve H264 memory efficiency
  2019-06-03 16:37     ` Jernej Škrabec
@ 2019-06-05 21:12       ` Paul Kocialkowski
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Kocialkowski @ 2019-06-05 21:12 UTC (permalink / raw)
  To: Jernej Škrabec, Maxime Ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

Hi,

Le lundi 03 juin 2019 à 18:37 +0200, Jernej Škrabec a écrit :
> Dne ponedeljek, 03. junij 2019 ob 14:23:28 CEST je Maxime Ripard napisal(a):
> > On Thu, May 30, 2019 at 11:15:16PM +0200, Jernej Skrabec wrote:
> > > H264 decoder driver preallocated pretty big worst case mv col buffer
> > > pool. It turns out that pool is most of the time much bigger than it
> > > needs to be.
> > > 
> > > Solution implemented here is to allocate memory only if capture buffer
> > > is actually used and only as much as it is really necessary.
> > > 
> > > This is also preparation for 4K video decoding support, which will be
> > > implemented later.
> > 
> > What is it doing exactly to prepare for 4k?
> 
> Well, with that change 4K videos can be actually watched with 256 MiB CMA 
> pool, but I can drop this statement in next version.
> 
> I concentrated on 256 MiB CMA pool, because it's maximum memory size supported 
> by older VPU versions, but I think they don't support 4K decoding. I don't 
> have them, so I can't test that hypothesis.

I think it's a fair goal to try and optimize the CMA pool usage, maybe
it should be presented as that and I guess it's fine to connect that to
4K decoding if you like :)

Either way, I think we should have per-codec framework callbacks for
these kinds of things.

Cheers,

Paul

> > > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> > > ---
> > > 
> > >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 -
> > >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 81 +++++++------------
> > >  2 files changed, 28 insertions(+), 57 deletions(-)
> > > 
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > b/drivers/staging/media/sunxi/cedrus/cedrus.h index
> > > 16c1bdfd243a..fcbbbef65494 100644
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > @@ -106,10 +106,6 @@ struct cedrus_ctx {
> > > 
> > >  	union {
> > >  	
> > >  		struct {
> > > 
> > > -			void		*mv_col_buf;
> > > -			dma_addr_t	mv_col_buf_dma;
> > > -			ssize_t		mv_col_buf_field_size;
> > > -			ssize_t		mv_col_buf_size;
> > > 
> > >  			void		*pic_info_buf;
> > >  			dma_addr_t	pic_info_buf_dma;
> > >  			void		*neighbor_info_buf;
> > > 
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index
> > > b2290f98d81a..758fd0049e8f 100644
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > @@ -54,17 +54,14 @@ static void cedrus_h264_write_sram(struct cedrus_dev
> > > *dev,> 
> > >  		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> > >  
> > >  }
> > > 
> > > -static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> > > -					      unsigned int 
> position,
> > > +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_buffer *buf,
> > > 
> > >  					      unsigned int 
> field)
> > >  
> > >  {
> > > 
> > > -	dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma;
> > > -
> > > -	/* Adjust for the position */
> > > -	addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> > > +	dma_addr_t addr = buf->extra_buf_dma;
> > > 
> > >  	/* Adjust for the field */
> > > 
> > > -	addr += field * ctx->codec.h264.mv_col_buf_field_size;
> > > +	if (field)
> > > +		addr += buf->extra_buf_size / 2;
> > > 
> > >  	return addr;
> > >  
> > >  }
> > > 
> > > @@ -76,7 +73,6 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > > 
> > >  				struct cedrus_h264_sram_ref_pic 
> *pic)
> > >  
> > >  {
> > >  
> > >  	struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> > > 
> > > -	unsigned int position = buf->codec.h264.position;
> > > 
> > >  	pic->top_field_order_cnt = cpu_to_le32(top_field_order_cnt);
> > >  	pic->bottom_field_order_cnt = cpu_to_le32(bottom_field_order_cnt);
> > > 
> > > @@ -84,10 +80,8 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> > > 
> > >  	pic->luma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 
> 0));
> > >  	pic->chroma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 
> 1));
> > > -	pic->mv_col_top_ptr =
> > > -		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 
> 0));
> > > -	pic->mv_col_bot_ptr =
> > > -		cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 
> 1));
> > > +	pic->mv_col_top_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 
> 0));
> > > +	pic->mv_col_bot_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 
> 1));
> > >  }
> > >  
> > >  static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> > > 
> > > @@ -142,6 +136,28 @@ static void cedrus_write_frame_list(struct cedrus_ctx
> > > *ctx,> 
> > >  	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> > >  	output_buf->codec.h264.position = position;
> > > 
> > > +	if (!output_buf->extra_buf_size) {
> > > +		const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> > > +		unsigned int field_size;
> > > +
> > > +		field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> > > +			DIV_ROUND_UP(ctx->src_fmt.height, 16) * 16;
> > > +		if (!(sps->flags & 
> V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE))
> > > +			field_size = field_size * 2;
> > > +		if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
> > > +			field_size = field_size * 2;
> > > +
> > > +		output_buf->extra_buf_size = field_size * 2;
> > > +		output_buf->extra_buf =
> > > +			dma_alloc_coherent(dev->dev,
> > > +					   output_buf-
> > extra_buf_size,
> > > +					   &output_buf-
> > extra_buf_dma,
> > > +					   GFP_KERNEL);
> > > +
> > > +		if (!output_buf->extra_buf)
> > > +			output_buf->extra_buf_size = 0;
> > > +	}
> > > +
> > 
> > That also means that instead of allocating that buffer exactly once,
> > you now allocate it for each output buffer?
> 
> It's not completely the same. I'm allocating multiple times, yes, but much 
> smaller chunks and only if needed.
> 
> Still, this slight overhead happens only when buffer is used for the first time. 
> When buffer is reused, this MV buffer is also reused.
> 
> > I guess that it will cleaned up by your previous patch at
> > buffer_cleanup time, so after it's no longer a reference frame?
> 
> Yes, it will be deallocated in buffer_cleanup, but only after capture buffers 
> are freed, which usually happens when device file descriptor is closed.
> 
> Buffers which holds reference frames are later reused, together with it's MV 
> buffer, so there's no overhead.
> 
> > What is the average memory usage before, and after that change during
> > a playback, and what is the runtime cost of doing it multiple times
> > instead of once?
> 
> As I already said, overhead is present only when buffer is used for the first 
> time, which is not ideal, but allows to calculate minimal buffer size needed 
> and even doesn't allocate anything if capture buffer is not used at all.
> 
> I didn't collect any exact numbers, but with this change I can play H264 and 
> HEVC (with similar modification) 4K video samples with 256 MiB CMA pool. 
> Without this change, it's not really possible. You can argue "but what if 4K 
> video use 16 reference frames", then yes, only solution is to increase CMA 
> pool, but why reserve extra memory which will never be used?
> 
> I've been using this optimization for past ~6 month with no issues noticed. If 
> you feel better, I can change this to be a bit conservative and allocate MV 
> buffer when buffer_init is called. This will consume a bit more memory as SPS is 
> not available at that time (worst case buffer size estimation), but it still 
> won't allocate MV buffers for unallocated capture frames.
> 
> Best regards,
> Jernej
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers
  2019-06-05 21:10       ` Paul Kocialkowski
@ 2019-06-05 21:52         ` Jernej Škrabec
  0 siblings, 0 replies; 30+ messages in thread
From: Jernej Škrabec @ 2019-06-05 21:52 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Maxime Ripard, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

Dne sreda, 05. junij 2019 ob 23:10:17 CEST je Paul Kocialkowski napisal(a):
> Hi,
> 
> Le lundi 03 juin 2019 à 17:48 +0200, Jernej Škrabec a écrit :
> > Dne ponedeljek, 03. junij 2019 ob 14:18:59 CEST je Maxime Ripard 
napisal(a):
> > > Hi,
> > > 
> > > On Thu, May 30, 2019 at 11:15:15PM +0200, Jernej Skrabec wrote:
> > > > H264 and HEVC engines need additional buffers for each capture buffer.
> > > > H264 engine has this currently solved by allocating fixed size pool,
> > > > which is not ideal. Most of the time pool size is much bigger than it
> > > > needs to be.
> > > > 
> > > > Ideally, extra buffer should be allocated at buffer initialization,
> > > > but
> > > > that's not efficient. It's size in H264 depends on flags set in SPS,
> > > > but
> > > > that information is not available in buffer init callback.
> > > > 
> > > > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> > > > ---
> > > > 
> > > >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 ++++
> > > >  .../staging/media/sunxi/cedrus/cedrus_video.c | 19
> > > >  +++++++++++++++++++
> > > >  2 files changed, 23 insertions(+)
> > > > 
> > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > > b/drivers/staging/media/sunxi/cedrus/cedrus.h index
> > > > d8e6777e5e27..16c1bdfd243a 100644
> > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> > > > @@ -81,6 +81,10 @@ struct cedrus_run {
> > > > 
> > > >  struct cedrus_buffer {
> > > >  
> > > >  	struct v4l2_m2m_buffer          m2m_buf;
> > > > 
> > > > +	void		*extra_buf;
> > > > +	dma_addr_t	extra_buf_dma;
> > > > +	ssize_t		extra_buf_size;
> > > > +
> > > > 
> > > >  	union {
> > > >  	
> > > >  		struct {
> > > >  		
> > > >  			unsigned int			position;
> > > > 
> > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > > > b/drivers/staging/media/sunxi/cedrus/cedrus_video.c index
> > > > 681dfe3367a6..d756e0e69634 100644
> > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
> > > > @@ -411,6 +411,24 @@ static void cedrus_queue_cleanup(struct vb2_queue
> > > > *vq, u32 state)>
> > > > 
> > > >  	}
> > > >  
> > > >  }
> > > > 
> > > > +static void cedrus_buf_cleanup(struct vb2_buffer *vb)
> > > > +{
> > > > +	struct vb2_queue *vq = vb->vb2_queue;
> > > > +
> > > > +	if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
> > > > +		struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> > > > +		struct cedrus_buffer *cedrus_buf;
> > > > +
> > > > +		cedrus_buf = vb2_to_cedrus_buffer(vq->bufs[vb->index]);
> > > > +
> > > > +		if (cedrus_buf->extra_buf_size)
> > > > +			dma_free_coherent(ctx->dev->dev,
> > > > +					  cedrus_buf-
> > > 
> > > extra_buf_size,
> > > 
> > > > +					  cedrus_buf-
> > > 
> > > extra_buf,
> > > 
> > > > +					  cedrus_buf-
> > > 
> > > extra_buf_dma);
> > > 
> > > > +	}
> > > > +}
> > > > +
> > > 
> > > I'm really not a fan of allocating something somewhere, and freeing it
> > > somewhere else. Making sure you don't leak something is hard enough to
> > > not have some non-trivial allocation scheme.
> > 
> > Ok, what about introducing two new optional methods in engine callbacks,
> > buffer_init and buffer_destroy, which would be called from
> > cedrus_buf_init() and cedrus_buf_cleanup(), respectively. That way all
> > (de)allocation logic stays within the same engine.
> 
> I'm thinking that we should have v4l2-framework-level per-codec helpers
> to provide ops for these kinds of things, since they tend be quite
> common across decoders.

Isn't .buf_init and .buf_cleanup callbacks provided by struct vb2_ops meant 
for exactly that?

Related, but different topic. I managed to fix 10-bit HEVC support on H6, but 
when working in 8-bit mode, capture buffers have to be big enough to hold 
normal NV12 decoded image plus extra buffer for 2 bits of each pixel. VPU 
accepts only offset from destination buffer for this extra buffer instead of full 
address. How we will handle that? Override sizeimage when allocating? But 
there we don't have information if it's 10-bit video or not. As you can see, 
I'm not a fan of overallocating.

I suspect we will have even bigger issues when decoding 10-bit HEVC video in 
P010 format, which is the only 10-bit YUV format useable by DRM driver (not 
implemented yet). From what I know till now, VPU needs aforementioned 8-bit+2-
bit buffers (for decoding) and another one in which it rearranges samples in 
P010 format. But that has to be confirmed first.

Best regards,
Jernej



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers
  2019-06-03 15:48     ` Jernej Škrabec
  2019-06-05 21:10       ` Paul Kocialkowski
@ 2019-06-06  8:33       ` Maxime Ripard
  1 sibling, 0 replies; 30+ messages in thread
From: Maxime Ripard @ 2019-06-06  8:33 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

On Mon, Jun 03, 2019 at 05:48:25PM +0200, Jernej Škrabec wrote:
> Dne ponedeljek, 03. junij 2019 ob 14:18:59 CEST je Maxime Ripard napisal(a):
> > > +static void cedrus_buf_cleanup(struct vb2_buffer *vb)
> > > +{
> > > +	struct vb2_queue *vq = vb->vb2_queue;
> > > +
> > > +	if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
> > > +		struct cedrus_ctx *ctx = vb2_get_drv_priv(vq);
> > > +		struct cedrus_buffer *cedrus_buf;
> > > +
> > > +		cedrus_buf = vb2_to_cedrus_buffer(vq->bufs[vb->index]);
> > > +
> > > +		if (cedrus_buf->extra_buf_size)
> > > +			dma_free_coherent(ctx->dev->dev,
> > > +					  cedrus_buf-
> >extra_buf_size,
> > > +					  cedrus_buf-
> >extra_buf,
> > > +					  cedrus_buf-
> >extra_buf_dma);
> > > +	}
> > > +}
> > > +
> >
> > I'm really not a fan of allocating something somewhere, and freeing it
> > somewhere else. Making sure you don't leak something is hard enough to
> > not have some non-trivial allocation scheme.
>
> Ok, what about introducing two new optional methods in engine callbacks,
> buffer_init and buffer_destroy, which would be called from cedrus_buf_init() and
> cedrus_buf_cleanup(), respectively. That way all (de)allocation logic stays
> within the same engine.

Yep, that would work for me.

Thanks!
Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos
  2019-06-03 15:37     ` Jernej Škrabec
@ 2019-06-06 12:45       ` Dan Carpenter
  0 siblings, 0 replies; 30+ messages in thread
From: Dan Carpenter @ 2019-06-06 12:45 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: Maxime Ripard, devel, gregkh, linux-kernel, paul.kocialkowski,
	wens, mchehab, linux-arm-kernel, linux-media

On Mon, Jun 03, 2019 at 05:37:17PM +0200, Jernej Škrabec wrote:
> Dne ponedeljek, 03. junij 2019 ob 13:55:36 CEST je Maxime Ripard napisal(a):
> > int current = 0;
> > 
> > while (current < num) {
> >     int tmp = min(num - current, 32);
> > 
> >     cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (current << 8))
                                                       ^^^^^^^
This should be "tmp << 8" instead of "current << 8" though.


> >     while (...)
> >        ...
> > 
> >     current += tmp;
> > }
> 

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] media: cedrus: Improvements/cleanup
  2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
                   ` (6 preceding siblings ...)
  2019-05-30 21:15 ` [PATCH 7/7] media: cedrus: Improve H264 memory efficiency Jernej Skrabec
@ 2019-08-12 12:12 ` Hans Verkuil
  2019-08-12 12:21   ` Jernej Škrabec
  2019-08-12 13:55   ` Maxime Ripard
  7 siblings, 2 replies; 30+ messages in thread
From: Hans Verkuil @ 2019-08-12 12:12 UTC (permalink / raw)
  To: Jernej Skrabec, paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

On 5/30/19 11:15 PM, Jernej Skrabec wrote:
> Here is first batch of random Cedrus improvements/cleanups. Only patch 2
> has a change which raises a question about H264 controls.
> 
> Changes were tested on H3 SoC using modified ffmpeg and Kodi.
> 
> Please take a look.

This has been sitting in patchwork for quite some time. I've updated the
status of the various patches and most needed extra work.

It seems that patches 4/7 and 5/7 are OK. Maxime, can you please confirm
that these two are still valid? They apply cleanly on the latest master
at least, but since they are a bit old I prefer to have confirmation that
it's OK to merge them.

Regards,

	Hans

> 
> Best regards,
> Jernej
> 
> Jernej Skrabec (7):
>   media: cedrus: Disable engine after each slice decoding
>   media: cedrus: Fix H264 default reference index count
>   media: cedrus: Fix decoding for some H264 videos
>   media: cedrus: Remove dst_bufs from context
>   media: cedrus: Don't set chroma size for scale & rotation
>   media: cedrus: Add infra for extra buffers connected to capture
>     buffers
>   media: cedrus: Improve H264 memory efficiency
> 
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  12 +-
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 115 ++++++++----------
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +-
>  .../staging/media/sunxi/cedrus/cedrus_video.c |  25 ++--
>  4 files changed, 68 insertions(+), 88 deletions(-)
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] media: cedrus: Improvements/cleanup
  2019-08-12 12:12 ` [PATCH 0/7] media: cedrus: Improvements/cleanup Hans Verkuil
@ 2019-08-12 12:21   ` Jernej Škrabec
  2019-08-12 13:55   ` Maxime Ripard
  1 sibling, 0 replies; 30+ messages in thread
From: Jernej Škrabec @ 2019-08-12 12:21 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: paul.kocialkowski, maxime.ripard, wens, mchehab, gregkh,
	linux-media, devel, linux-arm-kernel, linux-kernel

Dne ponedeljek, 12. avgust 2019 ob 14:12:21 CEST je Hans Verkuil napisal(a):
> On 5/30/19 11:15 PM, Jernej Skrabec wrote:
> > Here is first batch of random Cedrus improvements/cleanups. Only patch 2
> > has a change which raises a question about H264 controls.
> > 
> > Changes were tested on H3 SoC using modified ffmpeg and Kodi.
> > 
> > Please take a look.
> 
> This has been sitting in patchwork for quite some time. I've updated the
> status of the various patches and most needed extra work.
> 
> It seems that patches 4/7 and 5/7 are OK. Maxime, can you please confirm
> that these two are still valid? They apply cleanly on the latest master
> at least, but since they are a bit old I prefer to have confirmation that
> it's OK to merge them.

I'm not sure about patch 4, IIRC Boris Brezillon also wants to improve that 
area in separate series, but patch 5 should be safe to merge.

Anyway, I didn't post new version because I'm waiting on close-to-be-merged 
H264 and HEVC patch series to be actually merged.

Best regards,
Jernej

> 
> Regards,
> 
> 	Hans
> 
> > Best regards,
> > Jernej
> > 
> > Jernej Skrabec (7):
> >   media: cedrus: Disable engine after each slice decoding
> >   media: cedrus: Fix H264 default reference index count
> >   media: cedrus: Fix decoding for some H264 videos
> >   media: cedrus: Remove dst_bufs from context
> >   media: cedrus: Don't set chroma size for scale & rotation
> >   media: cedrus: Add infra for extra buffers connected to capture
> >   
> >     buffers
> >   
> >   media: cedrus: Improve H264 memory efficiency
> >  
> >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  12 +-
> >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 115 ++++++++----------
> >  .../staging/media/sunxi/cedrus/cedrus_hw.c    |   4 +-
> >  .../staging/media/sunxi/cedrus/cedrus_video.c |  25 ++--
> >  4 files changed, 68 insertions(+), 88 deletions(-)





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] media: cedrus: Disable engine after each slice decoding
  2019-06-03 11:38   ` Maxime Ripard
@ 2019-08-12 13:28     ` Ezequiel Garcia
  0 siblings, 0 replies; 30+ messages in thread
From: Ezequiel Garcia @ 2019-08-12 13:28 UTC (permalink / raw)
  To: Maxime Ripard, Jernej Skrabec
  Cc: paul.kocialkowski, wens, mchehab, gregkh, linux-media, devel,
	linux-arm-kernel, linux-kernel

Hi Jernej,

On Mon, 2019-06-03 at 13:38 +0200, Maxime Ripard wrote:
> Hi,
> 
> On Thu, May 30, 2019 at 11:15:10PM +0200, Jernej Skrabec wrote:
> > libvdpau-sunxi always disables engine after each decoded slice.
> > Do same in Cedrus driver.
> > 
> > Presumably this also lowers power consumption which is always nice.
> > 
> > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> 
> Is it fixing anything though?
> 
> I indeed saw that cedar did disable it everytime, but I couldn't find
> a reason why.
> 
> Also, the power management improvement would need to be measured, it
> can even create the opposite situation where the device will draw more
> current from being woken up than if it had just remained disabled.
> 

While reviewing this, I'm noticing that cedrus_engine_disable can
be marked for static storage (with or without this patch).

Regards,
Eze


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/7] media: cedrus: Remove dst_bufs from context
  2019-05-30 21:15 ` [PATCH 4/7] media: cedrus: Remove dst_bufs from context Jernej Skrabec
  2019-06-03 12:13   ` Maxime Ripard
  2019-06-05 21:07   ` Paul Kocialkowski
@ 2019-08-12 13:42   ` Ezequiel Garcia
  2 siblings, 0 replies; 30+ messages in thread
From: Ezequiel Garcia @ 2019-08-12 13:42 UTC (permalink / raw)
  To: Jernej Skrabec, paul.kocialkowski, maxime.ripard
  Cc: wens, mchehab, gregkh, linux-media, devel, linux-arm-kernel,
	linux-kernel

On Thu, 2019-05-30 at 23:15 +0200, Jernej Skrabec wrote:
> This array is just duplicated capture buffer queue. Remove it and adjust
> code to look into capture buffer queue instead.
> 
> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
> ---
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  4 +---
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  |  4 ++--
>  .../staging/media/sunxi/cedrus/cedrus_video.c | 22 -------------------
>  3 files changed, 3 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 3f476d0fd981..d8e6777e5e27 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -100,8 +100,6 @@ struct cedrus_ctx {
>  	struct v4l2_ctrl_handler	hdl;
>  	struct v4l2_ctrl		**ctrls;
>  
> -	struct vb2_buffer		*dst_bufs[VIDEO_MAX_FRAME];
> -
>  	union {
>  		struct {
>  			void		*mv_col_buf;
> @@ -187,7 +185,7 @@ static inline dma_addr_t cedrus_dst_buf_addr(struct cedrus_ctx *ctx,
>  	if (index < 0)
>  		return 0;
>  
> -	buf = ctx->dst_bufs[index];
> +	buf = ctx->fh.m2m_ctx->cap_q_ctx.q.bufs[index];

I think you can use v4l2_m2m_get_dst_vq() to access the queue,
and vb2_get_buffer() to access buffers in a vb2 queue.

>  	return buf ? cedrus_buf_addr(buf, &ctx->dst_fmt, plane) : 0;
>  }
>  
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> index d0ee3f90ff46..b2290f98d81a 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -119,7 +119,7 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
>  		if (buf_idx < 0)
>  			continue;
>  
> -		cedrus_buf = vb2_to_cedrus_buffer(ctx->dst_bufs[buf_idx]);
> +		cedrus_buf = vb2_to_cedrus_buffer(cap_q->bufs[buf_idx]);

Ditto about vb2_get_buffer.

>  		position = cedrus_buf->codec.h264.position;
>  		used_dpbs |= BIT(position);
>  
> @@ -194,7 +194,7 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
>  		if (buf_idx < 0)
>  			continue;
>  
> -		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> +		ref_buf = to_vb2_v4l2_buffer(cap_q->bufs[buf_idx]);

Ditto about vb2_get_buffer.

With those changes:

Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>

Thanks,
Ezequiel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] media: cedrus: Improvements/cleanup
  2019-08-12 12:12 ` [PATCH 0/7] media: cedrus: Improvements/cleanup Hans Verkuil
  2019-08-12 12:21   ` Jernej Škrabec
@ 2019-08-12 13:55   ` Maxime Ripard
  1 sibling, 0 replies; 30+ messages in thread
From: Maxime Ripard @ 2019-08-12 13:55 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Jernej Skrabec, paul.kocialkowski, wens, mchehab, gregkh,
	linux-media, devel, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 869 bytes --]

Hi!

On Mon, Aug 12, 2019 at 02:12:21PM +0200, Hans Verkuil wrote:
> On 5/30/19 11:15 PM, Jernej Skrabec wrote:
> > Here is first batch of random Cedrus improvements/cleanups. Only patch 2
> > has a change which raises a question about H264 controls.
> >
> > Changes were tested on H3 SoC using modified ffmpeg and Kodi.
> >
> > Please take a look.
>
> This has been sitting in patchwork for quite some time. I've updated the
> status of the various patches and most needed extra work.
>
> It seems that patches 4/7 and 5/7 are OK. Maxime, can you please confirm
> that these two are still valid? They apply cleanly on the latest master
> at least, but since they are a bit old I prefer to have confirmation that
> it's OK to merge them.

Yes, you can definitely merge those.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, back to index

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-30 21:15 [PATCH 0/7] media: cedrus: Improvements/cleanup Jernej Skrabec
2019-05-30 21:15 ` [PATCH 1/7] media: cedrus: Disable engine after each slice decoding Jernej Skrabec
2019-06-03 11:38   ` Maxime Ripard
2019-08-12 13:28     ` Ezequiel Garcia
2019-05-30 21:15 ` [PATCH 2/7] media: cedrus: Fix H264 default reference index count Jernej Skrabec
2019-06-03 11:46   ` Maxime Ripard
2019-06-03 15:34     ` Jernej Škrabec
2019-05-30 21:15 ` [PATCH 3/7] media: cedrus: Fix decoding for some H264 videos Jernej Skrabec
2019-06-03 11:55   ` Maxime Ripard
2019-06-03 15:37     ` Jernej Škrabec
2019-06-06 12:45       ` Dan Carpenter
2019-05-30 21:15 ` [PATCH 4/7] media: cedrus: Remove dst_bufs from context Jernej Skrabec
2019-06-03 12:13   ` Maxime Ripard
2019-06-05 21:07   ` Paul Kocialkowski
2019-08-12 13:42   ` Ezequiel Garcia
2019-05-30 21:15 ` [PATCH 5/7] media: cedrus: Don't set chroma size for scale & rotation Jernej Skrabec
2019-06-05 21:08   ` Paul Kocialkowski
2019-05-30 21:15 ` [PATCH 6/7] media: cedrus: Add infra for extra buffers connected to capture buffers Jernej Skrabec
2019-06-03 12:18   ` Maxime Ripard
2019-06-03 15:48     ` Jernej Škrabec
2019-06-05 21:10       ` Paul Kocialkowski
2019-06-05 21:52         ` Jernej Škrabec
2019-06-06  8:33       ` Maxime Ripard
2019-05-30 21:15 ` [PATCH 7/7] media: cedrus: Improve H264 memory efficiency Jernej Skrabec
2019-06-03 12:23   ` Maxime Ripard
2019-06-03 16:37     ` Jernej Škrabec
2019-06-05 21:12       ` Paul Kocialkowski
2019-08-12 12:12 ` [PATCH 0/7] media: cedrus: Improvements/cleanup Hans Verkuil
2019-08-12 12:21   ` Jernej Škrabec
2019-08-12 13:55   ` Maxime Ripard

Linux-Media Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-media/0 linux-media/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-media linux-media/ https://lore.kernel.org/linux-media \
		linux-media@vger.kernel.org linux-media@archiver.kernel.org
	public-inbox-index linux-media


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-media


AGPL code for this site: git clone https://public-inbox.org/ public-inbox