linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Enable Hantro G1 post-processor
@ 2019-11-13 17:56 Ezequiel Garcia
  2019-11-13 17:56 ` [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers Ezequiel Garcia
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Ezequiel Garcia @ 2019-11-13 17:56 UTC (permalink / raw)
  To: linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Boris Brezillon, Chris Healy,
	linux-kernel, Ezequiel Garcia

Hi all,

The Hantro G1 VPU post-processor block can be pipelined with
the decoder hardware, allowing to perform operations such as
color conversion, scaling, rotation, cropping, among others.

When the post-processor is enabled, the decoder hardware
needs its own set of NV12 buffers (the native decoder format),
and the post-processor is the owner of the CAPTURE buffers,
allocated for the post-processed format.

This way, applications obtain post-processed
(scaled, converted, etc) buffers transparently.

This feature is implemented by exposing the post-processed pixel
formats on ENUM_FMT, ordered as "preferred pixelformat first":

v4l2-ctl -d 1 --list-formats
ioctl: VIDIOC_ENUM_FMT
	Type: Video Capture Multiplanar

	[0]: 'NV12' (Y/CbCr 4:2:0)
	[1]: 'YUYV' (YUYV 4:2:2)

The order of preference in ENUM_FMT can be used as a hint
by applications. This series updates the uAPI specification
accordingly.

When the application sets a pixel format other than NV12,
the post-processor is transparently enabled.

Patch 1 is a cleanups needed to easier integrate the post-processor.
Patch 2 introduces the post-processing support.
Patch 3 updates the uAPI specification.

This is tested on RK3288 platforms with MPEG-2, VP8 and
H264 streams, decoding to YUY2 surfaces. For now, this series
is only adding support for NV12-to-YUY2 conversion.

Applies to media/master.

Future plans
------------

It seems to me that we should start moving this driver to use
regmap-based access to registers. However, such move is out of scope
and not entirely related to this post-processor enablement.

We'll work on that as follow-up patches.

Changelog
---------

Changes v3:

* After discussing with Hans and Tomasz during the media summit
in ELCE, we decided to go back on the MC changes. The MC topology
is now untouched. This means the series is now similar to v1,
except we explicitly use the ENUM_FMT to hint about the post-processed
formats.

Changes v2:

* The decoder->post-processor topology is now exposed
  explicitly and applications need to configure the pipeline.
  By default, the decoder is enabled and the post-processor
  is disabled.

* RGB post-processing output has been dropped. We might
  add this in the future, but for now, it seems it would
  make the code more complex without a use-case in mind.
  RGB is much more memory-consuming so less attractive
  than YUV, and modern GPUs and display controllers support YUV.

* The post-processor implementation still supports RK3288
  only. However, a generic register infrastructure is introduced
  to make addition of other variants such as RK3399 really easy.

Ezequiel Garcia (3):
  media: hantro: Cleanup format negotiation helpers
  media: hantro: Support color conversion via post-processing
  media: vidioc-enum-fmt.rst: clarify format preference

 .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
 drivers/staging/media/hantro/Makefile         |   1 +
 drivers/staging/media/hantro/hantro.h         |  64 +++++++-
 drivers/staging/media/hantro/hantro_drv.c     |   8 +-
 .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
 .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
 drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
 .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
 drivers/staging/media/hantro/hantro_h264.c    |   6 +-
 drivers/staging/media/hantro/hantro_hw.h      |  13 ++
 .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
 drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
 drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
 13 files changed, 366 insertions(+), 45 deletions(-)
 create mode 100644 drivers/staging/media/hantro/hantro_postproc.c

-- 
2.22.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers
  2019-11-13 17:56 [PATCH v3 0/3] Enable Hantro G1 post-processor Ezequiel Garcia
@ 2019-11-13 17:56 ` Ezequiel Garcia
  2019-11-14  9:26   ` Philipp Zabel
  2019-11-14  9:32   ` Boris Brezillon
  2019-11-13 17:56 ` [PATCH v3 2/3] media: hantro: Support color conversion via post-processing Ezequiel Garcia
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 21+ messages in thread
From: Ezequiel Garcia @ 2019-11-13 17:56 UTC (permalink / raw)
  To: linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Boris Brezillon, Chris Healy,
	linux-kernel, Ezequiel Garcia

Format negotiation helpers, hantro_find_format()
and hantro_get_default_fmt() can be simplified,
making the code a little bit clearer.

More importantly, this change is preparation work
for the post-processor usage.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
---
 drivers/staging/media/hantro/hantro_v4l2.c | 53 ++++++++--------------
 1 file changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
index 3dae52abb96c..238e53b28f8f 100644
--- a/drivers/staging/media/hantro/hantro_v4l2.c
+++ b/drivers/staging/media/hantro/hantro_v4l2.c
@@ -47,23 +47,26 @@ hantro_get_formats(const struct hantro_ctx *ctx, unsigned int *num_fmts)
 }
 
 static const struct hantro_fmt *
-hantro_find_format(const struct hantro_fmt *formats, unsigned int num_fmts,
-		   u32 fourcc)
+hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
 {
-	unsigned int i;
+	const struct hantro_fmt *formats;
+	unsigned int i, num_fmts;
 
+	formats = hantro_get_formats(ctx, &num_fmts);
 	for (i = 0; i < num_fmts; i++)
 		if (formats[i].fourcc == fourcc)
 			return &formats[i];
+
 	return NULL;
 }
 
 static const struct hantro_fmt *
-hantro_get_default_fmt(const struct hantro_fmt *formats, unsigned int num_fmts,
-		       bool bitstream)
+hantro_get_default_fmt(const struct hantro_ctx *ctx, bool bitstream)
 {
-	unsigned int i;
+	const struct hantro_fmt *formats;
+	unsigned int i, num_fmts;
 
+	formats = hantro_get_formats(ctx, &num_fmts);
 	for (i = 0; i < num_fmts; i++) {
 		if (bitstream == (formats[i].codec_mode !=
 				  HANTRO_MODE_NONE))
@@ -89,8 +92,7 @@ static int vidioc_enum_framesizes(struct file *file, void *priv,
 				  struct v4l2_frmsizeenum *fsize)
 {
 	struct hantro_ctx *ctx = fh_to_ctx(priv);
-	const struct hantro_fmt *formats, *fmt;
-	unsigned int num_fmts;
+	const struct hantro_fmt *fmt;
 
 	if (fsize->index != 0) {
 		vpu_debug(0, "invalid frame size index (expected 0, got %d)\n",
@@ -98,8 +100,7 @@ static int vidioc_enum_framesizes(struct file *file, void *priv,
 		return -EINVAL;
 	}
 
-	formats = hantro_get_formats(ctx, &num_fmts);
-	fmt = hantro_find_format(formats, num_fmts, fsize->pixel_format);
+	fmt = hantro_find_format(ctx, fsize->pixel_format);
 	if (!fmt) {
 		vpu_debug(0, "unsupported bitstream format (%08x)\n",
 			  fsize->pixel_format);
@@ -196,8 +197,7 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
 {
 	struct hantro_ctx *ctx = fh_to_ctx(priv);
 	struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
-	const struct hantro_fmt *formats, *fmt, *vpu_fmt;
-	unsigned int num_fmts;
+	const struct hantro_fmt *fmt, *vpu_fmt;
 	bool coded;
 
 	coded = capture == hantro_is_encoder_ctx(ctx);
@@ -208,10 +208,9 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
 		  (pix_mp->pixelformat >> 16) & 0x7f,
 		  (pix_mp->pixelformat >> 24) & 0x7f);
 
-	formats = hantro_get_formats(ctx, &num_fmts);
-	fmt = hantro_find_format(formats, num_fmts, pix_mp->pixelformat);
+	fmt = hantro_find_format(ctx, pix_mp->pixelformat);
 	if (!fmt) {
-		fmt = hantro_get_default_fmt(formats, num_fmts, coded);
+		fmt = hantro_get_default_fmt(ctx, coded);
 		f->fmt.pix_mp.pixelformat = fmt->fourcc;
 	}
 
@@ -290,12 +289,10 @@ hantro_reset_fmt(struct v4l2_pix_format_mplane *fmt,
 static void
 hantro_reset_encoded_fmt(struct hantro_ctx *ctx)
 {
-	const struct hantro_fmt *vpu_fmt, *formats;
+	const struct hantro_fmt *vpu_fmt;
 	struct v4l2_pix_format_mplane *fmt;
-	unsigned int num_fmts;
 
-	formats = hantro_get_formats(ctx, &num_fmts);
-	vpu_fmt = hantro_get_default_fmt(formats, num_fmts, true);
+	vpu_fmt = hantro_get_default_fmt(ctx, true);
 
 	if (hantro_is_encoder_ctx(ctx)) {
 		ctx->vpu_dst_fmt = vpu_fmt;
@@ -316,12 +313,10 @@ hantro_reset_encoded_fmt(struct hantro_ctx *ctx)
 static void
 hantro_reset_raw_fmt(struct hantro_ctx *ctx)
 {
-	const struct hantro_fmt *raw_vpu_fmt, *formats;
+	const struct hantro_fmt *raw_vpu_fmt;
 	struct v4l2_pix_format_mplane *raw_fmt, *encoded_fmt;
-	unsigned int num_fmts;
 
-	formats = hantro_get_formats(ctx, &num_fmts);
-	raw_vpu_fmt = hantro_get_default_fmt(formats, num_fmts, false);
+	raw_vpu_fmt = hantro_get_default_fmt(ctx, false);
 
 	if (hantro_is_encoder_ctx(ctx)) {
 		ctx->vpu_src_fmt = raw_vpu_fmt;
@@ -367,8 +362,6 @@ vidioc_s_fmt_out_mplane(struct file *file, void *priv, struct v4l2_format *f)
 {
 	struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
 	struct hantro_ctx *ctx = fh_to_ctx(priv);
-	const struct hantro_fmt *formats;
-	unsigned int num_fmts;
 	struct vb2_queue *vq;
 	int ret;
 
@@ -395,9 +388,7 @@ vidioc_s_fmt_out_mplane(struct file *file, void *priv, struct v4l2_format *f)
 	if (ret)
 		return ret;
 
-	formats = hantro_get_formats(ctx, &num_fmts);
-	ctx->vpu_src_fmt = hantro_find_format(formats, num_fmts,
-					      pix_mp->pixelformat);
+	ctx->vpu_src_fmt = hantro_find_format(ctx, pix_mp->pixelformat);
 	ctx->src_fmt = *pix_mp;
 
 	/*
@@ -431,9 +422,7 @@ static int vidioc_s_fmt_cap_mplane(struct file *file, void *priv,
 {
 	struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
 	struct hantro_ctx *ctx = fh_to_ctx(priv);
-	const struct hantro_fmt *formats;
 	struct vb2_queue *vq;
-	unsigned int num_fmts;
 	int ret;
 
 	/* Change not allowed if queue is busy. */
@@ -462,9 +451,7 @@ static int vidioc_s_fmt_cap_mplane(struct file *file, void *priv,
 	if (ret)
 		return ret;
 
-	formats = hantro_get_formats(ctx, &num_fmts);
-	ctx->vpu_dst_fmt = hantro_find_format(formats, num_fmts,
-					      pix_mp->pixelformat);
+	ctx->vpu_dst_fmt = hantro_find_format(ctx, pix_mp->pixelformat);
 	ctx->dst_fmt = *pix_mp;
 
 	/*
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-11-13 17:56 [PATCH v3 0/3] Enable Hantro G1 post-processor Ezequiel Garcia
  2019-11-13 17:56 ` [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers Ezequiel Garcia
@ 2019-11-13 17:56 ` Ezequiel Garcia
  2019-11-14  9:46   ` Boris Brezillon
  2019-11-14  9:48   ` Philipp Zabel
  2019-11-13 17:56 ` [PATCH v3 3/3] media: vidioc-enum-fmt.rst: clarify format preference Ezequiel Garcia
  2020-02-09 19:52 ` [PATCH v3 0/3] Enable Hantro G1 post-processor Nicolas Dufresne
  3 siblings, 2 replies; 21+ messages in thread
From: Ezequiel Garcia @ 2019-11-13 17:56 UTC (permalink / raw)
  To: linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Boris Brezillon, Chris Healy,
	linux-kernel, Ezequiel Garcia

The Hantro G1 decoder is able to enable a post-processor
on the decoding pipeline, which can be used to perform
scaling and color conversion.

The post-processor is integrated to the decoder, and it's
possible to use it in a way that is completely transparent
to the user.

This commit enables color conversion via post-processing,
which means the driver now exposes YUV packed, in addition to NV12.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
---
 drivers/staging/media/hantro/Makefile         |   1 +
 drivers/staging/media/hantro/hantro.h         |  64 +++++++-
 drivers/staging/media/hantro/hantro_drv.c     |   8 +-
 .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
 .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
 drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
 .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
 drivers/staging/media/hantro/hantro_h264.c    |   6 +-
 drivers/staging/media/hantro/hantro_hw.h      |  13 ++
 .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
 drivers/staging/media/hantro/hantro_v4l2.c    |  52 ++++++-
 drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
 12 files changed, 343 insertions(+), 11 deletions(-)
 create mode 100644 drivers/staging/media/hantro/hantro_postproc.c

diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
index 5d6b0383d280..496b30c3c396 100644
--- a/drivers/staging/media/hantro/Makefile
+++ b/drivers/staging/media/hantro/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_VIDEO_HANTRO) += hantro-vpu.o
 hantro-vpu-y += \
 		hantro_drv.o \
 		hantro_v4l2.o \
+		hantro_postproc.o \
 		hantro_h1_jpeg_enc.o \
 		hantro_g1_h264_dec.o \
 		hantro_g1_mpeg2_dec.o \
diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
index deb90ae37859..6016a1a42503 100644
--- a/drivers/staging/media/hantro/hantro.h
+++ b/drivers/staging/media/hantro/hantro.h
@@ -60,6 +60,8 @@ struct hantro_irq {
  * @num_enc_fmts:		Number of encoder formats.
  * @dec_fmts:			Decoder formats.
  * @num_dec_fmts:		Number of decoder formats.
+ * @postproc_fmts:		Post-processor formats.
+ * @num_postproc_fmts:		Number of post-processor formats.
  * @codec:			Supported codecs
  * @codec_ops:			Codec ops.
  * @init:			Initialize hardware.
@@ -70,6 +72,7 @@ struct hantro_irq {
  * @num_clocks:			number of clocks in the array
  * @reg_names:			array of register range names
  * @num_regs:			number of register range names in the array
+ * @postproc_regs:		&struct hantro_postproc_regs pointer
  */
 struct hantro_variant {
 	unsigned int enc_offset;
@@ -78,6 +81,8 @@ struct hantro_variant {
 	unsigned int num_enc_fmts;
 	const struct hantro_fmt *dec_fmts;
 	unsigned int num_dec_fmts;
+	const struct hantro_fmt *postproc_fmts;
+	unsigned int num_postproc_fmts;
 	unsigned int codec;
 	const struct hantro_codec_ops *codec_ops;
 	int (*init)(struct hantro_dev *vpu);
@@ -88,6 +93,7 @@ struct hantro_variant {
 	int num_clocks;
 	const char * const *reg_names;
 	int num_regs;
+	const struct hantro_postproc_regs *postproc_regs;
 };
 
 /**
@@ -213,6 +219,7 @@ struct hantro_dev {
  *			context, and it's called right before
  *			calling v4l2_m2m_job_finish.
  * @codec_ops:		Set of operations related to codec mode.
+ * @postproc:		Post-processing context.
  * @jpeg_enc:		JPEG-encoding context.
  * @mpeg2_dec:		MPEG-2-decoding context.
  * @vp8_dec:		VP8-decoding context.
@@ -237,6 +244,7 @@ struct hantro_ctx {
 			  unsigned int bytesused);
 
 	const struct hantro_codec_ops *codec_ops;
+	struct hantro_postproc_ctx postproc;
 
 	/* Specific for particular codec modes. */
 	union {
@@ -274,6 +282,23 @@ struct hantro_reg {
 	u32 mask;
 };
 
+struct hantro_postproc_regs {
+	struct hantro_reg pipeline_en;
+	struct hantro_reg max_burst;
+	struct hantro_reg clk_gate;
+	struct hantro_reg out_swap32;
+	struct hantro_reg out_endian;
+	struct hantro_reg out_luma_base;
+	struct hantro_reg input_width;
+	struct hantro_reg input_height;
+	struct hantro_reg output_width;
+	struct hantro_reg output_height;
+	struct hantro_reg input_fmt;
+	struct hantro_reg output_fmt;
+	struct hantro_reg orig_width;
+	struct hantro_reg display_width;
+};
+
 /* Logging helpers */
 
 /**
@@ -352,9 +377,23 @@ static inline u32 vdpu_read(struct hantro_dev *vpu, u32 reg)
 	return val;
 }
 
-static inline void hantro_reg_write(struct hantro_dev *vpu,
-				    const struct hantro_reg *reg,
-				    u32 val)
+static inline void
+hantro_reg_write(struct hantro_dev *vpu,
+		 const struct hantro_reg *reg,
+		 u32 val)
+{
+	u32 v;
+
+	v = vdpu_read(vpu, reg->base);
+	v &= ~(reg->mask << reg->shift);
+	v |= ((val & reg->mask) << reg->shift);
+	vdpu_write(vpu, v, reg->base);
+}
+
+static inline void
+hantro_reg_write_relaxed(struct hantro_dev *vpu,
+			 const struct hantro_reg *reg,
+			 u32 val)
 {
 	u32 v;
 
@@ -381,4 +420,23 @@ hantro_get_dst_buf(struct hantro_ctx *ctx)
 	return v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
 }
 
+static inline bool
+hantro_needs_postproc(struct hantro_ctx *ctx)
+{
+	return ctx->vpu_dst_fmt->fourcc != V4L2_PIX_FMT_NV12;
+}
+
+static inline dma_addr_t
+hantro_get_dec_buf_addr(struct hantro_ctx *ctx, struct vb2_buffer *vb)
+{
+	if (hantro_needs_postproc(ctx))
+		return ctx->postproc.dec_q[vb->index].dma;
+	return vb2_dma_contig_plane_dma_addr(vb, 0);
+}
+
+void hantro_postproc_disable(struct hantro_ctx *ctx);
+void hantro_postproc_setup(struct hantro_ctx *ctx);
+void hantro_postproc_free(struct hantro_ctx *ctx);
+int hantro_postproc_alloc(struct hantro_ctx *ctx);
+
 #endif /* HANTRO_H_ */
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index 26108c96b674..fb08296db168 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -53,7 +53,7 @@ dma_addr_t hantro_get_ref(struct hantro_ctx *ctx, u64 ts)
 	if (index < 0)
 		return 0;
 	buf = vb2_get_buffer(q, index);
-	return vb2_dma_contig_plane_dma_addr(buf, 0);
+	return hantro_get_dec_buf_addr(ctx, buf);
 }
 
 static int
@@ -159,12 +159,18 @@ void hantro_prepare_run(struct hantro_ctx *ctx)
 	src_buf = hantro_get_src_buf(ctx);
 	v4l2_ctrl_request_setup(src_buf->vb2_buf.req_obj.req,
 				&ctx->ctrl_handler);
+
+	if (!hantro_needs_postproc(ctx))
+		hantro_postproc_disable(ctx);
 }
 
 void hantro_finish_run(struct hantro_ctx *ctx)
 {
 	struct vb2_v4l2_buffer *src_buf;
 
+	if (hantro_needs_postproc(ctx))
+		hantro_postproc_setup(ctx);
+
 	src_buf = hantro_get_src_buf(ctx);
 	v4l2_ctrl_request_complete(src_buf->vb2_buf.req_obj.req,
 				   &ctx->ctrl_handler);
diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 70a6b5b26477..9b292722c9de 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -243,7 +243,7 @@ static void set_buffers(struct hantro_ctx *ctx)
 	vdpu_write_relaxed(vpu, src_dma, G1_REG_ADDR_STR);
 
 	/* Destination (decoded frame) buffer. */
-	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+	dst_dma = hantro_get_dec_buf_addr(ctx, &dst_buf->vb2_buf);
 	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
 
 	/* Higher profiles require DMV buffer appended to reference frames. */
diff --git a/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c b/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
index f3bf67d8a289..0abe0be2c1ad 100644
--- a/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
@@ -121,7 +121,7 @@ hantro_g1_mpeg2_dec_set_buffers(struct hantro_dev *vpu, struct hantro_ctx *ctx,
 	vdpu_write_relaxed(vpu, addr, G1_REG_RLC_VLC_BASE);
 
 	/* Destination frame buffer */
-	addr = vb2_dma_contig_plane_dma_addr(dst_buf, 0);
+	addr = hantro_get_dec_buf_addr(ctx, dst_buf);
 	current_addr = addr;
 
 	if (picture->picture_structure == PICT_BOTTOM_FIELD)
diff --git a/drivers/staging/media/hantro/hantro_g1_regs.h b/drivers/staging/media/hantro/hantro_g1_regs.h
index 5c0ea7994336..c1756e3d5391 100644
--- a/drivers/staging/media/hantro/hantro_g1_regs.h
+++ b/drivers/staging/media/hantro/hantro_g1_regs.h
@@ -9,6 +9,8 @@
 #ifndef HANTRO_G1_REGS_H_
 #define HANTRO_G1_REGS_H_
 
+#define G1_SWREG(nr)                 ((nr) * 4)
+
 /* Decoder registers. */
 #define G1_REG_INTERRUPT				0x004
 #define     G1_REG_INTERRUPT_DEC_PIC_INF		BIT(24)
@@ -298,4 +300,55 @@
 #define     G1_REG_REF_BUF_CTRL2_APF_THRESHOLD(x)	(((x) & 0x3fff) << 0)
 #define G1_REG_SOFT_RESET				0x194
 
+/* Post-processor registers. */
+#define G1_REG_PP_INTERRUPT		G1_SWREG(60)
+#define    G1_REG_PP_READY_IRQ		BIT(12)
+#define    G1_REG_PP_IRQ		BIT(8)
+#define    G1_REG_PP_IRQ_DIS		BIT(4)
+#define    G1_REG_PP_PIPELINE_EN	BIT(1)
+#define    G1_REG_PP_EXTERNAL_TRIGGER	BIT(0)
+#define G1_REG_PP_DEV_CONFIG		G1_SWREG(61)
+#define     G1_REG_PP_AXI_RD_ID(v)	(((v) << 24) & GENMASK(31, 24))
+#define     G1_REG_PP_AXI_WR_ID(v)	(((v) << 16) & GENMASK(23, 16))
+#define     G1_REG_PP_INSWAP32_E(v)	((v) ? BIT(10) : 0)
+#define     G1_REG_PP_DATA_DISC_E(v)	((v) ? BIT(9) : 0)
+#define     G1_REG_PP_CLK_GATE_E(v)	((v) ? BIT(8) : 0)
+#define     G1_REG_PP_IN_ENDIAN(v)	((v) ? BIT(7) : 0)
+#define     G1_REG_PP_OUT_ENDIAN(v)	((v) ? BIT(6) : 0)
+#define     G1_REG_PP_OUTSWAP32_E(v)	((v) ? BIT(5) : 0)
+#define     G1_REG_PP_MAX_BURST(v)	(((v) << 0) & GENMASK(4, 0))
+#define G1_REG_PP_IN_LUMA_BASE		G1_SWREG(63)
+#define G1_REG_PP_IN_CB_BASE		G1_SWREG(64)
+#define G1_REG_PP_IN_CR_BASE		G1_SWREG(65)
+#define G1_REG_PP_OUT_LUMA_BASE		G1_SWREG(66)
+#define G1_REG_PP_OUT_CHROMA_BASE	G1_SWREG(67)
+#define G1_REG_PP_CONTRAST_ADJUST	G1_SWREG(68)
+#define G1_REG_PP_COLOR_CONVERSION	G1_SWREG(69)
+#define G1_REG_PP_COLOR_CONVERSION0	G1_SWREG(70)
+#define G1_REG_PP_COLOR_CONVERSION1	G1_SWREG(71)
+#define G1_REG_PP_INPUT_SIZE		G1_SWREG(72)
+#define    G1_REG_PP_INPUT_SIZE_HEIGHT(v) (((v) << 9) & GENMASK(16, 9))
+#define    G1_REG_PP_INPUT_SIZE_WIDTH(v)  (((v) << 0) & GENMASK(8, 0))
+#define G1_REG_PP_SCALING0		G1_SWREG(79)
+#define     G1_REG_PP_PADD_R(v)	(((v) << 23) & GENMASK(27, 23))
+#define     G1_REG_PP_PADD_G(v)	(((v) << 18) & GENMASK(22, 18))
+#define     G1_REG_PP_RANGEMAP_Y(v) ((v) ? BIT(31) : 0)
+#define     G1_REG_PP_RANGEMAP_C(v) ((v) ? BIT(30) : 0)
+#define     G1_REG_PP_YCBCR_RANGE(v) ((v) ? BIT(29) : 0)
+#define     G1_REG_PP_RGB_16(v) ((v) ? BIT(28) : 0)
+#define G1_REG_PP_SCALING1		G1_SWREG(80)
+#define     G1_REG_PP_PADD_B(v)	(((v) << 18) & GENMASK(22, 18))
+#define G1_REG_PP_MASK_R		G1_SWREG(82)
+#define G1_REG_PP_MASK_G		G1_SWREG(83)
+#define G1_REG_PP_MASK_B		G1_SWREG(84)
+#define G1_REG_PP_CONTROL		G1_SWREG(85)
+#define     G1_REG_PP_CONTROL_IN_FMT(v)	(((v) << 29) & GENMASK(31, 29))
+#define     G1_REG_PP_CONTROL_OUT_FMT(v) (((v) << 26) & GENMASK(28, 26))
+#define     G1_REG_PP_CONTROL_OUT_HEIGHT(v) (((v) << 15) & GENMASK(25, 15))
+#define     G1_REG_PP_CONTROL_OUT_WIDTH(v) (((v) << 4) & GENMASK(14, 4))
+#define G1_REG_PP_MASK1_ORIG_WIDTH	G1_SWREG(88)
+#define     G1_REG_PP_ORIG_WIDTH(v)	(((v) << 23) & GENMASK(31, 23))
+#define G1_REG_PP_DISPLAY_WIDTH		G1_SWREG(92)
+#define G1_REG_PP_FUSE			G1_SWREG(99)
+
 #endif /* HANTRO_G1_REGS_H_ */
diff --git a/drivers/staging/media/hantro/hantro_g1_vp8_dec.c b/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
index cad18094fee0..e708994d1aba 100644
--- a/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
@@ -422,7 +422,7 @@ static void cfg_buffers(struct hantro_ctx *ctx,
 	}
 	vdpu_write_relaxed(vpu, reg, G1_REG_FWD_PIC(0));
 
-	dst_dma = vb2_dma_contig_plane_dma_addr(&vb2_dst->vb2_buf, 0);
+	dst_dma = hantro_get_dec_buf_addr(ctx, &vb2_dst->vb2_buf);
 	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
 }
 
diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index 694a330f508e..5c84ebcdd0ea 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -629,7 +629,11 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
 	tbl = priv->cpu;
 	memcpy(tbl->cabac_table, h264_cabac_table, sizeof(tbl->cabac_table));
 
-	v4l2_fill_pixfmt_mp(&pix_mp, ctx->dst_fmt.pixelformat,
+	/*
+	 * For the decoder picture size, we want the decoder
+	 * native pixel format.
+	 */
+	v4l2_fill_pixfmt_mp(&pix_mp, V4L2_PIX_FMT_NV12,
 			    ctx->dst_fmt.width, ctx->dst_fmt.height);
 	h264_dec->pic_size = pix_mp.plane_fmt[0].sizeimage;
 
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index 69b88f4d3fb3..18e7d9e1f469 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -28,11 +28,13 @@ struct hantro_variant;
  * @cpu:	CPU pointer to the buffer.
  * @dma:	DMA address of the buffer.
  * @size:	Size of the buffer.
+ * @attrs:	Attributes of the DMA mapping.
  */
 struct hantro_aux_buf {
 	void *cpu;
 	dma_addr_t dma;
 	size_t size;
+	unsigned long attrs;
 };
 
 /**
@@ -109,6 +111,15 @@ struct hantro_vp8_dec_hw_ctx {
 	struct hantro_aux_buf prob_tbl;
 };
 
+/**
+ * struct hantro_postproc_ctx
+ *
+ * @dec_q:		References buffers, in decoder format.
+ */
+struct hantro_postproc_ctx {
+	struct hantro_aux_buf dec_q[VB2_MAX_FRAME];
+};
+
 /**
  * struct hantro_codec_ops - codec mode specific operations
  *
@@ -144,6 +155,8 @@ extern const struct hantro_variant rk3399_vpu_variant;
 extern const struct hantro_variant rk3328_vpu_variant;
 extern const struct hantro_variant rk3288_vpu_variant;
 
+extern const struct hantro_postproc_regs hantro_g1_postproc_regs;
+
 extern const u32 hantro_vp8_dec_mc_filter[8][6];
 
 void hantro_watchdog(struct work_struct *work);
diff --git a/drivers/staging/media/hantro/hantro_postproc.c b/drivers/staging/media/hantro/hantro_postproc.c
new file mode 100644
index 000000000000..865435386363
--- /dev/null
+++ b/drivers/staging/media/hantro/hantro_postproc.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hantro G1 post-processor support
+ *
+ * Copyright (C) 2019 Collabora, Ltd.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/types.h>
+
+#include "hantro.h"
+#include "hantro_hw.h"
+#include "hantro_g1_regs.h"
+
+#define HANTRO_PP_REG_WRITE(vpu, reg_name, val) \
+	do { \
+		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
+			hantro_reg_write_relaxed((vpu), \
+					 &(vpu)->variant->postproc_regs->(reg_name), \
+					 (val)); \
+	} while (0)
+
+#define HANTRO_PP_REG_WRITE_S(vpu, reg_name, val) \
+	do { \
+		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
+			hantro_reg_write((vpu), \
+					 &(vpu)->variant->postproc_regs->(reg_name), \
+					 (val)); \
+	} while (0)
+
+#define VPU_PP_IN_YUYV			0x0
+#define VPU_PP_IN_NV12			0x1
+#define VPU_PP_IN_YUV420		0x2
+#define VPU_PP_IN_YUV240_TILED		0x5
+#define VPU_PP_OUT_RGB			0x0
+#define VPU_PP_OUT_YUYV			0x3
+
+const struct hantro_postproc_regs hantro_g1_postproc_regs = {
+	.pipeline_en = {G1_REG_PP_INTERRUPT, 1, 0x1},
+	.max_burst = {G1_REG_PP_DEV_CONFIG, 0, 0x1f},
+	.clk_gate = {G1_REG_PP_DEV_CONFIG, 1, 0x1},
+	.out_swap32 = {G1_REG_PP_DEV_CONFIG, 5, 0x1},
+	.out_endian = {G1_REG_PP_DEV_CONFIG, 6, 0x1},
+	.out_luma_base = {G1_REG_PP_OUT_LUMA_BASE, 0, 0xffffffff},
+	.input_width = {G1_REG_PP_INPUT_SIZE, 0, 0x1ff},
+	.input_height = {G1_REG_PP_INPUT_SIZE, 9, 0x1ff},
+	.output_width = {G1_REG_PP_CONTROL, 4, 0x7ff},
+	.output_height = {G1_REG_PP_CONTROL, 15, 0x7ff},
+	.input_fmt = {G1_REG_PP_CONTROL, 29, 0x7},
+	.output_fmt = {G1_REG_PP_CONTROL, 26, 0x7},
+	.orig_width = {G1_REG_PP_MASK1_ORIG_WIDTH, 23, 0x1ff},
+	.display_width = {G1_REG_PP_DISPLAY_WIDTH, 0, 0xfff},
+};
+
+void hantro_postproc_setup(struct hantro_ctx *ctx)
+{
+	struct hantro_dev *vpu = ctx->dev;
+	struct vb2_v4l2_buffer *dst_buf;
+	u32 src_pp_fmt, dst_pp_fmt;
+	dma_addr_t dst_dma;
+
+	/* Turn on pipeline mode. Must be done first. */
+	HANTRO_PP_REG_WRITE_S(vpu, pipeline_en, 0x1);
+
+	src_pp_fmt = VPU_PP_IN_NV12;
+
+	switch (ctx->vpu_dst_fmt->fourcc) {
+	case V4L2_PIX_FMT_YUYV:
+		dst_pp_fmt = VPU_PP_OUT_YUYV;
+		break;
+	default:
+		WARN(1, "output format %d not supported by the post-processor, this wasn't expected.",
+		     ctx->vpu_dst_fmt->fourcc);
+		dst_pp_fmt = 0;
+		break;
+	}
+
+	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+	HANTRO_PP_REG_WRITE(vpu, clk_gate, 0x1);
+	HANTRO_PP_REG_WRITE(vpu, out_endian, 0x1);
+	HANTRO_PP_REG_WRITE(vpu, out_swap32, 0x1);
+	HANTRO_PP_REG_WRITE(vpu, max_burst, 16);
+	HANTRO_PP_REG_WRITE(vpu, out_luma_base, dst_dma);
+	HANTRO_PP_REG_WRITE(vpu, input_width, MB_WIDTH(ctx->dst_fmt.width));
+	HANTRO_PP_REG_WRITE(vpu, input_height, MB_HEIGHT(ctx->dst_fmt.height));
+	HANTRO_PP_REG_WRITE(vpu, input_fmt, src_pp_fmt);
+	HANTRO_PP_REG_WRITE(vpu, output_fmt, dst_pp_fmt);
+	HANTRO_PP_REG_WRITE(vpu, output_width, ctx->dst_fmt.width);
+	HANTRO_PP_REG_WRITE(vpu, output_height, ctx->dst_fmt.height);
+	HANTRO_PP_REG_WRITE(vpu, orig_width, MB_WIDTH(ctx->dst_fmt.width));
+	HANTRO_PP_REG_WRITE(vpu, display_width, ctx->dst_fmt.width);
+}
+
+void hantro_postproc_free(struct hantro_ctx *ctx)
+{
+	struct hantro_dev *vpu = ctx->dev;
+	unsigned int i;
+
+	for (i = 0; i < VB2_MAX_FRAME; ++i) {
+		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
+
+		if (priv->cpu) {
+			dma_free_attrs(vpu->dev, priv->size, priv->cpu,
+				       priv->dma, priv->attrs);
+			priv->cpu = NULL;
+		}
+	}
+}
+
+int hantro_postproc_alloc(struct hantro_ctx *ctx)
+{
+	struct hantro_dev *vpu = ctx->dev;
+	unsigned int i, buf_size;
+
+	buf_size = ctx->dst_fmt.plane_fmt[0].sizeimage;
+
+	for (i = 0; i < VB2_MAX_FRAME; ++i) {
+		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
+
+		/*
+		 * The buffers on this queue are meant as intermediate
+		 * buffers for the decoder, so no mapping is needed.
+		 */
+		priv->attrs = DMA_ATTR_NO_KERNEL_MAPPING;
+		priv->cpu = dma_alloc_attrs(vpu->dev, buf_size, &priv->dma,
+					    GFP_KERNEL, priv->attrs);
+		if (!priv->cpu)
+			return -ENOMEM;
+		priv->size = buf_size;
+	}
+	return 0;
+}
+
+void hantro_postproc_disable(struct hantro_ctx *ctx)
+{
+	struct hantro_dev *vpu = ctx->dev;
+
+	HANTRO_PP_REG_WRITE_S(vpu, pipeline_en, 0x0);
+}
diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
index 238e53b28f8f..ff665d4f004f 100644
--- a/drivers/staging/media/hantro/hantro_v4l2.c
+++ b/drivers/staging/media/hantro/hantro_v4l2.c
@@ -46,6 +46,19 @@ hantro_get_formats(const struct hantro_ctx *ctx, unsigned int *num_fmts)
 	return formats;
 }
 
+static const struct hantro_fmt *
+hantro_get_postproc_formats(const struct hantro_ctx *ctx,
+			    unsigned int *num_fmts)
+{
+	if (hantro_is_encoder_ctx(ctx)) {
+		*num_fmts = 0;
+		return NULL;
+	}
+
+	*num_fmts = ctx->dev->variant->num_postproc_fmts;
+	return ctx->dev->variant->postproc_fmts;
+}
+
 static const struct hantro_fmt *
 hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
 {
@@ -57,6 +70,10 @@ hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
 		if (formats[i].fourcc == fourcc)
 			return &formats[i];
 
+	formats = hantro_get_postproc_formats(ctx, &num_fmts);
+	for (i = 0; i < num_fmts; i++)
+		if (formats[i].fourcc == fourcc)
+			return &formats[i];
 	return NULL;
 }
 
@@ -151,6 +168,20 @@ static int vidioc_enum_fmt(struct file *file, void *priv,
 		}
 		++j;
 	}
+
+	/* Enumerate post-processed formats. */
+	if (!capture)
+		return -EINVAL;
+	formats = hantro_get_postproc_formats(ctx, &num_fmts);
+	for (i = 0; i < num_fmts; i++) {
+		if (j == f->index) {
+			fmt = &formats[i];
+			f->pixelformat = fmt->fourcc;
+			return 0;
+		}
+		++j;
+	}
+
 	return -EINVAL;
 }
 
@@ -241,9 +272,10 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
 		/*
 		 * The H264 decoder needs extra space on the output buffers
 		 * to store motion vectors. This is needed for reference
-		 * frames.
+		 * frames and only if the format is non-post-processed (NV12).
 		 */
-		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE)
+		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE &&
+		    fmt->fourcc == V4L2_PIX_FMT_NV12)
 			pix_mp->plane_fmt[0].sizeimage +=
 				128 * DIV_ROUND_UP(pix_mp->width, 16) *
 				      DIV_ROUND_UP(pix_mp->height, 16);
@@ -611,10 +643,23 @@ static int hantro_start_streaming(struct vb2_queue *q, unsigned int count)
 
 		vpu_debug(4, "Codec mode = %d\n", codec_mode);
 		ctx->codec_ops = &ctx->dev->variant->codec_ops[codec_mode];
-		if (ctx->codec_ops->init)
+		if (ctx->codec_ops->init) {
 			ret = ctx->codec_ops->init(ctx);
+			if (ret)
+				return ret;
+		}
+
+		if (hantro_needs_postproc(ctx)) {
+			ret = hantro_postproc_alloc(ctx);
+			if (ret)
+				goto err_codec_exit;
+		}
 	}
+	return ret;
 
+err_codec_exit:
+	if (ctx->codec_ops && ctx->codec_ops->exit)
+		ctx->codec_ops->exit(ctx);
 	return ret;
 }
 
@@ -641,6 +686,7 @@ static void hantro_stop_streaming(struct vb2_queue *q)
 	struct hantro_ctx *ctx = vb2_get_drv_priv(q);
 
 	if (hantro_vq_is_coded(q)) {
+		hantro_postproc_free(ctx);
 		if (ctx->codec_ops && ctx->codec_ops->exit)
 			ctx->codec_ops->exit(ctx);
 	}
diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c b/drivers/staging/media/hantro/rk3288_vpu_hw.c
index f8db6fcaad73..2f914b37b9e5 100644
--- a/drivers/staging/media/hantro/rk3288_vpu_hw.c
+++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c
@@ -56,6 +56,13 @@ static const struct hantro_fmt rk3288_vpu_enc_fmts[] = {
 	},
 };
 
+static const struct hantro_fmt rk3288_vpu_postproc_fmts[] = {
+	{
+		.fourcc = V4L2_PIX_FMT_YUYV,
+		.codec_mode = HANTRO_MODE_NONE,
+	},
+};
+
 static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
 	{
 		.fourcc = V4L2_PIX_FMT_NV12,
@@ -215,6 +222,9 @@ const struct hantro_variant rk3288_vpu_variant = {
 	.dec_offset = 0x400,
 	.dec_fmts = rk3288_vpu_dec_fmts,
 	.num_dec_fmts = ARRAY_SIZE(rk3288_vpu_dec_fmts),
+	.postproc_fmts = rk3288_vpu_postproc_fmts,
+	.num_postproc_fmts = ARRAY_SIZE(rk3288_vpu_postproc_fmts),
+	.postproc_regs = &hantro_g1_postproc_regs,
 	.codec = HANTRO_JPEG_ENCODER | HANTRO_MPEG2_DECODER |
 		 HANTRO_VP8_DECODER | HANTRO_H264_DECODER,
 	.codec_ops = rk3288_vpu_codec_ops,
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/3] media: vidioc-enum-fmt.rst: clarify format preference
  2019-11-13 17:56 [PATCH v3 0/3] Enable Hantro G1 post-processor Ezequiel Garcia
  2019-11-13 17:56 ` [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers Ezequiel Garcia
  2019-11-13 17:56 ` [PATCH v3 2/3] media: hantro: Support color conversion via post-processing Ezequiel Garcia
@ 2019-11-13 17:56 ` Ezequiel Garcia
  2019-11-14  9:32   ` Boris Brezillon
  2020-02-09 19:52 ` [PATCH v3 0/3] Enable Hantro G1 post-processor Nicolas Dufresne
  3 siblings, 1 reply; 21+ messages in thread
From: Ezequiel Garcia @ 2019-11-13 17:56 UTC (permalink / raw)
  To: linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Boris Brezillon, Chris Healy,
	linux-kernel, Ezequiel Garcia

It has been decided to use the ENUM_FMT index value
as a hint for driver preference. This is defined purposedly
in a very liberal way, letting drivers define what "preference"
means.

For instance, the Hantro VPU driver indicates additional
processing to output a given format, and thus implicates
more CPU usage, which is enumerated after native (non-processed)
formats.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
---
 Documentation/media/uapi/v4l/vidioc-enum-fmt.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst b/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst
index 399ef1062bac..8ca6ab701e4a 100644
--- a/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst
+++ b/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst
@@ -44,7 +44,9 @@ To enumerate image formats applications initialize the ``type`` and
 the :ref:`VIDIOC_ENUM_FMT` ioctl with a pointer to this structure. Drivers
 fill the rest of the structure or return an ``EINVAL`` error code. All
 formats are enumerable by beginning at index zero and incrementing by
-one until ``EINVAL`` is returned.
+one until ``EINVAL`` is returned. If applicable, drivers shall return
+formats in preference order, where preferred formats are returned before
+(that is, with lower ``index`` value) less-preferred formats.
 
 .. note::
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers
  2019-11-13 17:56 ` [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers Ezequiel Garcia
@ 2019-11-14  9:26   ` Philipp Zabel
  2019-11-14  9:32   ` Boris Brezillon
  1 sibling, 0 replies; 21+ messages in thread
From: Philipp Zabel @ 2019-11-14  9:26 UTC (permalink / raw)
  To: Ezequiel Garcia, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

On Wed, 2019-11-13 at 14:56 -0300, Ezequiel Garcia wrote:
> Format negotiation helpers, hantro_find_format()
> and hantro_get_default_fmt() can be simplified,
> making the code a little bit clearer.
> 
> More importantly, this change is preparation work
> for the post-processor usage.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>

regards
Philipp


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers
  2019-11-13 17:56 ` [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers Ezequiel Garcia
  2019-11-14  9:26   ` Philipp Zabel
@ 2019-11-14  9:32   ` Boris Brezillon
  1 sibling, 0 replies; 21+ messages in thread
From: Boris Brezillon @ 2019-11-14  9:32 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-media, kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Chris Healy, linux-kernel

On Wed, 13 Nov 2019 14:56:01 -0300
Ezequiel Garcia <ezequiel@collabora.com> wrote:

> Format negotiation helpers, hantro_find_format()
> and hantro_get_default_fmt() can be simplified,
> making the code a little bit clearer.
> 
> More importantly, this change is preparation work
> for the post-processor usage.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/staging/media/hantro/hantro_v4l2.c | 53 ++++++++--------------
>  1 file changed, 20 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
> index 3dae52abb96c..238e53b28f8f 100644
> --- a/drivers/staging/media/hantro/hantro_v4l2.c
> +++ b/drivers/staging/media/hantro/hantro_v4l2.c
> @@ -47,23 +47,26 @@ hantro_get_formats(const struct hantro_ctx *ctx, unsigned int *num_fmts)
>  }
>  
>  static const struct hantro_fmt *
> -hantro_find_format(const struct hantro_fmt *formats, unsigned int num_fmts,
> -		   u32 fourcc)
> +hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
>  {
> -	unsigned int i;
> +	const struct hantro_fmt *formats;
> +	unsigned int i, num_fmts;
>  
> +	formats = hantro_get_formats(ctx, &num_fmts);
>  	for (i = 0; i < num_fmts; i++)
>  		if (formats[i].fourcc == fourcc)
>  			return &formats[i];
> +
>  	return NULL;
>  }
>  
>  static const struct hantro_fmt *
> -hantro_get_default_fmt(const struct hantro_fmt *formats, unsigned int num_fmts,
> -		       bool bitstream)
> +hantro_get_default_fmt(const struct hantro_ctx *ctx, bool bitstream)
>  {
> -	unsigned int i;
> +	const struct hantro_fmt *formats;
> +	unsigned int i, num_fmts;
>  
> +	formats = hantro_get_formats(ctx, &num_fmts);
>  	for (i = 0; i < num_fmts; i++) {
>  		if (bitstream == (formats[i].codec_mode !=
>  				  HANTRO_MODE_NONE))
> @@ -89,8 +92,7 @@ static int vidioc_enum_framesizes(struct file *file, void *priv,
>  				  struct v4l2_frmsizeenum *fsize)
>  {
>  	struct hantro_ctx *ctx = fh_to_ctx(priv);
> -	const struct hantro_fmt *formats, *fmt;
> -	unsigned int num_fmts;
> +	const struct hantro_fmt *fmt;
>  
>  	if (fsize->index != 0) {
>  		vpu_debug(0, "invalid frame size index (expected 0, got %d)\n",
> @@ -98,8 +100,7 @@ static int vidioc_enum_framesizes(struct file *file, void *priv,
>  		return -EINVAL;
>  	}
>  
> -	formats = hantro_get_formats(ctx, &num_fmts);
> -	fmt = hantro_find_format(formats, num_fmts, fsize->pixel_format);
> +	fmt = hantro_find_format(ctx, fsize->pixel_format);
>  	if (!fmt) {
>  		vpu_debug(0, "unsupported bitstream format (%08x)\n",
>  			  fsize->pixel_format);
> @@ -196,8 +197,7 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
>  {
>  	struct hantro_ctx *ctx = fh_to_ctx(priv);
>  	struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> -	const struct hantro_fmt *formats, *fmt, *vpu_fmt;
> -	unsigned int num_fmts;
> +	const struct hantro_fmt *fmt, *vpu_fmt;
>  	bool coded;
>  
>  	coded = capture == hantro_is_encoder_ctx(ctx);
> @@ -208,10 +208,9 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
>  		  (pix_mp->pixelformat >> 16) & 0x7f,
>  		  (pix_mp->pixelformat >> 24) & 0x7f);
>  
> -	formats = hantro_get_formats(ctx, &num_fmts);
> -	fmt = hantro_find_format(formats, num_fmts, pix_mp->pixelformat);
> +	fmt = hantro_find_format(ctx, pix_mp->pixelformat);
>  	if (!fmt) {
> -		fmt = hantro_get_default_fmt(formats, num_fmts, coded);
> +		fmt = hantro_get_default_fmt(ctx, coded);
>  		f->fmt.pix_mp.pixelformat = fmt->fourcc;
>  	}
>  
> @@ -290,12 +289,10 @@ hantro_reset_fmt(struct v4l2_pix_format_mplane *fmt,
>  static void
>  hantro_reset_encoded_fmt(struct hantro_ctx *ctx)
>  {
> -	const struct hantro_fmt *vpu_fmt, *formats;
> +	const struct hantro_fmt *vpu_fmt;
>  	struct v4l2_pix_format_mplane *fmt;
> -	unsigned int num_fmts;
>  
> -	formats = hantro_get_formats(ctx, &num_fmts);
> -	vpu_fmt = hantro_get_default_fmt(formats, num_fmts, true);
> +	vpu_fmt = hantro_get_default_fmt(ctx, true);
>  
>  	if (hantro_is_encoder_ctx(ctx)) {
>  		ctx->vpu_dst_fmt = vpu_fmt;
> @@ -316,12 +313,10 @@ hantro_reset_encoded_fmt(struct hantro_ctx *ctx)
>  static void
>  hantro_reset_raw_fmt(struct hantro_ctx *ctx)
>  {
> -	const struct hantro_fmt *raw_vpu_fmt, *formats;
> +	const struct hantro_fmt *raw_vpu_fmt;
>  	struct v4l2_pix_format_mplane *raw_fmt, *encoded_fmt;
> -	unsigned int num_fmts;
>  
> -	formats = hantro_get_formats(ctx, &num_fmts);
> -	raw_vpu_fmt = hantro_get_default_fmt(formats, num_fmts, false);
> +	raw_vpu_fmt = hantro_get_default_fmt(ctx, false);
>  
>  	if (hantro_is_encoder_ctx(ctx)) {
>  		ctx->vpu_src_fmt = raw_vpu_fmt;
> @@ -367,8 +362,6 @@ vidioc_s_fmt_out_mplane(struct file *file, void *priv, struct v4l2_format *f)
>  {
>  	struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
>  	struct hantro_ctx *ctx = fh_to_ctx(priv);
> -	const struct hantro_fmt *formats;
> -	unsigned int num_fmts;
>  	struct vb2_queue *vq;
>  	int ret;
>  
> @@ -395,9 +388,7 @@ vidioc_s_fmt_out_mplane(struct file *file, void *priv, struct v4l2_format *f)
>  	if (ret)
>  		return ret;
>  
> -	formats = hantro_get_formats(ctx, &num_fmts);
> -	ctx->vpu_src_fmt = hantro_find_format(formats, num_fmts,
> -					      pix_mp->pixelformat);
> +	ctx->vpu_src_fmt = hantro_find_format(ctx, pix_mp->pixelformat);
>  	ctx->src_fmt = *pix_mp;
>  
>  	/*
> @@ -431,9 +422,7 @@ static int vidioc_s_fmt_cap_mplane(struct file *file, void *priv,
>  {
>  	struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
>  	struct hantro_ctx *ctx = fh_to_ctx(priv);
> -	const struct hantro_fmt *formats;
>  	struct vb2_queue *vq;
> -	unsigned int num_fmts;
>  	int ret;
>  
>  	/* Change not allowed if queue is busy. */
> @@ -462,9 +451,7 @@ static int vidioc_s_fmt_cap_mplane(struct file *file, void *priv,
>  	if (ret)
>  		return ret;
>  
> -	formats = hantro_get_formats(ctx, &num_fmts);
> -	ctx->vpu_dst_fmt = hantro_find_format(formats, num_fmts,
> -					      pix_mp->pixelformat);
> +	ctx->vpu_dst_fmt = hantro_find_format(ctx, pix_mp->pixelformat);
>  	ctx->dst_fmt = *pix_mp;
>  
>  	/*


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] media: vidioc-enum-fmt.rst: clarify format preference
  2019-11-13 17:56 ` [PATCH v3 3/3] media: vidioc-enum-fmt.rst: clarify format preference Ezequiel Garcia
@ 2019-11-14  9:32   ` Boris Brezillon
  0 siblings, 0 replies; 21+ messages in thread
From: Boris Brezillon @ 2019-11-14  9:32 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-media, kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Chris Healy, linux-kernel

On Wed, 13 Nov 2019 14:56:03 -0300
Ezequiel Garcia <ezequiel@collabora.com> wrote:

> It has been decided to use the ENUM_FMT index value
> as a hint for driver preference. This is defined purposedly
> in a very liberal way, letting drivers define what "preference"
> means.
> 
> For instance, the Hantro VPU driver indicates additional
> processing to output a given format, and thus implicates
> more CPU usage, which is enumerated after native (non-processed)
> formats.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  Documentation/media/uapi/v4l/vidioc-enum-fmt.rst | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst b/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst
> index 399ef1062bac..8ca6ab701e4a 100644
> --- a/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst
> +++ b/Documentation/media/uapi/v4l/vidioc-enum-fmt.rst
> @@ -44,7 +44,9 @@ To enumerate image formats applications initialize the ``type`` and
>  the :ref:`VIDIOC_ENUM_FMT` ioctl with a pointer to this structure. Drivers
>  fill the rest of the structure or return an ``EINVAL`` error code. All
>  formats are enumerable by beginning at index zero and incrementing by
> -one until ``EINVAL`` is returned.
> +one until ``EINVAL`` is returned. If applicable, drivers shall return
> +formats in preference order, where preferred formats are returned before
> +(that is, with lower ``index`` value) less-preferred formats.
>  
>  .. note::
>  


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-11-13 17:56 ` [PATCH v3 2/3] media: hantro: Support color conversion via post-processing Ezequiel Garcia
@ 2019-11-14  9:46   ` Boris Brezillon
  2019-11-14  9:48   ` Philipp Zabel
  1 sibling, 0 replies; 21+ messages in thread
From: Boris Brezillon @ 2019-11-14  9:46 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-media, kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Chris Healy, linux-kernel

Hi Ezequiel,

On Wed, 13 Nov 2019 14:56:02 -0300
Ezequiel Garcia <ezequiel@collabora.com> wrote:

> +
> +int hantro_postproc_alloc(struct hantro_ctx *ctx)
> +{
> +	struct hantro_dev *vpu = ctx->dev;
> +	unsigned int i, buf_size;
> +
> +	buf_size = ctx->dst_fmt.plane_fmt[0].sizeimage;
> +
> +	for (i = 0; i < VB2_MAX_FRAME; ++i) {

Don't we know at that point how big the queue is (vq->num_buffers)?
Sounds a bit expensive to always allocate VB2_MAX_FRAME aux buffers.

> +		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
> +
> +		/*
> +		 * The buffers on this queue are meant as intermediate
> +		 * buffers for the decoder, so no mapping is needed.
> +		 */
> +		priv->attrs = DMA_ATTR_NO_KERNEL_MAPPING;
> +		priv->cpu = dma_alloc_attrs(vpu->dev, buf_size, &priv->dma,
> +					    GFP_KERNEL, priv->attrs);
> +		if (!priv->cpu)
> +			return -ENOMEM;
> +		priv->size = buf_size;
> +	}
> +	return 0;
> +}

Other than that, the post-proc extension looks pretty good. Thought it
would be much more invasive than that.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-11-13 17:56 ` [PATCH v3 2/3] media: hantro: Support color conversion via post-processing Ezequiel Garcia
  2019-11-14  9:46   ` Boris Brezillon
@ 2019-11-14  9:48   ` Philipp Zabel
  2019-11-15 15:44     ` Ezequiel Garcia
  1 sibling, 1 reply; 21+ messages in thread
From: Philipp Zabel @ 2019-11-14  9:48 UTC (permalink / raw)
  To: Ezequiel Garcia, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

Hi Ezequiel,

On Wed, 2019-11-13 at 14:56 -0300, Ezequiel Garcia wrote:
> The Hantro G1 decoder is able to enable a post-processor
> on the decoding pipeline, which can be used to perform
> scaling and color conversion.
> 
> The post-processor is integrated to the decoder, and it's
> possible to use it in a way that is completely transparent
> to the user.
> 
> This commit enables color conversion via post-processing,
> which means the driver now exposes YUV packed, in addition to NV12.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
> ---
>  drivers/staging/media/hantro/Makefile         |   1 +
>  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
>  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
>  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
>  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
>  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
>  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
>  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
>  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
>  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
>  drivers/staging/media/hantro/hantro_v4l2.c    |  52 ++++++-
>  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
>  12 files changed, 343 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> 
> diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
> index 5d6b0383d280..496b30c3c396 100644
> --- a/drivers/staging/media/hantro/Makefile
> +++ b/drivers/staging/media/hantro/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_VIDEO_HANTRO) += hantro-vpu.o
>  hantro-vpu-y += \
>  		hantro_drv.o \
>  		hantro_v4l2.o \
> +		hantro_postproc.o \
>  		hantro_h1_jpeg_enc.o \
>  		hantro_g1_h264_dec.o \
>  		hantro_g1_mpeg2_dec.o \
> diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
> index deb90ae37859..6016a1a42503 100644
> --- a/drivers/staging/media/hantro/hantro.h
> +++ b/drivers/staging/media/hantro/hantro.h
> @@ -60,6 +60,8 @@ struct hantro_irq {
>   * @num_enc_fmts:		Number of encoder formats.
>   * @dec_fmts:			Decoder formats.
>   * @num_dec_fmts:		Number of decoder formats.
> + * @postproc_fmts:		Post-processor formats.
> + * @num_postproc_fmts:		Number of post-processor formats.
>   * @codec:			Supported codecs
>   * @codec_ops:			Codec ops.
>   * @init:			Initialize hardware.
> @@ -70,6 +72,7 @@ struct hantro_irq {
>   * @num_clocks:			number of clocks in the array
>   * @reg_names:			array of register range names
>   * @num_regs:			number of register range names in the array
> + * @postproc_regs:		&struct hantro_postproc_regs pointer
>   */
>  struct hantro_variant {
>  	unsigned int enc_offset;
> @@ -78,6 +81,8 @@ struct hantro_variant {
>  	unsigned int num_enc_fmts;
>  	const struct hantro_fmt *dec_fmts;
>  	unsigned int num_dec_fmts;
> +	const struct hantro_fmt *postproc_fmts;
> +	unsigned int num_postproc_fmts;
>  	unsigned int codec;
>  	const struct hantro_codec_ops *codec_ops;
>  	int (*init)(struct hantro_dev *vpu);
> @@ -88,6 +93,7 @@ struct hantro_variant {
>  	int num_clocks;
>  	const char * const *reg_names;
>  	int num_regs;
> +	const struct hantro_postproc_regs *postproc_regs;
>  };
>  
>  /**
> @@ -213,6 +219,7 @@ struct hantro_dev {
>   *			context, and it's called right before
>   *			calling v4l2_m2m_job_finish.
>   * @codec_ops:		Set of operations related to codec mode.
> + * @postproc:		Post-processing context.
>   * @jpeg_enc:		JPEG-encoding context.
>   * @mpeg2_dec:		MPEG-2-decoding context.
>   * @vp8_dec:		VP8-decoding context.
> @@ -237,6 +244,7 @@ struct hantro_ctx {
>  			  unsigned int bytesused);
>  
>  	const struct hantro_codec_ops *codec_ops;
> +	struct hantro_postproc_ctx postproc;
>  
>  	/* Specific for particular codec modes. */
>  	union {
> @@ -274,6 +282,23 @@ struct hantro_reg {
>  	u32 mask;
>  };
>  
> +struct hantro_postproc_regs {
> +	struct hantro_reg pipeline_en;
> +	struct hantro_reg max_burst;
> +	struct hantro_reg clk_gate;
> +	struct hantro_reg out_swap32;
> +	struct hantro_reg out_endian;
> +	struct hantro_reg out_luma_base;
> +	struct hantro_reg input_width;
> +	struct hantro_reg input_height;
> +	struct hantro_reg output_width;
> +	struct hantro_reg output_height;
> +	struct hantro_reg input_fmt;
> +	struct hantro_reg output_fmt;
> +	struct hantro_reg orig_width;
> +	struct hantro_reg display_width;
> +};
> +
>  /* Logging helpers */
>  
>  /**
> @@ -352,9 +377,23 @@ static inline u32 vdpu_read(struct hantro_dev *vpu, u32 reg)
>  	return val;
>  }
>  
> -static inline void hantro_reg_write(struct hantro_dev *vpu,
> -				    const struct hantro_reg *reg,
> -				    u32 val)
> +static inline void
> +hantro_reg_write(struct hantro_dev *vpu,
> +		 const struct hantro_reg *reg,
> +		 u32 val)
> +{
> +	u32 v;
> +
> +	v = vdpu_read(vpu, reg->base);
> +	v &= ~(reg->mask << reg->shift);
> +	v |= ((val & reg->mask) << reg->shift);
> +	vdpu_write(vpu, v, reg->base);
> +}

This adds barriers to all the currently relaxed writes in the VP8
decoders. Maybe split this into a separate patch and add an explanation.

> +
> +static inline void
> +hantro_reg_write_relaxed(struct hantro_dev *vpu,
> +			 const struct hantro_reg *reg,
> +			 u32 val)
>  {
>  	u32 v;
>  
> @@ -381,4 +420,23 @@ hantro_get_dst_buf(struct hantro_ctx *ctx)
>  	return v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>  }
>  
> +static inline bool
> +hantro_needs_postproc(struct hantro_ctx *ctx)
> +{
> +	return ctx->vpu_dst_fmt->fourcc != V4L2_PIX_FMT_NV12;
> +}
> +
> +static inline dma_addr_t
> +hantro_get_dec_buf_addr(struct hantro_ctx *ctx, struct vb2_buffer *vb)
> +{
> +	if (hantro_needs_postproc(ctx))
> +		return ctx->postproc.dec_q[vb->index].dma;
> +	return vb2_dma_contig_plane_dma_addr(vb, 0);
> +}
> +
> +void hantro_postproc_disable(struct hantro_ctx *ctx);
> +void hantro_postproc_setup(struct hantro_ctx *ctx);
> +void hantro_postproc_free(struct hantro_ctx *ctx);
> +int hantro_postproc_alloc(struct hantro_ctx *ctx);
> +
>  #endif /* HANTRO_H_ */
> diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
> index 26108c96b674..fb08296db168 100644
> --- a/drivers/staging/media/hantro/hantro_drv.c
> +++ b/drivers/staging/media/hantro/hantro_drv.c
> @@ -53,7 +53,7 @@ dma_addr_t hantro_get_ref(struct hantro_ctx *ctx, u64 ts)
>  	if (index < 0)
>  		return 0;
>  	buf = vb2_get_buffer(q, index);
> -	return vb2_dma_contig_plane_dma_addr(buf, 0);
> +	return hantro_get_dec_buf_addr(ctx, buf);
>  }
>  
>  static int
> @@ -159,12 +159,18 @@ void hantro_prepare_run(struct hantro_ctx *ctx)
>  	src_buf = hantro_get_src_buf(ctx);
>  	v4l2_ctrl_request_setup(src_buf->vb2_buf.req_obj.req,
>  				&ctx->ctrl_handler);
> +
> +	if (!hantro_needs_postproc(ctx))
> +		hantro_postproc_disable(ctx);

Why isn't PP enabled in prepare_run? Does this mean the first frame is
not post-processed?

>  }
>  
>  void hantro_finish_run(struct hantro_ctx *ctx)
>  {
>  	struct vb2_v4l2_buffer *src_buf;
>  
> +	if (hantro_needs_postproc(ctx))
> +		hantro_postproc_setup(ctx);
> +
>  	src_buf = hantro_get_src_buf(ctx);
>  	v4l2_ctrl_request_complete(src_buf->vb2_buf.req_obj.req,
>  				   &ctx->ctrl_handler);
> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> index 70a6b5b26477..9b292722c9de 100644
> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> @@ -243,7 +243,7 @@ static void set_buffers(struct hantro_ctx *ctx)
>  	vdpu_write_relaxed(vpu, src_dma, G1_REG_ADDR_STR);
>  
>  	/* Destination (decoded frame) buffer. */
> -	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +	dst_dma = hantro_get_dec_buf_addr(ctx, &dst_buf->vb2_buf);
>  	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
>  
>  	/* Higher profiles require DMV buffer appended to reference frames. */
> diff --git a/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c b/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
> index f3bf67d8a289..0abe0be2c1ad 100644
> --- a/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
> @@ -121,7 +121,7 @@ hantro_g1_mpeg2_dec_set_buffers(struct hantro_dev *vpu, struct hantro_ctx *ctx,
>  	vdpu_write_relaxed(vpu, addr, G1_REG_RLC_VLC_BASE);
>  
>  	/* Destination frame buffer */
> -	addr = vb2_dma_contig_plane_dma_addr(dst_buf, 0);
> +	addr = hantro_get_dec_buf_addr(ctx, dst_buf);
>  	current_addr = addr;
>  
>  	if (picture->picture_structure == PICT_BOTTOM_FIELD)
> diff --git a/drivers/staging/media/hantro/hantro_g1_regs.h b/drivers/staging/media/hantro/hantro_g1_regs.h
> index 5c0ea7994336..c1756e3d5391 100644
> --- a/drivers/staging/media/hantro/hantro_g1_regs.h
> +++ b/drivers/staging/media/hantro/hantro_g1_regs.h
> @@ -9,6 +9,8 @@
>  #ifndef HANTRO_G1_REGS_H_
>  #define HANTRO_G1_REGS_H_
>  
> +#define G1_SWREG(nr)                 ((nr) * 4)
> +
>  /* Decoder registers. */
>  #define G1_REG_INTERRUPT				0x004
>  #define     G1_REG_INTERRUPT_DEC_PIC_INF		BIT(24)
> @@ -298,4 +300,55 @@
>  #define     G1_REG_REF_BUF_CTRL2_APF_THRESHOLD(x)	(((x) & 0x3fff) << 0)
>  #define G1_REG_SOFT_RESET				0x194
>  
> +/* Post-processor registers. */
> +#define G1_REG_PP_INTERRUPT		G1_SWREG(60)
> +#define    G1_REG_PP_READY_IRQ		BIT(12)
> +#define    G1_REG_PP_IRQ		BIT(8)
> +#define    G1_REG_PP_IRQ_DIS		BIT(4)
> +#define    G1_REG_PP_PIPELINE_EN	BIT(1)
> +#define    G1_REG_PP_EXTERNAL_TRIGGER	BIT(0)
> +#define G1_REG_PP_DEV_CONFIG		G1_SWREG(61)
> +#define     G1_REG_PP_AXI_RD_ID(v)	(((v) << 24) & GENMASK(31, 24))
> +#define     G1_REG_PP_AXI_WR_ID(v)	(((v) << 16) & GENMASK(23, 16))
> +#define     G1_REG_PP_INSWAP32_E(v)	((v) ? BIT(10) : 0)
> +#define     G1_REG_PP_DATA_DISC_E(v)	((v) ? BIT(9) : 0)
> +#define     G1_REG_PP_CLK_GATE_E(v)	((v) ? BIT(8) : 0)
> +#define     G1_REG_PP_IN_ENDIAN(v)	((v) ? BIT(7) : 0)
> +#define     G1_REG_PP_OUT_ENDIAN(v)	((v) ? BIT(6) : 0)
> +#define     G1_REG_PP_OUTSWAP32_E(v)	((v) ? BIT(5) : 0)
> +#define     G1_REG_PP_MAX_BURST(v)	(((v) << 0) & GENMASK(4, 0))
> +#define G1_REG_PP_IN_LUMA_BASE		G1_SWREG(63)
> +#define G1_REG_PP_IN_CB_BASE		G1_SWREG(64)
> +#define G1_REG_PP_IN_CR_BASE		G1_SWREG(65)
> +#define G1_REG_PP_OUT_LUMA_BASE		G1_SWREG(66)
> +#define G1_REG_PP_OUT_CHROMA_BASE	G1_SWREG(67)
> +#define G1_REG_PP_CONTRAST_ADJUST	G1_SWREG(68)
> +#define G1_REG_PP_COLOR_CONVERSION	G1_SWREG(69)
> +#define G1_REG_PP_COLOR_CONVERSION0	G1_SWREG(70)
> +#define G1_REG_PP_COLOR_CONVERSION1	G1_SWREG(71)
> +#define G1_REG_PP_INPUT_SIZE		G1_SWREG(72)
> +#define    G1_REG_PP_INPUT_SIZE_HEIGHT(v) (((v) << 9) & GENMASK(16, 9))
> +#define    G1_REG_PP_INPUT_SIZE_WIDTH(v)  (((v) << 0) & GENMASK(8, 0))
> +#define G1_REG_PP_SCALING0		G1_SWREG(79)
> +#define     G1_REG_PP_PADD_R(v)	(((v) << 23) & GENMASK(27, 23))
> +#define     G1_REG_PP_PADD_G(v)	(((v) << 18) & GENMASK(22, 18))
> +#define     G1_REG_PP_RANGEMAP_Y(v) ((v) ? BIT(31) : 0)
> +#define     G1_REG_PP_RANGEMAP_C(v) ((v) ? BIT(30) : 0)
> +#define     G1_REG_PP_YCBCR_RANGE(v) ((v) ? BIT(29) : 0)
> +#define     G1_REG_PP_RGB_16(v) ((v) ? BIT(28) : 0)
> +#define G1_REG_PP_SCALING1		G1_SWREG(80)
> +#define     G1_REG_PP_PADD_B(v)	(((v) << 18) & GENMASK(22, 18))
> +#define G1_REG_PP_MASK_R		G1_SWREG(82)
> +#define G1_REG_PP_MASK_G		G1_SWREG(83)
> +#define G1_REG_PP_MASK_B		G1_SWREG(84)
> +#define G1_REG_PP_CONTROL		G1_SWREG(85)
> +#define     G1_REG_PP_CONTROL_IN_FMT(v)	(((v) << 29) & GENMASK(31, 29))
> +#define     G1_REG_PP_CONTROL_OUT_FMT(v) (((v) << 26) & GENMASK(28, 26))
> +#define     G1_REG_PP_CONTROL_OUT_HEIGHT(v) (((v) << 15) & GENMASK(25, 15))
> +#define     G1_REG_PP_CONTROL_OUT_WIDTH(v) (((v) << 4) & GENMASK(14, 4))
> +#define G1_REG_PP_MASK1_ORIG_WIDTH	G1_SWREG(88)
> +#define     G1_REG_PP_ORIG_WIDTH(v)	(((v) << 23) & GENMASK(31, 23))
> +#define G1_REG_PP_DISPLAY_WIDTH		G1_SWREG(92)
> +#define G1_REG_PP_FUSE			G1_SWREG(99)
> +
>  #endif /* HANTRO_G1_REGS_H_ */
> diff --git a/drivers/staging/media/hantro/hantro_g1_vp8_dec.c b/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
> index cad18094fee0..e708994d1aba 100644
> --- a/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
> @@ -422,7 +422,7 @@ static void cfg_buffers(struct hantro_ctx *ctx,
>  	}
>  	vdpu_write_relaxed(vpu, reg, G1_REG_FWD_PIC(0));
>  
> -	dst_dma = vb2_dma_contig_plane_dma_addr(&vb2_dst->vb2_buf, 0);
> +	dst_dma = hantro_get_dec_buf_addr(ctx, &vb2_dst->vb2_buf);
>  	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
>  }
>  
> diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
> index 694a330f508e..5c84ebcdd0ea 100644
> --- a/drivers/staging/media/hantro/hantro_h264.c
> +++ b/drivers/staging/media/hantro/hantro_h264.c
> @@ -629,7 +629,11 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
>  	tbl = priv->cpu;
>  	memcpy(tbl->cabac_table, h264_cabac_table, sizeof(tbl->cabac_table));
>  
> -	v4l2_fill_pixfmt_mp(&pix_mp, ctx->dst_fmt.pixelformat,
> +	/*
> +	 * For the decoder picture size, we want the decoder
> +	 * native pixel format.
> +	 */
> +	v4l2_fill_pixfmt_mp(&pix_mp, V4L2_PIX_FMT_NV12,
>  			    ctx->dst_fmt.width, ctx->dst_fmt.height);
>  	h264_dec->pic_size = pix_mp.plane_fmt[0].sizeimage;
>  
> diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
> index 69b88f4d3fb3..18e7d9e1f469 100644
> --- a/drivers/staging/media/hantro/hantro_hw.h
> +++ b/drivers/staging/media/hantro/hantro_hw.h
> @@ -28,11 +28,13 @@ struct hantro_variant;
>   * @cpu:	CPU pointer to the buffer.
>   * @dma:	DMA address of the buffer.
>   * @size:	Size of the buffer.
> + * @attrs:	Attributes of the DMA mapping.
>   */
>  struct hantro_aux_buf {
>  	void *cpu;
>  	dma_addr_t dma;
>  	size_t size;
> +	unsigned long attrs;
>  };
>  
>  /**
> @@ -109,6 +111,15 @@ struct hantro_vp8_dec_hw_ctx {
>  	struct hantro_aux_buf prob_tbl;
>  };
>  
> +/**
> + * struct hantro_postproc_ctx
> + *
> + * @dec_q:		References buffers, in decoder format.
> + */
> +struct hantro_postproc_ctx {
> +	struct hantro_aux_buf dec_q[VB2_MAX_FRAME];
> +};
> +
>  /**
>   * struct hantro_codec_ops - codec mode specific operations
>   *
> @@ -144,6 +155,8 @@ extern const struct hantro_variant rk3399_vpu_variant;
>  extern const struct hantro_variant rk3328_vpu_variant;
>  extern const struct hantro_variant rk3288_vpu_variant;
>  
> +extern const struct hantro_postproc_regs hantro_g1_postproc_regs;
> +
>  extern const u32 hantro_vp8_dec_mc_filter[8][6];
>  
>  void hantro_watchdog(struct work_struct *work);
> diff --git a/drivers/staging/media/hantro/hantro_postproc.c b/drivers/staging/media/hantro/hantro_postproc.c
> new file mode 100644
> index 000000000000..865435386363
> --- /dev/null
> +++ b/drivers/staging/media/hantro/hantro_postproc.c
> @@ -0,0 +1,141 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hantro G1 post-processor support
> + *
> + * Copyright (C) 2019 Collabora, Ltd.
> + */
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/types.h>
> +
> +#include "hantro.h"
> +#include "hantro_hw.h"
> +#include "hantro_g1_regs.h"
> +
> +#define HANTRO_PP_REG_WRITE(vpu, reg_name, val) \
> +	do { \
> +		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
> +			hantro_reg_write_relaxed((vpu), \
> +					 &(vpu)->variant->postproc_regs->(reg_name), \
> +					 (val)); \
> +	} while (0)
> +
> +#define HANTRO_PP_REG_WRITE_S(vpu, reg_name, val) \
> +	do { \
> +		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
> +			hantro_reg_write((vpu), \
> +					 &(vpu)->variant->postproc_regs->(reg_name), \
> +					 (val)); \
> +	} while (0)

Why all these checks, are any of the register fields optional?

> +
> +#define VPU_PP_IN_YUYV			0x0
> +#define VPU_PP_IN_NV12			0x1
> +#define VPU_PP_IN_YUV420		0x2
> +#define VPU_PP_IN_YUV240_TILED		0x5
> +#define VPU_PP_OUT_RGB			0x0
> +#define VPU_PP_OUT_YUYV			0x3
> +
> +const struct hantro_postproc_regs hantro_g1_postproc_regs = {
> +	.pipeline_en = {G1_REG_PP_INTERRUPT, 1, 0x1},
> +	.max_burst = {G1_REG_PP_DEV_CONFIG, 0, 0x1f},
> +	.clk_gate = {G1_REG_PP_DEV_CONFIG, 1, 0x1},
> +	.out_swap32 = {G1_REG_PP_DEV_CONFIG, 5, 0x1},
> +	.out_endian = {G1_REG_PP_DEV_CONFIG, 6, 0x1},
> +	.out_luma_base = {G1_REG_PP_OUT_LUMA_BASE, 0, 0xffffffff},
> +	.input_width = {G1_REG_PP_INPUT_SIZE, 0, 0x1ff},
> +	.input_height = {G1_REG_PP_INPUT_SIZE, 9, 0x1ff},
> +	.output_width = {G1_REG_PP_CONTROL, 4, 0x7ff},
> +	.output_height = {G1_REG_PP_CONTROL, 15, 0x7ff},
> +	.input_fmt = {G1_REG_PP_CONTROL, 29, 0x7},
> +	.output_fmt = {G1_REG_PP_CONTROL, 26, 0x7},
> +	.orig_width = {G1_REG_PP_MASK1_ORIG_WIDTH, 23, 0x1ff},
> +	.display_width = {G1_REG_PP_DISPLAY_WIDTH, 0, 0xfff},
> +};
> +
> +void hantro_postproc_setup(struct hantro_ctx *ctx)
> +{
> +	struct hantro_dev *vpu = ctx->dev;
> +	struct vb2_v4l2_buffer *dst_buf;
> +	u32 src_pp_fmt, dst_pp_fmt;
> +	dma_addr_t dst_dma;
> +
> +	/* Turn on pipeline mode. Must be done first. */
> +	HANTRO_PP_REG_WRITE_S(vpu, pipeline_en, 0x1);
> +
> +	src_pp_fmt = VPU_PP_IN_NV12;
> +
> +	switch (ctx->vpu_dst_fmt->fourcc) {
> +	case V4L2_PIX_FMT_YUYV:
> +		dst_pp_fmt = VPU_PP_OUT_YUYV;
> +		break;
> +	default:
> +		WARN(1, "output format %d not supported by the post-processor, this wasn't expected.",
> +		     ctx->vpu_dst_fmt->fourcc);
> +		dst_pp_fmt = 0;
> +		break;
> +	}
> +
> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> +	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +
> +	HANTRO_PP_REG_WRITE(vpu, clk_gate, 0x1);
> +	HANTRO_PP_REG_WRITE(vpu, out_endian, 0x1);
> +	HANTRO_PP_REG_WRITE(vpu, out_swap32, 0x1);
> +	HANTRO_PP_REG_WRITE(vpu, max_burst, 16);
> +	HANTRO_PP_REG_WRITE(vpu, out_luma_base, dst_dma);
> +	HANTRO_PP_REG_WRITE(vpu, input_width, MB_WIDTH(ctx->dst_fmt.width));
> +	HANTRO_PP_REG_WRITE(vpu, input_height, MB_HEIGHT(ctx->dst_fmt.height));
> +	HANTRO_PP_REG_WRITE(vpu, input_fmt, src_pp_fmt);
> +	HANTRO_PP_REG_WRITE(vpu, output_fmt, dst_pp_fmt);
> +	HANTRO_PP_REG_WRITE(vpu, output_width, ctx->dst_fmt.width);
> +	HANTRO_PP_REG_WRITE(vpu, output_height, ctx->dst_fmt.height);
> +	HANTRO_PP_REG_WRITE(vpu, orig_width, MB_WIDTH(ctx->dst_fmt.width));
> +	HANTRO_PP_REG_WRITE(vpu, display_width, ctx->dst_fmt.width);
> +}
> +
> +void hantro_postproc_free(struct hantro_ctx *ctx)
> +{
> +	struct hantro_dev *vpu = ctx->dev;
> +	unsigned int i;
> +
> +	for (i = 0; i < VB2_MAX_FRAME; ++i) {
> +		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
> +
> +		if (priv->cpu) {
> +			dma_free_attrs(vpu->dev, priv->size, priv->cpu,
> +				       priv->dma, priv->attrs);
> +			priv->cpu = NULL;
> +		}
> +	}
> +}
> +
> +int hantro_postproc_alloc(struct hantro_ctx *ctx)
> +{
> +	struct hantro_dev *vpu = ctx->dev;
> +	unsigned int i, buf_size;
> +
> +	buf_size = ctx->dst_fmt.plane_fmt[0].sizeimage;
> +
> +	for (i = 0; i < VB2_MAX_FRAME; ++i) {

If requests less than VB2_MAX_FRAME capture frames, some of those are
unused. I think it would be better to match the amount of capture frames
here.

> +		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
> +
> +		/*
> +		 * The buffers on this queue are meant as intermediate
> +		 * buffers for the decoder, so no mapping is needed.
> +		 */
> +		priv->attrs = DMA_ATTR_NO_KERNEL_MAPPING;
> +		priv->cpu = dma_alloc_attrs(vpu->dev, buf_size, &priv->dma,
> +					    GFP_KERNEL, priv->attrs);
> +		if (!priv->cpu)
> +			return -ENOMEM;
> +		priv->size = buf_size;
> +	}
> +	return 0;
> +}
> +
> +void hantro_postproc_disable(struct hantro_ctx *ctx)
> +{
> +	struct hantro_dev *vpu = ctx->dev;
> +
> +	HANTRO_PP_REG_WRITE_S(vpu, pipeline_en, 0x0);
> +}
> diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
> index 238e53b28f8f..ff665d4f004f 100644
> --- a/drivers/staging/media/hantro/hantro_v4l2.c
> +++ b/drivers/staging/media/hantro/hantro_v4l2.c
> @@ -46,6 +46,19 @@ hantro_get_formats(const struct hantro_ctx *ctx, unsigned int *num_fmts)
>  	return formats;
>  }
>  
> +static const struct hantro_fmt *
> +hantro_get_postproc_formats(const struct hantro_ctx *ctx,
> +			    unsigned int *num_fmts)
> +{
> +	if (hantro_is_encoder_ctx(ctx)) {
> +		*num_fmts = 0;
> +		return NULL;
> +	}
> +
> +	*num_fmts = ctx->dev->variant->num_postproc_fmts;
> +	return ctx->dev->variant->postproc_fmts;
> +}
> +
>  static const struct hantro_fmt *
>  hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
>  {
> @@ -57,6 +70,10 @@ hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
>  		if (formats[i].fourcc == fourcc)
>  			return &formats[i];
>  
> +	formats = hantro_get_postproc_formats(ctx, &num_fmts);
> +	for (i = 0; i < num_fmts; i++)
> +		if (formats[i].fourcc == fourcc)
> +			return &formats[i];
>  	return NULL;
>  }
>  
> @@ -151,6 +168,20 @@ static int vidioc_enum_fmt(struct file *file, void *priv,
>  		}
>  		++j;
>  	}
> +
> +	/* Enumerate post-processed formats. */

I think here some of the explanation from the commit message of patch 3
could be added: why the driver is enumerating PP formats after non-PP
formats.

> +	if (!capture)
> +		return -EINVAL;
> +	formats = hantro_get_postproc_formats(ctx, &num_fmts);
> +	for (i = 0; i < num_fmts; i++) {
> +		if (j == f->index) {
> +			fmt = &formats[i];
> +			f->pixelformat = fmt->fourcc;
> +			return 0;
> +		}
> +		++j;
> +	}
> +
>  	return -EINVAL;
>  }
>  
> @@ -241,9 +272,10 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
>  		/*
>  		 * The H264 decoder needs extra space on the output buffers
>  		 * to store motion vectors. This is needed for reference
> -		 * frames.
> +		 * frames and only if the format is non-post-processed (NV12).
>  		 */
> -		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE)
> +		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE &&
> +		    fmt->fourcc == V4L2_PIX_FMT_NV12)

If you change hantro_needs_postproc to use a struct hantro_fmt argument,
you could use hantro_needs_postproc(fmt) here.

>  			pix_mp->plane_fmt[0].sizeimage +=
>  				128 * DIV_ROUND_UP(pix_mp->width, 16) *
>  				      DIV_ROUND_UP(pix_mp->height, 16);
> @@ -611,10 +643,23 @@ static int hantro_start_streaming(struct vb2_queue *q, unsigned int count)
>  
>  		vpu_debug(4, "Codec mode = %d\n", codec_mode);
>  		ctx->codec_ops = &ctx->dev->variant->codec_ops[codec_mode];
> -		if (ctx->codec_ops->init)
> +		if (ctx->codec_ops->init) {
>  			ret = ctx->codec_ops->init(ctx);
> +			if (ret)
> +				return ret;
> +		}
> +
> +		if (hantro_needs_postproc(ctx)) {
> +			ret = hantro_postproc_alloc(ctx);

Why is this done in start_streaming? Wouldn't capture side REQBUFS be a
better place for this?

> +			if (ret)
> +				goto err_codec_exit;
> +		}
>  	}
> +	return ret;
>  
> +err_codec_exit:
> +	if (ctx->codec_ops && ctx->codec_ops->exit)
> +		ctx->codec_ops->exit(ctx);
>  	return ret;
>  }
>  
> @@ -641,6 +686,7 @@ static void hantro_stop_streaming(struct vb2_queue *q)
>  	struct hantro_ctx *ctx = vb2_get_drv_priv(q);
>  
>  	if (hantro_vq_is_coded(q)) {
> +		hantro_postproc_free(ctx);

Same as above, move this to capture side REQBUFS(0) ?

>  		if (ctx->codec_ops && ctx->codec_ops->exit)
>  			ctx->codec_ops->exit(ctx);
>  	}
> diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c b/drivers/staging/media/hantro/rk3288_vpu_hw.c
> index f8db6fcaad73..2f914b37b9e5 100644
> --- a/drivers/staging/media/hantro/rk3288_vpu_hw.c
> +++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c
> @@ -56,6 +56,13 @@ static const struct hantro_fmt rk3288_vpu_enc_fmts[] = {
>  	},
>  };
>  
> +static const struct hantro_fmt rk3288_vpu_postproc_fmts[] = {
> +	{
> +		.fourcc = V4L2_PIX_FMT_YUYV,
> +		.codec_mode = HANTRO_MODE_NONE,
> +	},
> +};
> +
>  static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
>  	{
>  		.fourcc = V4L2_PIX_FMT_NV12,
> @@ -215,6 +222,9 @@ const struct hantro_variant rk3288_vpu_variant = {
>  	.dec_offset = 0x400,
>  	.dec_fmts = rk3288_vpu_dec_fmts,
>  	.num_dec_fmts = ARRAY_SIZE(rk3288_vpu_dec_fmts),
> +	.postproc_fmts = rk3288_vpu_postproc_fmts,
> +	.num_postproc_fmts = ARRAY_SIZE(rk3288_vpu_postproc_fmts),
> +	.postproc_regs = &hantro_g1_postproc_regs,
>  	.codec = HANTRO_JPEG_ENCODER | HANTRO_MPEG2_DECODER |
>  		 HANTRO_VP8_DECODER | HANTRO_H264_DECODER,
>  	.codec_ops = rk3288_vpu_codec_ops,

regards
Philipp


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-11-14  9:48   ` Philipp Zabel
@ 2019-11-15 15:44     ` Ezequiel Garcia
  2019-11-15 16:10       ` Philipp Zabel
  2019-12-05 14:33       ` Ezequiel Garcia
  0 siblings, 2 replies; 21+ messages in thread
From: Ezequiel Garcia @ 2019-11-15 15:44 UTC (permalink / raw)
  To: Philipp Zabel, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

Hello Philipp,

Thanks for reviewing.

On Thu, 2019-11-14 at 10:48 +0100, Philipp Zabel wrote:
> Hi Ezequiel,
> 
> On Wed, 2019-11-13 at 14:56 -0300, Ezequiel Garcia wrote:
> > The Hantro G1 decoder is able to enable a post-processor
> > on the decoding pipeline, which can be used to perform
> > scaling and color conversion.
> > 
> > The post-processor is integrated to the decoder, and it's
> > possible to use it in a way that is completely transparent
> > to the user.
> > 
> > This commit enables color conversion via post-processing,
> > which means the driver now exposes YUV packed, in addition to NV12.
> > 
> > Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
> > ---
> >  drivers/staging/media/hantro/Makefile         |   1 +
> >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> >  drivers/staging/media/hantro/hantro_v4l2.c    |  52 ++++++-
> >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> >  12 files changed, 343 insertions(+), 11 deletions(-)
> >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > 
> > diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
> > index 5d6b0383d280..496b30c3c396 100644
> > --- a/drivers/staging/media/hantro/Makefile
> > +++ b/drivers/staging/media/hantro/Makefile
> > @@ -3,6 +3,7 @@ obj-$(CONFIG_VIDEO_HANTRO) += hantro-vpu.o
> >  hantro-vpu-y += \
> >  		hantro_drv.o \
> >  		hantro_v4l2.o \
> > +		hantro_postproc.o \
> >  		hantro_h1_jpeg_enc.o \
> >  		hantro_g1_h264_dec.o \
> >  		hantro_g1_mpeg2_dec.o \
> > diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
> > index deb90ae37859..6016a1a42503 100644
> > --- a/drivers/staging/media/hantro/hantro.h
> > +++ b/drivers/staging/media/hantro/hantro.h
> > @@ -60,6 +60,8 @@ struct hantro_irq {
> >   * @num_enc_fmts:		Number of encoder formats.
> >   * @dec_fmts:			Decoder formats.
> >   * @num_dec_fmts:		Number of decoder formats.
> > + * @postproc_fmts:		Post-processor formats.
> > + * @num_postproc_fmts:		Number of post-processor formats.
> >   * @codec:			Supported codecs
> >   * @codec_ops:			Codec ops.
> >   * @init:			Initialize hardware.
> > @@ -70,6 +72,7 @@ struct hantro_irq {
> >   * @num_clocks:			number of clocks in the array
> >   * @reg_names:			array of register range names
> >   * @num_regs:			number of register range names in the array
> > + * @postproc_regs:		&struct hantro_postproc_regs pointer
> >   */
> >  struct hantro_variant {
> >  	unsigned int enc_offset;
> > @@ -78,6 +81,8 @@ struct hantro_variant {
> >  	unsigned int num_enc_fmts;
> >  	const struct hantro_fmt *dec_fmts;
> >  	unsigned int num_dec_fmts;
> > +	const struct hantro_fmt *postproc_fmts;
> > +	unsigned int num_postproc_fmts;
> >  	unsigned int codec;
> >  	const struct hantro_codec_ops *codec_ops;
> >  	int (*init)(struct hantro_dev *vpu);
> > @@ -88,6 +93,7 @@ struct hantro_variant {
> >  	int num_clocks;
> >  	const char * const *reg_names;
> >  	int num_regs;
> > +	const struct hantro_postproc_regs *postproc_regs;
> >  };
> >  
> >  /**
> > @@ -213,6 +219,7 @@ struct hantro_dev {
> >   *			context, and it's called right before
> >   *			calling v4l2_m2m_job_finish.
> >   * @codec_ops:		Set of operations related to codec mode.
> > + * @postproc:		Post-processing context.
> >   * @jpeg_enc:		JPEG-encoding context.
> >   * @mpeg2_dec:		MPEG-2-decoding context.
> >   * @vp8_dec:		VP8-decoding context.
> > @@ -237,6 +244,7 @@ struct hantro_ctx {
> >  			  unsigned int bytesused);
> >  
> >  	const struct hantro_codec_ops *codec_ops;
> > +	struct hantro_postproc_ctx postproc;
> >  
> >  	/* Specific for particular codec modes. */
> >  	union {
> > @@ -274,6 +282,23 @@ struct hantro_reg {
> >  	u32 mask;
> >  };
> >  
> > +struct hantro_postproc_regs {
> > +	struct hantro_reg pipeline_en;
> > +	struct hantro_reg max_burst;
> > +	struct hantro_reg clk_gate;
> > +	struct hantro_reg out_swap32;
> > +	struct hantro_reg out_endian;
> > +	struct hantro_reg out_luma_base;
> > +	struct hantro_reg input_width;
> > +	struct hantro_reg input_height;
> > +	struct hantro_reg output_width;
> > +	struct hantro_reg output_height;
> > +	struct hantro_reg input_fmt;
> > +	struct hantro_reg output_fmt;
> > +	struct hantro_reg orig_width;
> > +	struct hantro_reg display_width;
> > +};
> > +
> >  /* Logging helpers */
> >  
> >  /**
> > @@ -352,9 +377,23 @@ static inline u32 vdpu_read(struct hantro_dev *vpu, u32 reg)
> >  	return val;
> >  }
> >  
> > -static inline void hantro_reg_write(struct hantro_dev *vpu,
> > -				    const struct hantro_reg *reg,
> > -				    u32 val)
> > +static inline void
> > +hantro_reg_write(struct hantro_dev *vpu,
> > +		 const struct hantro_reg *reg,
> > +		 u32 val)
> > +{
> > +	u32 v;
> > +
> > +	v = vdpu_read(vpu, reg->base);
> > +	v &= ~(reg->mask << reg->shift);
> > +	v |= ((val & reg->mask) << reg->shift);
> > +	vdpu_write(vpu, v, reg->base);
> > +}
> 
> This adds barriers to all the currently relaxed writes in the VP8
> decoders. Maybe split this into a separate patch and add an explanation.
> 

Yes, I missed this.

> > +
> > +static inline void
> > +hantro_reg_write_relaxed(struct hantro_dev *vpu,
> > +			 const struct hantro_reg *reg,
> > +			 u32 val)
> >  {
> >  	u32 v;
> >  
> > @@ -381,4 +420,23 @@ hantro_get_dst_buf(struct hantro_ctx *ctx)
> >  	return v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >  }
> >  
> > +static inline bool
> > +hantro_needs_postproc(struct hantro_ctx *ctx)
> > +{
> > +	return ctx->vpu_dst_fmt->fourcc != V4L2_PIX_FMT_NV12;
> > +}
> > +
> > +static inline dma_addr_t
> > +hantro_get_dec_buf_addr(struct hantro_ctx *ctx, struct vb2_buffer *vb)
> > +{
> > +	if (hantro_needs_postproc(ctx))
> > +		return ctx->postproc.dec_q[vb->index].dma;
> > +	return vb2_dma_contig_plane_dma_addr(vb, 0);
> > +}
> > +
> > +void hantro_postproc_disable(struct hantro_ctx *ctx);
> > +void hantro_postproc_setup(struct hantro_ctx *ctx);
> > +void hantro_postproc_free(struct hantro_ctx *ctx);
> > +int hantro_postproc_alloc(struct hantro_ctx *ctx);
> > +
> >  #endif /* HANTRO_H_ */
> > diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
> > index 26108c96b674..fb08296db168 100644
> > --- a/drivers/staging/media/hantro/hantro_drv.c
> > +++ b/drivers/staging/media/hantro/hantro_drv.c
> > @@ -53,7 +53,7 @@ dma_addr_t hantro_get_ref(struct hantro_ctx *ctx, u64 ts)
> >  	if (index < 0)
> >  		return 0;
> >  	buf = vb2_get_buffer(q, index);
> > -	return vb2_dma_contig_plane_dma_addr(buf, 0);
> > +	return hantro_get_dec_buf_addr(ctx, buf);
> >  }
> >  
> >  static int
> > @@ -159,12 +159,18 @@ void hantro_prepare_run(struct hantro_ctx *ctx)
> >  	src_buf = hantro_get_src_buf(ctx);
> >  	v4l2_ctrl_request_setup(src_buf->vb2_buf.req_obj.req,
> >  				&ctx->ctrl_handler);
> > +
> > +	if (!hantro_needs_postproc(ctx))
> > +		hantro_postproc_disable(ctx);
> 
> Why isn't PP enabled in prepare_run? Does this mean the first frame is
> not post-processed?
> 

No, because hantro_finish_run is called (despite its name)
before the decoding operation is actually triggered.

I guess this hantro_finish_run name adds some confusion:
prepare_run and finish_run should be something along
start_prepare_run, end_prepare_run. 

And also, perhaps disabling the post-processor in prepare_run
works just fine. I need to check that.

> >  }
> >  
> >  void hantro_finish_run(struct hantro_ctx *ctx)
> >  {
> >  	struct vb2_v4l2_buffer *src_buf;
> >  
> > +	if (hantro_needs_postproc(ctx))
> > +		hantro_postproc_setup(ctx);
> > +
> >  	src_buf = hantro_get_src_buf(ctx);
> >  	v4l2_ctrl_request_complete(src_buf->vb2_buf.req_obj.req,
> >  				   &ctx->ctrl_handler);
> > diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> > index 70a6b5b26477..9b292722c9de 100644
> > --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> > +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> > @@ -243,7 +243,7 @@ static void set_buffers(struct hantro_ctx *ctx)
> >  	vdpu_write_relaxed(vpu, src_dma, G1_REG_ADDR_STR);
> >  
> >  	/* Destination (decoded frame) buffer. */
> > -	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> > +	dst_dma = hantro_get_dec_buf_addr(ctx, &dst_buf->vb2_buf);
> >  	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
> >  
> >  	/* Higher profiles require DMV buffer appended to reference frames. */
> > diff --git a/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c b/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
> > index f3bf67d8a289..0abe0be2c1ad 100644
> > --- a/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
> > +++ b/drivers/staging/media/hantro/hantro_g1_mpeg2_dec.c
> > @@ -121,7 +121,7 @@ hantro_g1_mpeg2_dec_set_buffers(struct hantro_dev *vpu, struct hantro_ctx *ctx,
> >  	vdpu_write_relaxed(vpu, addr, G1_REG_RLC_VLC_BASE);
> >  
> >  	/* Destination frame buffer */
> > -	addr = vb2_dma_contig_plane_dma_addr(dst_buf, 0);
> > +	addr = hantro_get_dec_buf_addr(ctx, dst_buf);
> >  	current_addr = addr;
> >  
> >  	if (picture->picture_structure == PICT_BOTTOM_FIELD)
> > diff --git a/drivers/staging/media/hantro/hantro_g1_regs.h b/drivers/staging/media/hantro/hantro_g1_regs.h
> > index 5c0ea7994336..c1756e3d5391 100644
> > --- a/drivers/staging/media/hantro/hantro_g1_regs.h
> > +++ b/drivers/staging/media/hantro/hantro_g1_regs.h
> > @@ -9,6 +9,8 @@
> >  #ifndef HANTRO_G1_REGS_H_
> >  #define HANTRO_G1_REGS_H_
> >  
> > +#define G1_SWREG(nr)                 ((nr) * 4)
> > +
> >  /* Decoder registers. */
> >  #define G1_REG_INTERRUPT				0x004
> >  #define     G1_REG_INTERRUPT_DEC_PIC_INF		BIT(24)
> > @@ -298,4 +300,55 @@
> >  #define     G1_REG_REF_BUF_CTRL2_APF_THRESHOLD(x)	(((x) & 0x3fff) << 0)
> >  #define G1_REG_SOFT_RESET				0x194
> >  
> > +/* Post-processor registers. */
> > +#define G1_REG_PP_INTERRUPT		G1_SWREG(60)
> > +#define    G1_REG_PP_READY_IRQ		BIT(12)
> > +#define    G1_REG_PP_IRQ		BIT(8)
> > +#define    G1_REG_PP_IRQ_DIS		BIT(4)
> > +#define    G1_REG_PP_PIPELINE_EN	BIT(1)
> > +#define    G1_REG_PP_EXTERNAL_TRIGGER	BIT(0)
> > +#define G1_REG_PP_DEV_CONFIG		G1_SWREG(61)
> > +#define     G1_REG_PP_AXI_RD_ID(v)	(((v) << 24) & GENMASK(31, 24))
> > +#define     G1_REG_PP_AXI_WR_ID(v)	(((v) << 16) & GENMASK(23, 16))
> > +#define     G1_REG_PP_INSWAP32_E(v)	((v) ? BIT(10) : 0)
> > +#define     G1_REG_PP_DATA_DISC_E(v)	((v) ? BIT(9) : 0)
> > +#define     G1_REG_PP_CLK_GATE_E(v)	((v) ? BIT(8) : 0)
> > +#define     G1_REG_PP_IN_ENDIAN(v)	((v) ? BIT(7) : 0)
> > +#define     G1_REG_PP_OUT_ENDIAN(v)	((v) ? BIT(6) : 0)
> > +#define     G1_REG_PP_OUTSWAP32_E(v)	((v) ? BIT(5) : 0)
> > +#define     G1_REG_PP_MAX_BURST(v)	(((v) << 0) & GENMASK(4, 0))
> > +#define G1_REG_PP_IN_LUMA_BASE		G1_SWREG(63)
> > +#define G1_REG_PP_IN_CB_BASE		G1_SWREG(64)
> > +#define G1_REG_PP_IN_CR_BASE		G1_SWREG(65)
> > +#define G1_REG_PP_OUT_LUMA_BASE		G1_SWREG(66)
> > +#define G1_REG_PP_OUT_CHROMA_BASE	G1_SWREG(67)
> > +#define G1_REG_PP_CONTRAST_ADJUST	G1_SWREG(68)
> > +#define G1_REG_PP_COLOR_CONVERSION	G1_SWREG(69)
> > +#define G1_REG_PP_COLOR_CONVERSION0	G1_SWREG(70)
> > +#define G1_REG_PP_COLOR_CONVERSION1	G1_SWREG(71)
> > +#define G1_REG_PP_INPUT_SIZE		G1_SWREG(72)
> > +#define    G1_REG_PP_INPUT_SIZE_HEIGHT(v) (((v) << 9) & GENMASK(16, 9))
> > +#define    G1_REG_PP_INPUT_SIZE_WIDTH(v)  (((v) << 0) & GENMASK(8, 0))
> > +#define G1_REG_PP_SCALING0		G1_SWREG(79)
> > +#define     G1_REG_PP_PADD_R(v)	(((v) << 23) & GENMASK(27, 23))
> > +#define     G1_REG_PP_PADD_G(v)	(((v) << 18) & GENMASK(22, 18))
> > +#define     G1_REG_PP_RANGEMAP_Y(v) ((v) ? BIT(31) : 0)
> > +#define     G1_REG_PP_RANGEMAP_C(v) ((v) ? BIT(30) : 0)
> > +#define     G1_REG_PP_YCBCR_RANGE(v) ((v) ? BIT(29) : 0)
> > +#define     G1_REG_PP_RGB_16(v) ((v) ? BIT(28) : 0)
> > +#define G1_REG_PP_SCALING1		G1_SWREG(80)
> > +#define     G1_REG_PP_PADD_B(v)	(((v) << 18) & GENMASK(22, 18))
> > +#define G1_REG_PP_MASK_R		G1_SWREG(82)
> > +#define G1_REG_PP_MASK_G		G1_SWREG(83)
> > +#define G1_REG_PP_MASK_B		G1_SWREG(84)
> > +#define G1_REG_PP_CONTROL		G1_SWREG(85)
> > +#define     G1_REG_PP_CONTROL_IN_FMT(v)	(((v) << 29) & GENMASK(31, 29))
> > +#define     G1_REG_PP_CONTROL_OUT_FMT(v) (((v) << 26) & GENMASK(28, 26))
> > +#define     G1_REG_PP_CONTROL_OUT_HEIGHT(v) (((v) << 15) & GENMASK(25, 15))
> > +#define     G1_REG_PP_CONTROL_OUT_WIDTH(v) (((v) << 4) & GENMASK(14, 4))
> > +#define G1_REG_PP_MASK1_ORIG_WIDTH	G1_SWREG(88)
> > +#define     G1_REG_PP_ORIG_WIDTH(v)	(((v) << 23) & GENMASK(31, 23))
> > +#define G1_REG_PP_DISPLAY_WIDTH		G1_SWREG(92)
> > +#define G1_REG_PP_FUSE			G1_SWREG(99)
> > +
> >  #endif /* HANTRO_G1_REGS_H_ */
> > diff --git a/drivers/staging/media/hantro/hantro_g1_vp8_dec.c b/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
> > index cad18094fee0..e708994d1aba 100644
> > --- a/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
> > +++ b/drivers/staging/media/hantro/hantro_g1_vp8_dec.c
> > @@ -422,7 +422,7 @@ static void cfg_buffers(struct hantro_ctx *ctx,
> >  	}
> >  	vdpu_write_relaxed(vpu, reg, G1_REG_FWD_PIC(0));
> >  
> > -	dst_dma = vb2_dma_contig_plane_dma_addr(&vb2_dst->vb2_buf, 0);
> > +	dst_dma = hantro_get_dec_buf_addr(ctx, &vb2_dst->vb2_buf);
> >  	vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
> >  }
> >  
> > diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
> > index 694a330f508e..5c84ebcdd0ea 100644
> > --- a/drivers/staging/media/hantro/hantro_h264.c
> > +++ b/drivers/staging/media/hantro/hantro_h264.c
> > @@ -629,7 +629,11 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
> >  	tbl = priv->cpu;
> >  	memcpy(tbl->cabac_table, h264_cabac_table, sizeof(tbl->cabac_table));
> >  
> > -	v4l2_fill_pixfmt_mp(&pix_mp, ctx->dst_fmt.pixelformat,
> > +	/*
> > +	 * For the decoder picture size, we want the decoder
> > +	 * native pixel format.
> > +	 */
> > +	v4l2_fill_pixfmt_mp(&pix_mp, V4L2_PIX_FMT_NV12,
> >  			    ctx->dst_fmt.width, ctx->dst_fmt.height);
> >  	h264_dec->pic_size = pix_mp.plane_fmt[0].sizeimage;
> >  
> > diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
> > index 69b88f4d3fb3..18e7d9e1f469 100644
> > --- a/drivers/staging/media/hantro/hantro_hw.h
> > +++ b/drivers/staging/media/hantro/hantro_hw.h
> > @@ -28,11 +28,13 @@ struct hantro_variant;
> >   * @cpu:	CPU pointer to the buffer.
> >   * @dma:	DMA address of the buffer.
> >   * @size:	Size of the buffer.
> > + * @attrs:	Attributes of the DMA mapping.
> >   */
> >  struct hantro_aux_buf {
> >  	void *cpu;
> >  	dma_addr_t dma;
> >  	size_t size;
> > +	unsigned long attrs;
> >  };
> >  
> >  /**
> > @@ -109,6 +111,15 @@ struct hantro_vp8_dec_hw_ctx {
> >  	struct hantro_aux_buf prob_tbl;
> >  };
> >  
> > +/**
> > + * struct hantro_postproc_ctx
> > + *
> > + * @dec_q:		References buffers, in decoder format.
> > + */
> > +struct hantro_postproc_ctx {
> > +	struct hantro_aux_buf dec_q[VB2_MAX_FRAME];
> > +};
> > +
> >  /**
> >   * struct hantro_codec_ops - codec mode specific operations
> >   *
> > @@ -144,6 +155,8 @@ extern const struct hantro_variant rk3399_vpu_variant;
> >  extern const struct hantro_variant rk3328_vpu_variant;
> >  extern const struct hantro_variant rk3288_vpu_variant;
> >  
> > +extern const struct hantro_postproc_regs hantro_g1_postproc_regs;
> > +
> >  extern const u32 hantro_vp8_dec_mc_filter[8][6];
> >  
> >  void hantro_watchdog(struct work_struct *work);
> > diff --git a/drivers/staging/media/hantro/hantro_postproc.c b/drivers/staging/media/hantro/hantro_postproc.c
> > new file mode 100644
> > index 000000000000..865435386363
> > --- /dev/null
> > +++ b/drivers/staging/media/hantro/hantro_postproc.c
> > @@ -0,0 +1,141 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Hantro G1 post-processor support
> > + *
> > + * Copyright (C) 2019 Collabora, Ltd.
> > + */
> > +
> > +#include <linux/dma-mapping.h>
> > +#include <linux/types.h>
> > +
> > +#include "hantro.h"
> > +#include "hantro_hw.h"
> > +#include "hantro_g1_regs.h"
> > +
> > +#define HANTRO_PP_REG_WRITE(vpu, reg_name, val) \
> > +	do { \
> > +		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
> > +			hantro_reg_write_relaxed((vpu), \
> > +					 &(vpu)->variant->postproc_regs->(reg_name), \
> > +					 (val)); \
> > +	} while (0)
> > +
> > +#define HANTRO_PP_REG_WRITE_S(vpu, reg_name, val) \
> > +	do { \
> > +		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
> > +			hantro_reg_write((vpu), \
> > +					 &(vpu)->variant->postproc_regs->(reg_name), \
> > +					 (val)); \
> > +	} while (0)
> 
> Why all these checks, are any of the register fields optional?
> 

That was the plan. Perhaps now it makes less sense,
but maybe it's safer this way, if it's extended?

OTOH, we might want to make sure the driver fails (or warns).

> > +
> > +#define VPU_PP_IN_YUYV			0x0
> > +#define VPU_PP_IN_NV12			0x1
> > +#define VPU_PP_IN_YUV420		0x2
> > +#define VPU_PP_IN_YUV240_TILED		0x5
> > +#define VPU_PP_OUT_RGB			0x0
> > +#define VPU_PP_OUT_YUYV			0x3
> > +
> > +const struct hantro_postproc_regs hantro_g1_postproc_regs = {
> > +	.pipeline_en = {G1_REG_PP_INTERRUPT, 1, 0x1},
> > +	.max_burst = {G1_REG_PP_DEV_CONFIG, 0, 0x1f},
> > +	.clk_gate = {G1_REG_PP_DEV_CONFIG, 1, 0x1},
> > +	.out_swap32 = {G1_REG_PP_DEV_CONFIG, 5, 0x1},
> > +	.out_endian = {G1_REG_PP_DEV_CONFIG, 6, 0x1},
> > +	.out_luma_base = {G1_REG_PP_OUT_LUMA_BASE, 0, 0xffffffff},
> > +	.input_width = {G1_REG_PP_INPUT_SIZE, 0, 0x1ff},
> > +	.input_height = {G1_REG_PP_INPUT_SIZE, 9, 0x1ff},
> > +	.output_width = {G1_REG_PP_CONTROL, 4, 0x7ff},
> > +	.output_height = {G1_REG_PP_CONTROL, 15, 0x7ff},
> > +	.input_fmt = {G1_REG_PP_CONTROL, 29, 0x7},
> > +	.output_fmt = {G1_REG_PP_CONTROL, 26, 0x7},
> > +	.orig_width = {G1_REG_PP_MASK1_ORIG_WIDTH, 23, 0x1ff},
> > +	.display_width = {G1_REG_PP_DISPLAY_WIDTH, 0, 0xfff},
> > +};
> > +
> > +void hantro_postproc_setup(struct hantro_ctx *ctx)
> > +{
> > +	struct hantro_dev *vpu = ctx->dev;
> > +	struct vb2_v4l2_buffer *dst_buf;
> > +	u32 src_pp_fmt, dst_pp_fmt;
> > +	dma_addr_t dst_dma;
> > +
> > +	/* Turn on pipeline mode. Must be done first. */
> > +	HANTRO_PP_REG_WRITE_S(vpu, pipeline_en, 0x1);
> > +
> > +	src_pp_fmt = VPU_PP_IN_NV12;
> > +
> > +	switch (ctx->vpu_dst_fmt->fourcc) {
> > +	case V4L2_PIX_FMT_YUYV:
> > +		dst_pp_fmt = VPU_PP_OUT_YUYV;
> > +		break;
> > +	default:
> > +		WARN(1, "output format %d not supported by the post-processor, this wasn't expected.",
> > +		     ctx->vpu_dst_fmt->fourcc);
> > +		dst_pp_fmt = 0;
> > +		break;
> > +	}
> > +
> > +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> > +	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> > +
> > +	HANTRO_PP_REG_WRITE(vpu, clk_gate, 0x1);
> > +	HANTRO_PP_REG_WRITE(vpu, out_endian, 0x1);
> > +	HANTRO_PP_REG_WRITE(vpu, out_swap32, 0x1);
> > +	HANTRO_PP_REG_WRITE(vpu, max_burst, 16);
> > +	HANTRO_PP_REG_WRITE(vpu, out_luma_base, dst_dma);
> > +	HANTRO_PP_REG_WRITE(vpu, input_width, MB_WIDTH(ctx->dst_fmt.width));
> > +	HANTRO_PP_REG_WRITE(vpu, input_height, MB_HEIGHT(ctx->dst_fmt.height));
> > +	HANTRO_PP_REG_WRITE(vpu, input_fmt, src_pp_fmt);
> > +	HANTRO_PP_REG_WRITE(vpu, output_fmt, dst_pp_fmt);
> > +	HANTRO_PP_REG_WRITE(vpu, output_width, ctx->dst_fmt.width);
> > +	HANTRO_PP_REG_WRITE(vpu, output_height, ctx->dst_fmt.height);
> > +	HANTRO_PP_REG_WRITE(vpu, orig_width, MB_WIDTH(ctx->dst_fmt.width));
> > +	HANTRO_PP_REG_WRITE(vpu, display_width, ctx->dst_fmt.width);
> > +}
> > +
> > +void hantro_postproc_free(struct hantro_ctx *ctx)
> > +{
> > +	struct hantro_dev *vpu = ctx->dev;
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < VB2_MAX_FRAME; ++i) {
> > +		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
> > +
> > +		if (priv->cpu) {
> > +			dma_free_attrs(vpu->dev, priv->size, priv->cpu,
> > +				       priv->dma, priv->attrs);
> > +			priv->cpu = NULL;
> > +		}
> > +	}
> > +}
> > +
> > +int hantro_postproc_alloc(struct hantro_ctx *ctx)
> > +{
> > +	struct hantro_dev *vpu = ctx->dev;
> > +	unsigned int i, buf_size;
> > +
> > +	buf_size = ctx->dst_fmt.plane_fmt[0].sizeimage;
> > +
> > +	for (i = 0; i < VB2_MAX_FRAME; ++i) {
> 
> If requests less than VB2_MAX_FRAME capture frames, some of those are
> unused. I think it would be better to match the amount of capture frames
> here.
> 

Right.

> > +		struct hantro_aux_buf *priv = &ctx->postproc.dec_q[i];
> > +
> > +		/*
> > +		 * The buffers on this queue are meant as intermediate
> > +		 * buffers for the decoder, so no mapping is needed.
> > +		 */
> > +		priv->attrs = DMA_ATTR_NO_KERNEL_MAPPING;
> > +		priv->cpu = dma_alloc_attrs(vpu->dev, buf_size, &priv->dma,
> > +					    GFP_KERNEL, priv->attrs);
> > +		if (!priv->cpu)
> > +			return -ENOMEM;
> > +		priv->size = buf_size;
> > +	}
> > +	return 0;
> > +}
> > +
> > +void hantro_postproc_disable(struct hantro_ctx *ctx)
> > +{
> > +	struct hantro_dev *vpu = ctx->dev;
> > +
> > +	HANTRO_PP_REG_WRITE_S(vpu, pipeline_en, 0x0);
> > +}
> > diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
> > index 238e53b28f8f..ff665d4f004f 100644
> > --- a/drivers/staging/media/hantro/hantro_v4l2.c
> > +++ b/drivers/staging/media/hantro/hantro_v4l2.c
> > @@ -46,6 +46,19 @@ hantro_get_formats(const struct hantro_ctx *ctx, unsigned int *num_fmts)
> >  	return formats;
> >  }
> >  
> > +static const struct hantro_fmt *
> > +hantro_get_postproc_formats(const struct hantro_ctx *ctx,
> > +			    unsigned int *num_fmts)
> > +{
> > +	if (hantro_is_encoder_ctx(ctx)) {
> > +		*num_fmts = 0;
> > +		return NULL;
> > +	}
> > +
> > +	*num_fmts = ctx->dev->variant->num_postproc_fmts;
> > +	return ctx->dev->variant->postproc_fmts;
> > +}
> > +
> >  static const struct hantro_fmt *
> >  hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
> >  {
> > @@ -57,6 +70,10 @@ hantro_find_format(const struct hantro_ctx *ctx, u32 fourcc)
> >  		if (formats[i].fourcc == fourcc)
> >  			return &formats[i];
> >  
> > +	formats = hantro_get_postproc_formats(ctx, &num_fmts);
> > +	for (i = 0; i < num_fmts; i++)
> > +		if (formats[i].fourcc == fourcc)
> > +			return &formats[i];
> >  	return NULL;
> >  }
> >  
> > @@ -151,6 +168,20 @@ static int vidioc_enum_fmt(struct file *file, void *priv,
> >  		}
> >  		++j;
> >  	}
> > +
> > +	/* Enumerate post-processed formats. */
> 
> I think here some of the explanation from the commit message of patch 3
> could be added: why the driver is enumerating PP formats after non-PP
> formats.
> 

Right.

> > +	if (!capture)
> > +		return -EINVAL;
> > +	formats = hantro_get_postproc_formats(ctx, &num_fmts);
> > +	for (i = 0; i < num_fmts; i++) {
> > +		if (j == f->index) {
> > +			fmt = &formats[i];
> > +			f->pixelformat = fmt->fourcc;
> > +			return 0;
> > +		}
> > +		++j;
> > +	}
> > +
> >  	return -EINVAL;
> >  }
> >  
> > @@ -241,9 +272,10 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
> >  		/*
> >  		 * The H264 decoder needs extra space on the output buffers
> >  		 * to store motion vectors. This is needed for reference
> > -		 * frames.
> > +		 * frames and only if the format is non-post-processed (NV12).
> >  		 */
> > -		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE)
> > +		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE &&
> > +		    fmt->fourcc == V4L2_PIX_FMT_NV12)
> 
> If you change hantro_needs_postproc to use a struct hantro_fmt argument,
> you could use hantro_needs_postproc(fmt) here.
> 

Makes sense.

> >  			pix_mp->plane_fmt[0].sizeimage +=
> >  				128 * DIV_ROUND_UP(pix_mp->width, 16) *
> >  				      DIV_ROUND_UP(pix_mp->height, 16);
> > @@ -611,10 +643,23 @@ static int hantro_start_streaming(struct vb2_queue *q, unsigned int count)
> >  
> >  		vpu_debug(4, "Codec mode = %d\n", codec_mode);
> >  		ctx->codec_ops = &ctx->dev->variant->codec_ops[codec_mode];
> > -		if (ctx->codec_ops->init)
> > +		if (ctx->codec_ops->init) {
> >  			ret = ctx->codec_ops->init(ctx);
> > +			if (ret)
> > +				return ret;
> > +		}
> > +
> > +		if (hantro_needs_postproc(ctx)) {
> > +			ret = hantro_postproc_alloc(ctx);
> 
> Why is this done in start_streaming? Wouldn't capture side REQBUFS be a
> better place for this?
> 

Yes, makes sense as well.

Thanks!
Ezequiel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-11-15 15:44     ` Ezequiel Garcia
@ 2019-11-15 16:10       ` Philipp Zabel
  2019-12-05 14:33       ` Ezequiel Garcia
  1 sibling, 0 replies; 21+ messages in thread
From: Philipp Zabel @ 2019-11-15 16:10 UTC (permalink / raw)
  To: Ezequiel Garcia, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

Hi Ezequiel,

On Fri, 2019-11-15 at 12:44 -0300, Ezequiel Garcia wrote:
> Hello Philipp,
> 
> Thanks for reviewing.
> 
> On Thu, 2019-11-14 at 10:48 +0100, Philipp Zabel wrote:
[...]
> > Why isn't PP enabled in prepare_run? Does this mean the first frame is
> > not post-processed?
> > 
> 
> No, because hantro_finish_run is called (despite its name)
> before the decoding operation is actually triggered.
> 
> I guess this hantro_finish_run name adds some confusion:
> prepare_run and finish_run should be something along
> start_prepare_run, end_prepare_run. 

Ah, ok then. I was confused because I just came from looking at coda-vpu 
code, where finish_run is a callback called after the device has
finished processing. Maybe I should rename that as well.

> And also, perhaps disabling the post-processor in prepare_run
> works just fine. I need to check that.

Ok.

[...]
> > > +#define HANTRO_PP_REG_WRITE_S(vpu, reg_name, val) \
> > > +	do { \
> > > +		if ((vpu)->variant->postproc_regs->(reg_name).base)	\
> > > +			hantro_reg_write((vpu), \
> > > +					 &(vpu)->variant->postproc_regs->(reg_name), \
> > > +					 (val)); \
> > > +	} while (0)
> > 
> > Why all these checks, are any of the register fields optional?
> > 
> 
> That was the plan. Perhaps now it makes less sense,
> but maybe it's safer this way, if it's extended?
> 
> OTOH, we might want to make sure the driver fails (or warns).

I think that would be better than silently ignoring them.

Although I don't quite see the point in repeatedly checking the presence
of mandatory register fields at runtime.

regards
Philipp


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-11-15 15:44     ` Ezequiel Garcia
  2019-11-15 16:10       ` Philipp Zabel
@ 2019-12-05 14:33       ` Ezequiel Garcia
  2019-12-05 14:46         ` Philipp Zabel
  1 sibling, 1 reply; 21+ messages in thread
From: Ezequiel Garcia @ 2019-12-05 14:33 UTC (permalink / raw)
  To: Philipp Zabel, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

Hello Philipp,

On Fri, 2019-11-15 at 12:44 -0300, Ezequiel Garcia wrote:
> Hello Philipp,
> 
> Thanks for reviewing.
> 
> On Thu, 2019-11-14 at 10:48 +0100, Philipp Zabel wrote:
> > Hi Ezequiel,
> > 
> > On Wed, 2019-11-13 at 14:56 -0300, Ezequiel Garcia wrote:
> > > The Hantro G1 decoder is able to enable a post-processor
> > > on the decoding pipeline, which can be used to perform
> > > scaling and color conversion.
> > > 
> > > The post-processor is integrated to the decoder, and it's
> > > possible to use it in a way that is completely transparent
> > > to the user.
> > > 
> > > This commit enables color conversion via post-processing,
> > > which means the driver now exposes YUV packed, in addition to NV12.
> > > 
> > > Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
> > > ---
> > >  drivers/staging/media/hantro/Makefile         |   1 +
> > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> > >  drivers/staging/media/hantro/hantro_v4l2.c    |  52 ++++++-
> > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > >  12 files changed, 343 insertions(+), 11 deletions(-)
> > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > 
> > > 
[..]
> 
> > >  			pix_mp->plane_fmt[0].sizeimage +=
> > >  				128 * DIV_ROUND_UP(pix_mp->width, 16) *
> > >  				      DIV_ROUND_UP(pix_mp->height, 16);
> > > @@ -611,10 +643,23 @@ static int hantro_start_streaming(struct vb2_queue *q, unsigned int count)
> > >  
> > >  		vpu_debug(4, "Codec mode = %d\n", codec_mode);
> > >  		ctx->codec_ops = &ctx->dev->variant->codec_ops[codec_mode];
> > > -		if (ctx->codec_ops->init)
> > > +		if (ctx->codec_ops->init) {
> > >  			ret = ctx->codec_ops->init(ctx);
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > > +
> > > +		if (hantro_needs_postproc(ctx)) {
> > > +			ret = hantro_postproc_alloc(ctx);
> > 
> > Why is this done in start_streaming? Wouldn't capture side REQBUFS be a
> > better place for this?
> > 
> 
> Yes, makes sense as well.
> 

This didn't work so well, so I have decided to leave it as-is in the
just submitted v4 series.

The vb2 framework provides two mechanism for drivers to allocate
buffers, REQBUFS and CREATEBUFS, so the bounce buffer allocation
has to be hooked on both of them.

Also, REQBUFS and CREATEBUFS can be called multiple times
to grow/shrink the vb2_queue, so the driver has to check
if the bounce buffers were already created or not.

Not a big deal, but I felt the implementation ended up being
too nasty for my taste.

If fragmentation turns out to be an issue and we want to avoid
allocating and destroying in start and stop (STREAMOFF, STREAMON),
we can revisit this.

Thanks,
Ezequiel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-12-05 14:33       ` Ezequiel Garcia
@ 2019-12-05 14:46         ` Philipp Zabel
  2019-12-05 16:02           ` Ezequiel Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Philipp Zabel @ 2019-12-05 14:46 UTC (permalink / raw)
  To: Ezequiel Garcia, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

On Thu, 2019-12-05 at 11:33 -0300, Ezequiel Garcia wrote:
> Hello Philipp,
> 
> On Fri, 2019-11-15 at 12:44 -0300, Ezequiel Garcia wrote:
> > Hello Philipp,
> > 
> > Thanks for reviewing.
> > 
> > On Thu, 2019-11-14 at 10:48 +0100, Philipp Zabel wrote:
> > > Hi Ezequiel,
> > > 
> > > On Wed, 2019-11-13 at 14:56 -0300, Ezequiel Garcia wrote:
> > > > The Hantro G1 decoder is able to enable a post-processor
> > > > on the decoding pipeline, which can be used to perform
> > > > scaling and color conversion.
> > > > 
> > > > The post-processor is integrated to the decoder, and it's
> > > > possible to use it in a way that is completely transparent
> > > > to the user.
> > > > 
> > > > This commit enables color conversion via post-processing,
> > > > which means the driver now exposes YUV packed, in addition to NV12.
> > > > 
> > > > Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
> > > > ---
> > > >  drivers/staging/media/hantro/Makefile         |   1 +
> > > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > > >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> > > >  drivers/staging/media/hantro/hantro_v4l2.c    |  52 ++++++-
> > > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > > >  12 files changed, 343 insertions(+), 11 deletions(-)
> > > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > > 
> > > > 
> [..]
> > > >  			pix_mp->plane_fmt[0].sizeimage +=
> > > >  				128 * DIV_ROUND_UP(pix_mp->width, 16) *
> > > >  				      DIV_ROUND_UP(pix_mp->height, 16);
> > > > @@ -611,10 +643,23 @@ static int hantro_start_streaming(struct vb2_queue *q, unsigned int count)
> > > >  
> > > >  		vpu_debug(4, "Codec mode = %d\n", codec_mode);
> > > >  		ctx->codec_ops = &ctx->dev->variant->codec_ops[codec_mode];
> > > > -		if (ctx->codec_ops->init)
> > > > +		if (ctx->codec_ops->init) {
> > > >  			ret = ctx->codec_ops->init(ctx);
> > > > +			if (ret)
> > > > +				return ret;
> > > > +		}
> > > > +
> > > > +		if (hantro_needs_postproc(ctx)) {
> > > > +			ret = hantro_postproc_alloc(ctx);
> > > 
> > > Why is this done in start_streaming? Wouldn't capture side REQBUFS be a
> > > better place for this?
> > > 
> > 
> > Yes, makes sense as well.
> > 
> 
> This didn't work so well, so I have decided to leave it as-is in the
> just submitted v4 series.
> 
> The vb2 framework provides two mechanism for drivers to allocate
> buffers, REQBUFS and CREATEBUFS, so the bounce buffer allocation
> has to be hooked on both of them.

That is a good point, now that we don't allocate VB2_MAX_FRAME bounce
buffers at start_streaming time anymore, what happens if additional
capture buffers are created with CREATEBUFS while streaming?

> Also, REQBUFS and CREATEBUFS can be called multiple times
> to grow/shrink the vb2_queue, so the driver has to check
> if the bounce buffers were already created or not.
> 
> Not a big deal, but I felt the implementation ended up being
> too nasty for my taste.
> 
> If fragmentation turns out to be an issue and we want to avoid
> allocating and destroying in start and stop (STREAMOFF, STREAMON),
> we can revisit this.

regards
Philipp


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] media: hantro: Support color conversion via post-processing
  2019-12-05 14:46         ` Philipp Zabel
@ 2019-12-05 16:02           ` Ezequiel Garcia
  0 siblings, 0 replies; 21+ messages in thread
From: Ezequiel Garcia @ 2019-12-05 16:02 UTC (permalink / raw)
  To: Philipp Zabel, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Boris Brezillon, Chris Healy, linux-kernel

On Thu, 2019-12-05 at 15:46 +0100, Philipp Zabel wrote:
> On Thu, 2019-12-05 at 11:33 -0300, Ezequiel Garcia wrote:
> > Hello Philipp,
> > 
> > On Fri, 2019-11-15 at 12:44 -0300, Ezequiel Garcia wrote:
> > > Hello Philipp,
> > > 
> > > Thanks for reviewing.
> > > 
> > > On Thu, 2019-11-14 at 10:48 +0100, Philipp Zabel wrote:
> > > > Hi Ezequiel,
> > > > 
> > > > On Wed, 2019-11-13 at 14:56 -0300, Ezequiel Garcia wrote:
> > > > > The Hantro G1 decoder is able to enable a post-processor
> > > > > on the decoding pipeline, which can be used to perform
> > > > > scaling and color conversion.
> > > > > 
> > > > > The post-processor is integrated to the decoder, and it's
> > > > > possible to use it in a way that is completely transparent
> > > > > to the user.
> > > > > 
> > > > > This commit enables color conversion via post-processing,
> > > > > which means the driver now exposes YUV packed, in addition to NV12.
> > > > > 
> > > > > Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
> > > > > ---
> > > > >  drivers/staging/media/hantro/Makefile         |   1 +
> > > > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > > > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > > > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > > > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > > > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > > > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > > > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > > > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > > > >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> > > > >  drivers/staging/media/hantro/hantro_v4l2.c    |  52 ++++++-
> > > > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > > > >  12 files changed, 343 insertions(+), 11 deletions(-)
> > > > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > > > 
> > > > > 
> > [..]
> > > > >  			pix_mp->plane_fmt[0].sizeimage +=
> > > > >  				128 * DIV_ROUND_UP(pix_mp->width, 16) *
> > > > >  				      DIV_ROUND_UP(pix_mp->height, 16);
> > > > > @@ -611,10 +643,23 @@ static int hantro_start_streaming(struct vb2_queue *q, unsigned int count)
> > > > >  
> > > > >  		vpu_debug(4, "Codec mode = %d\n", codec_mode);
> > > > >  		ctx->codec_ops = &ctx->dev->variant->codec_ops[codec_mode];
> > > > > -		if (ctx->codec_ops->init)
> > > > > +		if (ctx->codec_ops->init) {
> > > > >  			ret = ctx->codec_ops->init(ctx);
> > > > > +			if (ret)
> > > > > +				return ret;
> > > > > +		}
> > > > > +
> > > > > +		if (hantro_needs_postproc(ctx)) {
> > > > > +			ret = hantro_postproc_alloc(ctx);
> > > > 
> > > > Why is this done in start_streaming? Wouldn't capture side REQBUFS be a
> > > > better place for this?
> > > > 
> > > 
> > > Yes, makes sense as well.
> > > 
> > 
> > This didn't work so well, so I have decided to leave it as-is in the
> > just submitted v4 series.
> > 
> > The vb2 framework provides two mechanism for drivers to allocate
> > buffers, REQBUFS and CREATEBUFS, so the bounce buffer allocation
> > has to be hooked on both of them.
> 
> That is a good point, now that we don't allocate VB2_MAX_FRAME bounce
> buffers at start_streaming time anymore, what happens if additional
> capture buffers are created with CREATEBUFS while streaming?
> 

If I understand vb2 logic correctly, then I do not think anything
will happen, because the newly created buffers won't be queued. 

Regards,
Ezequiel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2019-11-13 17:56 [PATCH v3 0/3] Enable Hantro G1 post-processor Ezequiel Garcia
                   ` (2 preceding siblings ...)
  2019-11-13 17:56 ` [PATCH v3 3/3] media: vidioc-enum-fmt.rst: clarify format preference Ezequiel Garcia
@ 2020-02-09 19:52 ` Nicolas Dufresne
  2020-02-10  2:45   ` Tomasz Figa
  3 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dufresne @ 2020-02-09 19:52 UTC (permalink / raw)
  To: Ezequiel Garcia, linux-media
  Cc: kernel, Tomasz Figa, linux-rockchip, Heiko Stuebner,
	Jonas Karlman, Philipp Zabel, Boris Brezillon, Chris Healy,
	linux-kernel

Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> Hi all,
> 
> The Hantro G1 VPU post-processor block can be pipelined with
> the decoder hardware, allowing to perform operations such as
> color conversion, scaling, rotation, cropping, among others.
> 
> When the post-processor is enabled, the decoder hardware
> needs its own set of NV12 buffers (the native decoder format),
> and the post-processor is the owner of the CAPTURE buffers,
> allocated for the post-processed format.
> 
> This way, applications obtain post-processed
> (scaled, converted, etc) buffers transparently.
> 
> This feature is implemented by exposing the post-processed pixel
> formats on ENUM_FMT, ordered as "preferred pixelformat first":
> 
> v4l2-ctl -d 1 --list-formats
> ioctl: VIDIOC_ENUM_FMT
> 	Type: Video Capture Multiplanar
> 
> 	[0]: 'NV12' (Y/CbCr 4:2:0)
> 	[1]: 'YUYV' (YUYV 4:2:2)
> 
> The order of preference in ENUM_FMT can be used as a hint
> by applications. This series updates the uAPI specification
> accordingly.

As I'm implementing this, I realize that there may me a gap in being
able to implement both IPP and non-IPP support in a generic framework.
Unlike the above comment, we for non-IPP decoder we cannot naively pick
the first format. In fact we parse the chroma and depth information
from the headers (like pps from H264), and we pick a matching pixel
format. This way, if we have a 10bit stream, and our IP supports 10bit,
we will pick a 10bit pixel formats, otherwise decoding will just fail.

None of this information is passed to the driver prior to the first
Request being made, so there is no way (as of current spec) that the
driver can validate this in try_fmt ahead of time. Unless I set picture
parameters without a request_fd for that purpose. If this is the way,
then we should document this.

Is this the intended way to negotiation IPP functions with the driver ?

> 
> When the application sets a pixel format other than NV12,
> the post-processor is transparently enabled.
> 
> Patch 1 is a cleanups needed to easier integrate the post-processor.
> Patch 2 introduces the post-processing support.
> Patch 3 updates the uAPI specification.
> 
> This is tested on RK3288 platforms with MPEG-2, VP8 and
> H264 streams, decoding to YUY2 surfaces. For now, this series
> is only adding support for NV12-to-YUY2 conversion.
> 
> Applies to media/master.
> 
> Future plans
> ------------
> 
> It seems to me that we should start moving this driver to use
> regmap-based access to registers. However, such move is out of scope
> and not entirely related to this post-processor enablement.
> 
> We'll work on that as follow-up patches.
> 
> Changelog
> ---------
> 
> Changes v3:
> 
> * After discussing with Hans and Tomasz during the media summit
> in ELCE, we decided to go back on the MC changes. The MC topology
> is now untouched. This means the series is now similar to v1,
> except we explicitly use the ENUM_FMT to hint about the post-processed
> formats.
> 
> Changes v2:
> 
> * The decoder->post-processor topology is now exposed
>   explicitly and applications need to configure the pipeline.
>   By default, the decoder is enabled and the post-processor
>   is disabled.
> 
> * RGB post-processing output has been dropped. We might
>   add this in the future, but for now, it seems it would
>   make the code more complex without a use-case in mind.
>   RGB is much more memory-consuming so less attractive
>   than YUV, and modern GPUs and display controllers support YUV.
> 
> * The post-processor implementation still supports RK3288
>   only. However, a generic register infrastructure is introduced
>   to make addition of other variants such as RK3399 really easy.
> 
> Ezequiel Garcia (3):
>   media: hantro: Cleanup format negotiation helpers
>   media: hantro: Support color conversion via post-processing
>   media: vidioc-enum-fmt.rst: clarify format preference
> 
>  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
>  drivers/staging/media/hantro/Makefile         |   1 +
>  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
>  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
>  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
>  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
>  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
>  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
>  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
>  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
>  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
>  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
>  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
>  13 files changed, 366 insertions(+), 45 deletions(-)
>  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2020-02-09 19:52 ` [PATCH v3 0/3] Enable Hantro G1 post-processor Nicolas Dufresne
@ 2020-02-10  2:45   ` Tomasz Figa
  2020-02-11  4:16     ` Nicolas Dufresne
  0 siblings, 1 reply; 21+ messages in thread
From: Tomasz Figa @ 2020-02-10  2:45 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Ezequiel Garcia, Linux Media Mailing List, kernel,
	open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Jonas Karlman, Philipp Zabel, Boris Brezillon,
	Chris Healy, Linux Kernel Mailing List, Alexandre Courbot

On Mon, Feb 10, 2020 at 4:52 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> > Hi all,
> >
> > The Hantro G1 VPU post-processor block can be pipelined with
> > the decoder hardware, allowing to perform operations such as
> > color conversion, scaling, rotation, cropping, among others.
> >
> > When the post-processor is enabled, the decoder hardware
> > needs its own set of NV12 buffers (the native decoder format),
> > and the post-processor is the owner of the CAPTURE buffers,
> > allocated for the post-processed format.
> >
> > This way, applications obtain post-processed
> > (scaled, converted, etc) buffers transparently.
> >
> > This feature is implemented by exposing the post-processed pixel
> > formats on ENUM_FMT, ordered as "preferred pixelformat first":
> >
> > v4l2-ctl -d 1 --list-formats
> > ioctl: VIDIOC_ENUM_FMT
> >       Type: Video Capture Multiplanar
> >
> >       [0]: 'NV12' (Y/CbCr 4:2:0)
> >       [1]: 'YUYV' (YUYV 4:2:2)
> >
> > The order of preference in ENUM_FMT can be used as a hint
> > by applications. This series updates the uAPI specification
> > accordingly.
>
> As I'm implementing this, I realize that there may me a gap in being
> able to implement both IPP and non-IPP support in a generic framework.
> Unlike the above comment, we for non-IPP decoder we cannot naively pick
> the first format. In fact we parse the chroma and depth information
> from the headers (like pps from H264), and we pick a matching pixel
> format. This way, if we have a 10bit stream, and our IP supports 10bit,
> we will pick a 10bit pixel formats, otherwise decoding will just fail.
>
> None of this information is passed to the driver prior to the first
> Request being made, so there is no way (as of current spec) that the
> driver can validate this in try_fmt ahead of time. Unless I set picture
> parameters without a request_fd for that purpose. If this is the way,
> then we should document this.

+Alexandre Courbot

It was suggested in the very early RFC stage, but it looks like it
didn't make it to the final spec.
https://patchwork.kernel.org/patch/10583233/#22209555

>
> Is this the intended way to negotiation IPP functions with the driver ?
>

In theory, if the userspace knows whether the stream is 4:2:0 or 4:2:2
and 8-bit or 10-bit, it can still select the first format from the top
that matches these properties.

That's not how format handling in V4L2 works, though. ENUM_FMT is
expected to return a list of valid formats and if we forget about the
image processor for a moment, a stateless decoder would always return
any possible format, including ones invalid for the stream.

Now back to the image processor, if it handles conversions from any to
any format listed by ENUM_FMT, we kind of regain the V4L2 compliance,
but if the conversions are limited, the above requirement still
doesn't hold and we're not implementing V4L2 correctly.

Perhaps we can still amend the spec and require controls that
determine the stream properties to be set before starting the
streaming? I can imagine it could also help the driver filter out some
unsupported streams early, before allocating buffers and attempting to
decode.

Best regards,
Tomasz

> >
> > When the application sets a pixel format other than NV12,
> > the post-processor is transparently enabled.
> >
> > Patch 1 is a cleanups needed to easier integrate the post-processor.
> > Patch 2 introduces the post-processing support.
> > Patch 3 updates the uAPI specification.
> >
> > This is tested on RK3288 platforms with MPEG-2, VP8 and
> > H264 streams, decoding to YUY2 surfaces. For now, this series
> > is only adding support for NV12-to-YUY2 conversion.
> >
> > Applies to media/master.
> >
> > Future plans
> > ------------
> >
> > It seems to me that we should start moving this driver to use
> > regmap-based access to registers. However, such move is out of scope
> > and not entirely related to this post-processor enablement.
> >
> > We'll work on that as follow-up patches.
> >
> > Changelog
> > ---------
> >
> > Changes v3:
> >
> > * After discussing with Hans and Tomasz during the media summit
> > in ELCE, we decided to go back on the MC changes. The MC topology
> > is now untouched. This means the series is now similar to v1,
> > except we explicitly use the ENUM_FMT to hint about the post-processed
> > formats.
> >
> > Changes v2:
> >
> > * The decoder->post-processor topology is now exposed
> >   explicitly and applications need to configure the pipeline.
> >   By default, the decoder is enabled and the post-processor
> >   is disabled.
> >
> > * RGB post-processing output has been dropped. We might
> >   add this in the future, but for now, it seems it would
> >   make the code more complex without a use-case in mind.
> >   RGB is much more memory-consuming so less attractive
> >   than YUV, and modern GPUs and display controllers support YUV.
> >
> > * The post-processor implementation still supports RK3288
> >   only. However, a generic register infrastructure is introduced
> >   to make addition of other variants such as RK3399 really easy.
> >
> > Ezequiel Garcia (3):
> >   media: hantro: Cleanup format negotiation helpers
> >   media: hantro: Support color conversion via post-processing
> >   media: vidioc-enum-fmt.rst: clarify format preference
> >
> >  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
> >  drivers/staging/media/hantro/Makefile         |   1 +
> >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> >  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
> >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> >  13 files changed, 366 insertions(+), 45 deletions(-)
> >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2020-02-10  2:45   ` Tomasz Figa
@ 2020-02-11  4:16     ` Nicolas Dufresne
  2020-02-11 16:22       ` Nicolas Dufresne
  0 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dufresne @ 2020-02-11  4:16 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Ezequiel Garcia, Linux Media Mailing List, kernel,
	open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Jonas Karlman, Philipp Zabel, Boris Brezillon,
	Chris Healy, Linux Kernel Mailing List, Alexandre Courbot

Le lundi 10 février 2020 à 11:45 +0900, Tomasz Figa a écrit :
> On Mon, Feb 10, 2020 at 4:52 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> > > Hi all,
> > > 
> > > The Hantro G1 VPU post-processor block can be pipelined with
> > > the decoder hardware, allowing to perform operations such as
> > > color conversion, scaling, rotation, cropping, among others.
> > > 
> > > When the post-processor is enabled, the decoder hardware
> > > needs its own set of NV12 buffers (the native decoder format),
> > > and the post-processor is the owner of the CAPTURE buffers,
> > > allocated for the post-processed format.
> > > 
> > > This way, applications obtain post-processed
> > > (scaled, converted, etc) buffers transparently.
> > > 
> > > This feature is implemented by exposing the post-processed pixel
> > > formats on ENUM_FMT, ordered as "preferred pixelformat first":
> > > 
> > > v4l2-ctl -d 1 --list-formats
> > > ioctl: VIDIOC_ENUM_FMT
> > >       Type: Video Capture Multiplanar
> > > 
> > >       [0]: 'NV12' (Y/CbCr 4:2:0)
> > >       [1]: 'YUYV' (YUYV 4:2:2)
> > > 
> > > The order of preference in ENUM_FMT can be used as a hint
> > > by applications. This series updates the uAPI specification
> > > accordingly.
> > 
> > As I'm implementing this, I realize that there may me a gap in being
> > able to implement both IPP and non-IPP support in a generic framework.
> > Unlike the above comment, we for non-IPP decoder we cannot naively pick
> > the first format. In fact we parse the chroma and depth information
> > from the headers (like pps from H264), and we pick a matching pixel
> > format. This way, if we have a 10bit stream, and our IP supports 10bit,
> > we will pick a 10bit pixel formats, otherwise decoding will just fail.
> > 
> > None of this information is passed to the driver prior to the first
> > Request being made, so there is no way (as of current spec) that the
> > driver can validate this in try_fmt ahead of time. Unless I set picture
> > parameters without a request_fd for that purpose. If this is the way,
> > then we should document this.
> 
> +Alexandre Courbot
> 
> It was suggested in the very early RFC stage, but it looks like it
> didn't make it to the final spec.
> https://patchwork.kernel.org/patch/10583233/#22209555

Ok, maybe we should revive it, it would fill that gap. Again, only an
issue if you have a post processor. I'm still surprised we didn't
expose the IPP functions through the topology, it would make so much
sense to me, and I can make better code with that knowledge.

I found while coding this, that even if it's more difficult,
classification of device by looking at the topology and the entity
functions is much nicer, less of a guess.

Though, we lack some documentation (or clarification) for how to
properly handle formats, size and selection in order to configure the
IPP. Ezequiel was saying that we don't implement selection in Hanto, so
I guess the scaling is a bit ambiguous then in regard to coded/display
sizes. Though we pass a size to the OUTPUT side, so the driver can
always control a little bit.

> 
> > Is this the intended way to negotiation IPP functions with the driver ?
> > 
> 
> In theory, if the userspace knows whether the stream is 4:2:0 or 4:2:2
> and 8-bit or 10-bit, it can still select the first format from the top
> that matches these properties.

That's exactly what I do, I have a map of the formats and their
chroma/depth, and take the first one that match. If that fails, I just
fallback to the first one. It's something you need to do anyway, as we
prefer the native format first (even if there is an IPP).

> 
> That's not how format handling in V4L2 works, though. ENUM_FMT is
> expected to return a list of valid formats and if we forget about the
> image processor for a moment, a stateless decoder would always return
> any possible format, including ones invalid for the stream.
> 
> Now back to the image processor, if it handles conversions from any to
> any format listed by ENUM_FMT, we kind of regain the V4L2 compliance,
> but if the conversions are limited, the above requirement still
> doesn't hold and we're not implementing V4L2 correctly.
> 
> Perhaps we can still amend the spec and require controls that
> determine the stream properties to be set before starting the
> streaming? I can imagine it could also help the driver filter out some
> unsupported streams early, before allocating buffers and attempting to
> decode.

I think it would make sense, so just to make sure, for H264 we could
set the V4L2_CID_MPEG_VIDEO_H264_SPS along with the OUTPUT format,
prior to CAPTURE enumeration.

> 
> Best regards,
> Tomasz
> 
> > > When the application sets a pixel format other than NV12,
> > > the post-processor is transparently enabled.
> > > 
> > > Patch 1 is a cleanups needed to easier integrate the post-processor.
> > > Patch 2 introduces the post-processing support.
> > > Patch 3 updates the uAPI specification.
> > > 
> > > This is tested on RK3288 platforms with MPEG-2, VP8 and
> > > H264 streams, decoding to YUY2 surfaces. For now, this series
> > > is only adding support for NV12-to-YUY2 conversion.
> > > 
> > > Applies to media/master.
> > > 
> > > Future plans
> > > ------------
> > > 
> > > It seems to me that we should start moving this driver to use
> > > regmap-based access to registers. However, such move is out of scope
> > > and not entirely related to this post-processor enablement.
> > > 
> > > We'll work on that as follow-up patches.
> > > 
> > > Changelog
> > > ---------
> > > 
> > > Changes v3:
> > > 
> > > * After discussing with Hans and Tomasz during the media summit
> > > in ELCE, we decided to go back on the MC changes. The MC topology
> > > is now untouched. This means the series is now similar to v1,
> > > except we explicitly use the ENUM_FMT to hint about the post-processed
> > > formats.
> > > 
> > > Changes v2:
> > > 
> > > * The decoder->post-processor topology is now exposed
> > >   explicitly and applications need to configure the pipeline.
> > >   By default, the decoder is enabled and the post-processor
> > >   is disabled.
> > > 
> > > * RGB post-processing output has been dropped. We might
> > >   add this in the future, but for now, it seems it would
> > >   make the code more complex without a use-case in mind.
> > >   RGB is much more memory-consuming so less attractive
> > >   than YUV, and modern GPUs and display controllers support YUV.
> > > 
> > > * The post-processor implementation still supports RK3288
> > >   only. However, a generic register infrastructure is introduced
> > >   to make addition of other variants such as RK3399 really easy.
> > > 
> > > Ezequiel Garcia (3):
> > >   media: hantro: Cleanup format negotiation helpers
> > >   media: hantro: Support color conversion via post-processing
> > >   media: vidioc-enum-fmt.rst: clarify format preference
> > > 
> > >  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
> > >  drivers/staging/media/hantro/Makefile         |   1 +
> > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> > >  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
> > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > >  13 files changed, 366 insertions(+), 45 deletions(-)
> > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2020-02-11  4:16     ` Nicolas Dufresne
@ 2020-02-11 16:22       ` Nicolas Dufresne
  2020-02-12  7:06         ` Tomasz Figa
  0 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dufresne @ 2020-02-11 16:22 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Ezequiel Garcia, Linux Media Mailing List, kernel,
	open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Jonas Karlman, Philipp Zabel, Boris Brezillon,
	Chris Healy, Linux Kernel Mailing List, Alexandre Courbot

On lun, 2020-02-10 at 23:16 -0500, Nicolas Dufresne wrote:
> Le lundi 10 février 2020 à 11:45 +0900, Tomasz Figa a écrit :
> > On Mon, Feb 10, 2020 at 4:52 AM Nicolas Dufresne <nicolas@ndufresne.ca>
> > wrote:
> > > Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> > > > Hi all,
> > > > 
> > > > The Hantro G1 VPU post-processor block can be pipelined with
> > > > the decoder hardware, allowing to perform operations such as
> > > > color conversion, scaling, rotation, cropping, among others.
> > > > 
> > > > When the post-processor is enabled, the decoder hardware
> > > > needs its own set of NV12 buffers (the native decoder format),
> > > > and the post-processor is the owner of the CAPTURE buffers,
> > > > allocated for the post-processed format.
> > > > 
> > > > This way, applications obtain post-processed
> > > > (scaled, converted, etc) buffers transparently.
> > > > 
> > > > This feature is implemented by exposing the post-processed pixel
> > > > formats on ENUM_FMT, ordered as "preferred pixelformat first":
> > > > 
> > > > v4l2-ctl -d 1 --list-formats
> > > > ioctl: VIDIOC_ENUM_FMT
> > > >       Type: Video Capture Multiplanar
> > > > 
> > > >       [0]: 'NV12' (Y/CbCr 4:2:0)
> > > >       [1]: 'YUYV' (YUYV 4:2:2)
> > > > 
> > > > The order of preference in ENUM_FMT can be used as a hint
> > > > by applications. This series updates the uAPI specification
> > > > accordingly.
> > > 
> > > As I'm implementing this, I realize that there may me a gap in being
> > > able to implement both IPP and non-IPP support in a generic framework.
> > > Unlike the above comment, we for non-IPP decoder we cannot naively pick
> > > the first format. In fact we parse the chroma and depth information
> > > from the headers (like pps from H264), and we pick a matching pixel
> > > format. This way, if we have a 10bit stream, and our IP supports 10bit,
> > > we will pick a 10bit pixel formats, otherwise decoding will just fail.
> > > 
> > > None of this information is passed to the driver prior to the first
> > > Request being made, so there is no way (as of current spec) that the
> > > driver can validate this in try_fmt ahead of time. Unless I set picture
> > > parameters without a request_fd for that purpose. If this is the way,
> > > then we should document this.
> > 
> > +Alexandre Courbot
> > 
> > It was suggested in the very early RFC stage, but it looks like it
> > didn't make it to the final spec.
> > https://patchwork.kernel.org/patch/10583233/#22209555
> 
> Ok, maybe we should revive it, it would fill that gap. Again, only an
> issue if you have a post processor. I'm still surprised we didn't
> expose the IPP functions through the topology, it would make so much
> sense to me, and I can make better code with that knowledge.
> 
> I found while coding this, that even if it's more difficult,
> classification of device by looking at the topology and the entity
> functions is much nicer, less of a guess.
> 
> Though, we lack some documentation (or clarification) for how to
> properly handle formats, size and selection in order to configure the
> IPP. Ezequiel was saying that we don't implement selection in Hanto, so
> I guess the scaling is a bit ambiguous then in regard to coded/display
> sizes. Though we pass a size to the OUTPUT side, so the driver can
> always control a little bit.

Re-reading the "initialization process", it's actually still there:

"Call VIDIOC_S_EXT_CTRLS() to set all the controls (parsed headers, etc.)
required by the OUTPUT format to enumerate the CAPTURE formats."

And then it suggest to use G_FMT to retreive the optimal format as selected by
the driver. So I guess this is a false alarm. And we probably don't need to play
with the subsampling to filter the formats, as the driver is expected to do so.

So what could be improve, is have in the per-codec documentation the list of
controls that are "required by the OUTPUT format to enumerate the CAPTURE
formats.". Could just be a comment in the control doc so say, "this control is
needed to allow enumerationg of the CAPTURE formats".

> 
> > > Is this the intended way to negotiation IPP functions with the driver ?
> > > 
> > 
> > In theory, if the userspace knows whether the stream is 4:2:0 or 4:2:2
> > and 8-bit or 10-bit, it can still select the first format from the top
> > that matches these properties.
> 
> That's exactly what I do, I have a map of the formats and their
> chroma/depth, and take the first one that match. If that fails, I just
> fallback to the first one. It's something you need to do anyway, as we
> prefer the native format first (even if there is an IPP).
> 
> > That's not how format handling in V4L2 works, though. ENUM_FMT is
> > expected to return a list of valid formats and if we forget about the
> > image processor for a moment, a stateless decoder would always return
> > any possible format, including ones invalid for the stream.
> > 
> > Now back to the image processor, if it handles conversions from any to
> > any format listed by ENUM_FMT, we kind of regain the V4L2 compliance,
> > but if the conversions are limited, the above requirement still
> > doesn't hold and we're not implementing V4L2 correctly.
> > 
> > Perhaps we can still amend the spec and require controls that
> > determine the stream properties to be set before starting the
> > streaming? I can imagine it could also help the driver filter out some
> > unsupported streams early, before allocating buffers and attempting to
> > decode.
> 
> I think it would make sense, so just to make sure, for H264 we could
> set the V4L2_CID_MPEG_VIDEO_H264_SPS along with the OUTPUT format,
> prior to CAPTURE enumeration.
> 
> > Best regards,
> > Tomasz
> > 
> > > > When the application sets a pixel format other than NV12,
> > > > the post-processor is transparently enabled.
> > > > 
> > > > Patch 1 is a cleanups needed to easier integrate the post-processor.
> > > > Patch 2 introduces the post-processing support.
> > > > Patch 3 updates the uAPI specification.
> > > > 
> > > > This is tested on RK3288 platforms with MPEG-2, VP8 and
> > > > H264 streams, decoding to YUY2 surfaces. For now, this series
> > > > is only adding support for NV12-to-YUY2 conversion.
> > > > 
> > > > Applies to media/master.
> > > > 
> > > > Future plans
> > > > ------------
> > > > 
> > > > It seems to me that we should start moving this driver to use
> > > > regmap-based access to registers. However, such move is out of scope
> > > > and not entirely related to this post-processor enablement.
> > > > 
> > > > We'll work on that as follow-up patches.
> > > > 
> > > > Changelog
> > > > ---------
> > > > 
> > > > Changes v3:
> > > > 
> > > > * After discussing with Hans and Tomasz during the media summit
> > > > in ELCE, we decided to go back on the MC changes. The MC topology
> > > > is now untouched. This means the series is now similar to v1,
> > > > except we explicitly use the ENUM_FMT to hint about the post-processed
> > > > formats.
> > > > 
> > > > Changes v2:
> > > > 
> > > > * The decoder->post-processor topology is now exposed
> > > >   explicitly and applications need to configure the pipeline.
> > > >   By default, the decoder is enabled and the post-processor
> > > >   is disabled.
> > > > 
> > > > * RGB post-processing output has been dropped. We might
> > > >   add this in the future, but for now, it seems it would
> > > >   make the code more complex without a use-case in mind.
> > > >   RGB is much more memory-consuming so less attractive
> > > >   than YUV, and modern GPUs and display controllers support YUV.
> > > > 
> > > > * The post-processor implementation still supports RK3288
> > > >   only. However, a generic register infrastructure is introduced
> > > >   to make addition of other variants such as RK3399 really easy.
> > > > 
> > > > Ezequiel Garcia (3):
> > > >   media: hantro: Cleanup format negotiation helpers
> > > >   media: hantro: Support color conversion via post-processing
> > > >   media: vidioc-enum-fmt.rst: clarify format preference
> > > > 
> > > >  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
> > > >  drivers/staging/media/hantro/Makefile         |   1 +
> > > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > > >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> > > >  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
> > > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > > >  13 files changed, 366 insertions(+), 45 deletions(-)
> > > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > > 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2020-02-11 16:22       ` Nicolas Dufresne
@ 2020-02-12  7:06         ` Tomasz Figa
  2020-02-19 21:06           ` Nicolas Dufresne
  0 siblings, 1 reply; 21+ messages in thread
From: Tomasz Figa @ 2020-02-12  7:06 UTC (permalink / raw)
  To: Nicolas Dufresne, Ezequiel Garcia
  Cc: Linux Media Mailing List, kernel, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Jonas Karlman, Philipp Zabel, Boris Brezillon,
	Chris Healy, Linux Kernel Mailing List, Alexandre Courbot

On Wed, Feb 12, 2020 at 1:22 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> On lun, 2020-02-10 at 23:16 -0500, Nicolas Dufresne wrote:
> > Le lundi 10 février 2020 à 11:45 +0900, Tomasz Figa a écrit :
> > > On Mon, Feb 10, 2020 at 4:52 AM Nicolas Dufresne <nicolas@ndufresne.ca>
> > > wrote:
> > > > Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> > > > > Hi all,
> > > > >
> > > > > The Hantro G1 VPU post-processor block can be pipelined with
> > > > > the decoder hardware, allowing to perform operations such as
> > > > > color conversion, scaling, rotation, cropping, among others.
> > > > >
> > > > > When the post-processor is enabled, the decoder hardware
> > > > > needs its own set of NV12 buffers (the native decoder format),
> > > > > and the post-processor is the owner of the CAPTURE buffers,
> > > > > allocated for the post-processed format.
> > > > >
> > > > > This way, applications obtain post-processed
> > > > > (scaled, converted, etc) buffers transparently.
> > > > >
> > > > > This feature is implemented by exposing the post-processed pixel
> > > > > formats on ENUM_FMT, ordered as "preferred pixelformat first":
> > > > >
> > > > > v4l2-ctl -d 1 --list-formats
> > > > > ioctl: VIDIOC_ENUM_FMT
> > > > >       Type: Video Capture Multiplanar
> > > > >
> > > > >       [0]: 'NV12' (Y/CbCr 4:2:0)
> > > > >       [1]: 'YUYV' (YUYV 4:2:2)
> > > > >
> > > > > The order of preference in ENUM_FMT can be used as a hint
> > > > > by applications. This series updates the uAPI specification
> > > > > accordingly.
> > > >
> > > > As I'm implementing this, I realize that there may me a gap in being
> > > > able to implement both IPP and non-IPP support in a generic framework.
> > > > Unlike the above comment, we for non-IPP decoder we cannot naively pick
> > > > the first format. In fact we parse the chroma and depth information
> > > > from the headers (like pps from H264), and we pick a matching pixel
> > > > format. This way, if we have a 10bit stream, and our IP supports 10bit,
> > > > we will pick a 10bit pixel formats, otherwise decoding will just fail.
> > > >
> > > > None of this information is passed to the driver prior to the first
> > > > Request being made, so there is no way (as of current spec) that the
> > > > driver can validate this in try_fmt ahead of time. Unless I set picture
> > > > parameters without a request_fd for that purpose. If this is the way,
> > > > then we should document this.
> > >
> > > +Alexandre Courbot
> > >
> > > It was suggested in the very early RFC stage, but it looks like it
> > > didn't make it to the final spec.
> > > https://patchwork.kernel.org/patch/10583233/#22209555
> >
> > Ok, maybe we should revive it, it would fill that gap. Again, only an
> > issue if you have a post processor. I'm still surprised we didn't
> > expose the IPP functions through the topology, it would make so much
> > sense to me, and I can make better code with that knowledge.
> >
> > I found while coding this, that even if it's more difficult,
> > classification of device by looking at the topology and the entity
> > functions is much nicer, less of a guess.
> >
> > Though, we lack some documentation (or clarification) for how to
> > properly handle formats, size and selection in order to configure the
> > IPP. Ezequiel was saying that we don't implement selection in Hanto, so
> > I guess the scaling is a bit ambiguous then in regard to coded/display
> > sizes. Though we pass a size to the OUTPUT side, so the driver can
> > always control a little bit.
>
> Re-reading the "initialization process", it's actually still there:
>
> "Call VIDIOC_S_EXT_CTRLS() to set all the controls (parsed headers, etc.)
> required by the OUTPUT format to enumerate the CAPTURE formats."
>

Oh, so it did make it to the final spec. Sorry, my bad. Should have
checked better.

> And then it suggest to use G_FMT to retreive the optimal format as selected by
> the driver. So I guess this is a false alarm. And we probably don't need to play
> with the subsampling to filter the formats, as the driver is expected to do so.
>

The question that this raises then is whether the merged drivers do
so. Hantro G1 supports only 8-bit YUV 4:2:0, so there is no problem,
although the driver doesn't seem to validate that in the headers. I'd
also say the CAPTURE resolution should be determined by the headers,
rather than the OUTPUT queue, consistently to the stateful decoders.
Resolution changes should be also signaled if controls are set to
headers that contain different resolutions than previously set.

> So what could be improve, is have in the per-codec documentation the list of
> controls that are "required by the OUTPUT format to enumerate the CAPTURE
> formats.". Could just be a comment in the control doc so say, "this control is
> needed to allow enumerationg of the CAPTURE formats".
>

I wouldn't limit this to the format enumeration. The drivers may also
need to check for things like profile, some stream features or even
just whether the resolution doesn't exceed the maximum supported by
the decoder.

Best regards,
Tomasz

> >
> > > > Is this the intended way to negotiation IPP functions with the driver ?
> > > >
> > >
> > > In theory, if the userspace knows whether the stream is 4:2:0 or 4:2:2
> > > and 8-bit or 10-bit, it can still select the first format from the top
> > > that matches these properties.
> >
> > That's exactly what I do, I have a map of the formats and their
> > chroma/depth, and take the first one that match. If that fails, I just
> > fallback to the first one. It's something you need to do anyway, as we
> > prefer the native format first (even if there is an IPP).
> >
> > > That's not how format handling in V4L2 works, though. ENUM_FMT is
> > > expected to return a list of valid formats and if we forget about the
> > > image processor for a moment, a stateless decoder would always return
> > > any possible format, including ones invalid for the stream.
> > >
> > > Now back to the image processor, if it handles conversions from any to
> > > any format listed by ENUM_FMT, we kind of regain the V4L2 compliance,
> > > but if the conversions are limited, the above requirement still
> > > doesn't hold and we're not implementing V4L2 correctly.
> > >
> > > Perhaps we can still amend the spec and require controls that
> > > determine the stream properties to be set before starting the
> > > streaming? I can imagine it could also help the driver filter out some
> > > unsupported streams early, before allocating buffers and attempting to
> > > decode.
> >
> > I think it would make sense, so just to make sure, for H264 we could
> > set the V4L2_CID_MPEG_VIDEO_H264_SPS along with the OUTPUT format,
> > prior to CAPTURE enumeration.
> >
> > > Best regards,
> > > Tomasz
> > >
> > > > > When the application sets a pixel format other than NV12,
> > > > > the post-processor is transparently enabled.
> > > > >
> > > > > Patch 1 is a cleanups needed to easier integrate the post-processor.
> > > > > Patch 2 introduces the post-processing support.
> > > > > Patch 3 updates the uAPI specification.
> > > > >
> > > > > This is tested on RK3288 platforms with MPEG-2, VP8 and
> > > > > H264 streams, decoding to YUY2 surfaces. For now, this series
> > > > > is only adding support for NV12-to-YUY2 conversion.
> > > > >
> > > > > Applies to media/master.
> > > > >
> > > > > Future plans
> > > > > ------------
> > > > >
> > > > > It seems to me that we should start moving this driver to use
> > > > > regmap-based access to registers. However, such move is out of scope
> > > > > and not entirely related to this post-processor enablement.
> > > > >
> > > > > We'll work on that as follow-up patches.
> > > > >
> > > > > Changelog
> > > > > ---------
> > > > >
> > > > > Changes v3:
> > > > >
> > > > > * After discussing with Hans and Tomasz during the media summit
> > > > > in ELCE, we decided to go back on the MC changes. The MC topology
> > > > > is now untouched. This means the series is now similar to v1,
> > > > > except we explicitly use the ENUM_FMT to hint about the post-processed
> > > > > formats.
> > > > >
> > > > > Changes v2:
> > > > >
> > > > > * The decoder->post-processor topology is now exposed
> > > > >   explicitly and applications need to configure the pipeline.
> > > > >   By default, the decoder is enabled and the post-processor
> > > > >   is disabled.
> > > > >
> > > > > * RGB post-processing output has been dropped. We might
> > > > >   add this in the future, but for now, it seems it would
> > > > >   make the code more complex without a use-case in mind.
> > > > >   RGB is much more memory-consuming so less attractive
> > > > >   than YUV, and modern GPUs and display controllers support YUV.
> > > > >
> > > > > * The post-processor implementation still supports RK3288
> > > > >   only. However, a generic register infrastructure is introduced
> > > > >   to make addition of other variants such as RK3399 really easy.
> > > > >
> > > > > Ezequiel Garcia (3):
> > > > >   media: hantro: Cleanup format negotiation helpers
> > > > >   media: hantro: Support color conversion via post-processing
> > > > >   media: vidioc-enum-fmt.rst: clarify format preference
> > > > >
> > > > >  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
> > > > >  drivers/staging/media/hantro/Makefile         |   1 +
> > > > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > > > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > > > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > > > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > > > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > > > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > > > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > > > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > > > >  .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++++++++++++
> > > > >  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
> > > > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > > > >  13 files changed, 366 insertions(+), 45 deletions(-)
> > > > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > > >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2020-02-12  7:06         ` Tomasz Figa
@ 2020-02-19 21:06           ` Nicolas Dufresne
  2020-03-09 11:11             ` Tomasz Figa
  0 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dufresne @ 2020-02-19 21:06 UTC (permalink / raw)
  To: Tomasz Figa, Ezequiel Garcia
  Cc: Linux Media Mailing List, kernel, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Jonas Karlman, Philipp Zabel, Boris Brezillon,
	Chris Healy, Linux Kernel Mailing List, Alexandre Courbot

On mer, 2020-02-12 at 16:06 +0900, Tomasz Figa wrote:
> On Wed, Feb 12, 2020 at 1:22 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > On lun, 2020-02-10 at 23:16 -0500, Nicolas Dufresne wrote:
> > > Le lundi 10 février 2020 à 11:45 +0900, Tomasz Figa a écrit :
> > > > On Mon, Feb 10, 2020 at 4:52 AM Nicolas Dufresne <nicolas@ndufresne.ca>
> > > > wrote:
> > > > > Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> > > > > > Hi all,
> > > > > > 
> > > > > > The Hantro G1 VPU post-processor block can be pipelined with
> > > > > > the decoder hardware, allowing to perform operations such as
> > > > > > color conversion, scaling, rotation, cropping, among others.
> > > > > > 
> > > > > > When the post-processor is enabled, the decoder hardware
> > > > > > needs its own set of NV12 buffers (the native decoder format),
> > > > > > and the post-processor is the owner of the CAPTURE buffers,
> > > > > > allocated for the post-processed format.
> > > > > > 
> > > > > > This way, applications obtain post-processed
> > > > > > (scaled, converted, etc) buffers transparently.
> > > > > > 
> > > > > > This feature is implemented by exposing the post-processed pixel
> > > > > > formats on ENUM_FMT, ordered as "preferred pixelformat first":
> > > > > > 
> > > > > > v4l2-ctl -d 1 --list-formats
> > > > > > ioctl: VIDIOC_ENUM_FMT
> > > > > >       Type: Video Capture Multiplanar
> > > > > > 
> > > > > >       [0]: 'NV12' (Y/CbCr 4:2:0)
> > > > > >       [1]: 'YUYV' (YUYV 4:2:2)
> > > > > > 
> > > > > > The order of preference in ENUM_FMT can be used as a hint
> > > > > > by applications. This series updates the uAPI specification
> > > > > > accordingly.
> > > > > 
> > > > > As I'm implementing this, I realize that there may me a gap in being
> > > > > able to implement both IPP and non-IPP support in a generic framework.
> > > > > Unlike the above comment, we for non-IPP decoder we cannot naively
> > > > > pick
> > > > > the first format. In fact we parse the chroma and depth information
> > > > > from the headers (like pps from H264), and we pick a matching pixel
> > > > > format. This way, if we have a 10bit stream, and our IP supports
> > > > > 10bit,
> > > > > we will pick a 10bit pixel formats, otherwise decoding will just fail.
> > > > > 
> > > > > None of this information is passed to the driver prior to the first
> > > > > Request being made, so there is no way (as of current spec) that the
> > > > > driver can validate this in try_fmt ahead of time. Unless I set
> > > > > picture
> > > > > parameters without a request_fd for that purpose. If this is the way,
> > > > > then we should document this.
> > > > 
> > > > +Alexandre Courbot
> > > > 
> > > > It was suggested in the very early RFC stage, but it looks like it
> > > > didn't make it to the final spec.
> > > > https://patchwork.kernel.org/patch/10583233/#22209555
> > > 
> > > Ok, maybe we should revive it, it would fill that gap. Again, only an
> > > issue if you have a post processor. I'm still surprised we didn't
> > > expose the IPP functions through the topology, it would make so much
> > > sense to me, and I can make better code with that knowledge.
> > > 
> > > I found while coding this, that even if it's more difficult,
> > > classification of device by looking at the topology and the entity
> > > functions is much nicer, less of a guess.
> > > 
> > > Though, we lack some documentation (or clarification) for how to
> > > properly handle formats, size and selection in order to configure the
> > > IPP. Ezequiel was saying that we don't implement selection in Hanto, so
> > > I guess the scaling is a bit ambiguous then in regard to coded/display
> > > sizes. Though we pass a size to the OUTPUT side, so the driver can
> > > always control a little bit.
> > 
> > Re-reading the "initialization process", it's actually still there:
> > 
> > "Call VIDIOC_S_EXT_CTRLS() to set all the controls (parsed headers, etc.)
> > required by the OUTPUT format to enumerate the CAPTURE formats."
> > 
> 
> Oh, so it did make it to the final spec. Sorry, my bad. Should have
> checked better.
> 
> > And then it suggest to use G_FMT to retreive the optimal format as selected
> > by
> > the driver. So I guess this is a false alarm. And we probably don't need to
> > play
> > with the subsampling to filter the formats, as the driver is expected to do
> > so.
> > 
> 
> The question that this raises then is whether the merged drivers do
> so. Hantro G1 supports only 8-bit YUV 4:2:0, so there is no problem,
> although the driver doesn't seem to validate that in the headers. I'd
> also say the CAPTURE resolution should be determined by the headers,
> rather than the OUTPUT queue, consistently to the stateful decoders.
> Resolution changes should be also signaled if controls are set to
> headers that contain different resolutions than previously set.
> 
> > So what could be improve, is have in the per-codec documentation the list of
> > controls that are "required by the OUTPUT format to enumerate the CAPTURE
> > formats.". Could just be a comment in the control doc so say, "this control
> > is
> > needed to allow enumerationg of the CAPTURE formats".
> > 
> 
> I wouldn't limit this to the format enumeration. The drivers may also
> need to check for things like profile, some stream features or even
> just whether the resolution doesn't exceed the maximum supported by
> the decoder.

Yes, I came to similar conclusion. As an example, an H264 should validate the
SPS (and I think for H264 that is sufficient ?) based on it's hardware specific
constraint. Using the SPS for format and size numeration is also what makes the
most sense when there is an IPP. An IPP could support crop, and the driver could
select the default cropping to matcht the display resolution. Driver should be
enhanced, but it seems to be part of normal refine.

To come back to the spec, shall we document a return value for the case where
the control is supported, but the stream isn't supported ?

> 
> Best regards,
> Tomasz
> 
> > > > > Is this the intended way to negotiation IPP functions with the driver
> > > > > ?
> > > > > 
> > > > 
> > > > In theory, if the userspace knows whether the stream is 4:2:0 or 4:2:2
> > > > and 8-bit or 10-bit, it can still select the first format from the top
> > > > that matches these properties.
> > > 
> > > That's exactly what I do, I have a map of the formats and their
> > > chroma/depth, and take the first one that match. If that fails, I just
> > > fallback to the first one. It's something you need to do anyway, as we
> > > prefer the native format first (even if there is an IPP).
> > > 
> > > > That's not how format handling in V4L2 works, though. ENUM_FMT is
> > > > expected to return a list of valid formats and if we forget about the
> > > > image processor for a moment, a stateless decoder would always return
> > > > any possible format, including ones invalid for the stream.
> > > > 
> > > > Now back to the image processor, if it handles conversions from any to
> > > > any format listed by ENUM_FMT, we kind of regain the V4L2 compliance,
> > > > but if the conversions are limited, the above requirement still
> > > > doesn't hold and we're not implementing V4L2 correctly.
> > > > 
> > > > Perhaps we can still amend the spec and require controls that
> > > > determine the stream properties to be set before starting the
> > > > streaming? I can imagine it could also help the driver filter out some
> > > > unsupported streams early, before allocating buffers and attempting to
> > > > decode.
> > > 
> > > I think it would make sense, so just to make sure, for H264 we could
> > > set the V4L2_CID_MPEG_VIDEO_H264_SPS along with the OUTPUT format,
> > > prior to CAPTURE enumeration.
> > > 
> > > > Best regards,
> > > > Tomasz
> > > > 
> > > > > > When the application sets a pixel format other than NV12,
> > > > > > the post-processor is transparently enabled.
> > > > > > 
> > > > > > Patch 1 is a cleanups needed to easier integrate the post-processor.
> > > > > > Patch 2 introduces the post-processing support.
> > > > > > Patch 3 updates the uAPI specification.
> > > > > > 
> > > > > > This is tested on RK3288 platforms with MPEG-2, VP8 and
> > > > > > H264 streams, decoding to YUY2 surfaces. For now, this series
> > > > > > is only adding support for NV12-to-YUY2 conversion.
> > > > > > 
> > > > > > Applies to media/master.
> > > > > > 
> > > > > > Future plans
> > > > > > ------------
> > > > > > 
> > > > > > It seems to me that we should start moving this driver to use
> > > > > > regmap-based access to registers. However, such move is out of scope
> > > > > > and not entirely related to this post-processor enablement.
> > > > > > 
> > > > > > We'll work on that as follow-up patches.
> > > > > > 
> > > > > > Changelog
> > > > > > ---------
> > > > > > 
> > > > > > Changes v3:
> > > > > > 
> > > > > > * After discussing with Hans and Tomasz during the media summit
> > > > > > in ELCE, we decided to go back on the MC changes. The MC topology
> > > > > > is now untouched. This means the series is now similar to v1,
> > > > > > except we explicitly use the ENUM_FMT to hint about the post-
> > > > > > processed
> > > > > > formats.
> > > > > > 
> > > > > > Changes v2:
> > > > > > 
> > > > > > * The decoder->post-processor topology is now exposed
> > > > > >   explicitly and applications need to configure the pipeline.
> > > > > >   By default, the decoder is enabled and the post-processor
> > > > > >   is disabled.
> > > > > > 
> > > > > > * RGB post-processing output has been dropped. We might
> > > > > >   add this in the future, but for now, it seems it would
> > > > > >   make the code more complex without a use-case in mind.
> > > > > >   RGB is much more memory-consuming so less attractive
> > > > > >   than YUV, and modern GPUs and display controllers support YUV.
> > > > > > 
> > > > > > * The post-processor implementation still supports RK3288
> > > > > >   only. However, a generic register infrastructure is introduced
> > > > > >   to make addition of other variants such as RK3399 really easy.
> > > > > > 
> > > > > > Ezequiel Garcia (3):
> > > > > >   media: hantro: Cleanup format negotiation helpers
> > > > > >   media: hantro: Support color conversion via post-processing
> > > > > >   media: vidioc-enum-fmt.rst: clarify format preference
> > > > > > 
> > > > > >  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
> > > > > >  drivers/staging/media/hantro/Makefile         |   1 +
> > > > > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > > > > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > > > > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > > > > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > > > > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > > > > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > > > > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > > > > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > > > > >  .../staging/media/hantro/hantro_postproc.c    | 141
> > > > > > ++++++++++++++++++
> > > > > >  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
> > > > > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > > > > >  13 files changed, 366 insertions(+), 45 deletions(-)
> > > > > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > > > > 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Enable Hantro G1 post-processor
  2020-02-19 21:06           ` Nicolas Dufresne
@ 2020-03-09 11:11             ` Tomasz Figa
  0 siblings, 0 replies; 21+ messages in thread
From: Tomasz Figa @ 2020-03-09 11:11 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Ezequiel Garcia, Linux Media Mailing List, kernel,
	open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Jonas Karlman, Philipp Zabel, Boris Brezillon,
	Chris Healy, Linux Kernel Mailing List, Alexandre Courbot

On Thu, Feb 20, 2020 at 6:06 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> On mer, 2020-02-12 at 16:06 +0900, Tomasz Figa wrote:
> > On Wed, Feb 12, 2020 at 1:22 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > > On lun, 2020-02-10 at 23:16 -0500, Nicolas Dufresne wrote:
> > > > Le lundi 10 février 2020 à 11:45 +0900, Tomasz Figa a écrit :
> > > > > On Mon, Feb 10, 2020 at 4:52 AM Nicolas Dufresne <nicolas@ndufresne.ca>
> > > > > wrote:
> > > > > > Le mercredi 13 novembre 2019 à 14:56 -0300, Ezequiel Garcia a écrit :
> > > > > > > Hi all,
> > > > > > >
> > > > > > > The Hantro G1 VPU post-processor block can be pipelined with
> > > > > > > the decoder hardware, allowing to perform operations such as
> > > > > > > color conversion, scaling, rotation, cropping, among others.
> > > > > > >
> > > > > > > When the post-processor is enabled, the decoder hardware
> > > > > > > needs its own set of NV12 buffers (the native decoder format),
> > > > > > > and the post-processor is the owner of the CAPTURE buffers,
> > > > > > > allocated for the post-processed format.
> > > > > > >
> > > > > > > This way, applications obtain post-processed
> > > > > > > (scaled, converted, etc) buffers transparently.
> > > > > > >
> > > > > > > This feature is implemented by exposing the post-processed pixel
> > > > > > > formats on ENUM_FMT, ordered as "preferred pixelformat first":
> > > > > > >
> > > > > > > v4l2-ctl -d 1 --list-formats
> > > > > > > ioctl: VIDIOC_ENUM_FMT
> > > > > > >       Type: Video Capture Multiplanar
> > > > > > >
> > > > > > >       [0]: 'NV12' (Y/CbCr 4:2:0)
> > > > > > >       [1]: 'YUYV' (YUYV 4:2:2)
> > > > > > >
> > > > > > > The order of preference in ENUM_FMT can be used as a hint
> > > > > > > by applications. This series updates the uAPI specification
> > > > > > > accordingly.
> > > > > >
> > > > > > As I'm implementing this, I realize that there may me a gap in being
> > > > > > able to implement both IPP and non-IPP support in a generic framework.
> > > > > > Unlike the above comment, we for non-IPP decoder we cannot naively
> > > > > > pick
> > > > > > the first format. In fact we parse the chroma and depth information
> > > > > > from the headers (like pps from H264), and we pick a matching pixel
> > > > > > format. This way, if we have a 10bit stream, and our IP supports
> > > > > > 10bit,
> > > > > > we will pick a 10bit pixel formats, otherwise decoding will just fail.
> > > > > >
> > > > > > None of this information is passed to the driver prior to the first
> > > > > > Request being made, so there is no way (as of current spec) that the
> > > > > > driver can validate this in try_fmt ahead of time. Unless I set
> > > > > > picture
> > > > > > parameters without a request_fd for that purpose. If this is the way,
> > > > > > then we should document this.
> > > > >
> > > > > +Alexandre Courbot
> > > > >
> > > > > It was suggested in the very early RFC stage, but it looks like it
> > > > > didn't make it to the final spec.
> > > > > https://patchwork.kernel.org/patch/10583233/#22209555
> > > >
> > > > Ok, maybe we should revive it, it would fill that gap. Again, only an
> > > > issue if you have a post processor. I'm still surprised we didn't
> > > > expose the IPP functions through the topology, it would make so much
> > > > sense to me, and I can make better code with that knowledge.
> > > >
> > > > I found while coding this, that even if it's more difficult,
> > > > classification of device by looking at the topology and the entity
> > > > functions is much nicer, less of a guess.
> > > >
> > > > Though, we lack some documentation (or clarification) for how to
> > > > properly handle formats, size and selection in order to configure the
> > > > IPP. Ezequiel was saying that we don't implement selection in Hanto, so
> > > > I guess the scaling is a bit ambiguous then in regard to coded/display
> > > > sizes. Though we pass a size to the OUTPUT side, so the driver can
> > > > always control a little bit.
> > >
> > > Re-reading the "initialization process", it's actually still there:
> > >
> > > "Call VIDIOC_S_EXT_CTRLS() to set all the controls (parsed headers, etc.)
> > > required by the OUTPUT format to enumerate the CAPTURE formats."
> > >
> >
> > Oh, so it did make it to the final spec. Sorry, my bad. Should have
> > checked better.
> >
> > > And then it suggest to use G_FMT to retreive the optimal format as selected
> > > by
> > > the driver. So I guess this is a false alarm. And we probably don't need to
> > > play
> > > with the subsampling to filter the formats, as the driver is expected to do
> > > so.
> > >
> >
> > The question that this raises then is whether the merged drivers do
> > so. Hantro G1 supports only 8-bit YUV 4:2:0, so there is no problem,
> > although the driver doesn't seem to validate that in the headers. I'd
> > also say the CAPTURE resolution should be determined by the headers,
> > rather than the OUTPUT queue, consistently to the stateful decoders.
> > Resolution changes should be also signaled if controls are set to
> > headers that contain different resolutions than previously set.
> >
> > > So what could be improve, is have in the per-codec documentation the list of
> > > controls that are "required by the OUTPUT format to enumerate the CAPTURE
> > > formats.". Could just be a comment in the control doc so say, "this control
> > > is
> > > needed to allow enumerationg of the CAPTURE formats".
> > >
> >
> > I wouldn't limit this to the format enumeration. The drivers may also
> > need to check for things like profile, some stream features or even
> > just whether the resolution doesn't exceed the maximum supported by
> > the decoder.
>
> Yes, I came to similar conclusion. As an example, an H264 should validate the
> SPS (and I think for H264 that is sufficient ?) based on it's hardware specific
> constraint. Using the SPS for format and size numeration is also what makes the
> most sense when there is an IPP. An IPP could support crop, and the driver could
> select the default cropping to matcht the display resolution. Driver should be
> enhanced, but it seems to be part of normal refine.
>
> To come back to the spec, shall we document a return value for the case where
> the control is supported, but the stream isn't supported ?
>

I think the answer is yes. :) I'll leave the details to others, though.

Best regards,
Tomasz

> >
> > Best regards,
> > Tomasz
> >
> > > > > > Is this the intended way to negotiation IPP functions with the driver
> > > > > > ?
> > > > > >
> > > > >
> > > > > In theory, if the userspace knows whether the stream is 4:2:0 or 4:2:2
> > > > > and 8-bit or 10-bit, it can still select the first format from the top
> > > > > that matches these properties.
> > > >
> > > > That's exactly what I do, I have a map of the formats and their
> > > > chroma/depth, and take the first one that match. If that fails, I just
> > > > fallback to the first one. It's something you need to do anyway, as we
> > > > prefer the native format first (even if there is an IPP).
> > > >
> > > > > That's not how format handling in V4L2 works, though. ENUM_FMT is
> > > > > expected to return a list of valid formats and if we forget about the
> > > > > image processor for a moment, a stateless decoder would always return
> > > > > any possible format, including ones invalid for the stream.
> > > > >
> > > > > Now back to the image processor, if it handles conversions from any to
> > > > > any format listed by ENUM_FMT, we kind of regain the V4L2 compliance,
> > > > > but if the conversions are limited, the above requirement still
> > > > > doesn't hold and we're not implementing V4L2 correctly.
> > > > >
> > > > > Perhaps we can still amend the spec and require controls that
> > > > > determine the stream properties to be set before starting the
> > > > > streaming? I can imagine it could also help the driver filter out some
> > > > > unsupported streams early, before allocating buffers and attempting to
> > > > > decode.
> > > >
> > > > I think it would make sense, so just to make sure, for H264 we could
> > > > set the V4L2_CID_MPEG_VIDEO_H264_SPS along with the OUTPUT format,
> > > > prior to CAPTURE enumeration.
> > > >
> > > > > Best regards,
> > > > > Tomasz
> > > > >
> > > > > > > When the application sets a pixel format other than NV12,
> > > > > > > the post-processor is transparently enabled.
> > > > > > >
> > > > > > > Patch 1 is a cleanups needed to easier integrate the post-processor.
> > > > > > > Patch 2 introduces the post-processing support.
> > > > > > > Patch 3 updates the uAPI specification.
> > > > > > >
> > > > > > > This is tested on RK3288 platforms with MPEG-2, VP8 and
> > > > > > > H264 streams, decoding to YUY2 surfaces. For now, this series
> > > > > > > is only adding support for NV12-to-YUY2 conversion.
> > > > > > >
> > > > > > > Applies to media/master.
> > > > > > >
> > > > > > > Future plans
> > > > > > > ------------
> > > > > > >
> > > > > > > It seems to me that we should start moving this driver to use
> > > > > > > regmap-based access to registers. However, such move is out of scope
> > > > > > > and not entirely related to this post-processor enablement.
> > > > > > >
> > > > > > > We'll work on that as follow-up patches.
> > > > > > >
> > > > > > > Changelog
> > > > > > > ---------
> > > > > > >
> > > > > > > Changes v3:
> > > > > > >
> > > > > > > * After discussing with Hans and Tomasz during the media summit
> > > > > > > in ELCE, we decided to go back on the MC changes. The MC topology
> > > > > > > is now untouched. This means the series is now similar to v1,
> > > > > > > except we explicitly use the ENUM_FMT to hint about the post-
> > > > > > > processed
> > > > > > > formats.
> > > > > > >
> > > > > > > Changes v2:
> > > > > > >
> > > > > > > * The decoder->post-processor topology is now exposed
> > > > > > >   explicitly and applications need to configure the pipeline.
> > > > > > >   By default, the decoder is enabled and the post-processor
> > > > > > >   is disabled.
> > > > > > >
> > > > > > > * RGB post-processing output has been dropped. We might
> > > > > > >   add this in the future, but for now, it seems it would
> > > > > > >   make the code more complex without a use-case in mind.
> > > > > > >   RGB is much more memory-consuming so less attractive
> > > > > > >   than YUV, and modern GPUs and display controllers support YUV.
> > > > > > >
> > > > > > > * The post-processor implementation still supports RK3288
> > > > > > >   only. However, a generic register infrastructure is introduced
> > > > > > >   to make addition of other variants such as RK3399 really easy.
> > > > > > >
> > > > > > > Ezequiel Garcia (3):
> > > > > > >   media: hantro: Cleanup format negotiation helpers
> > > > > > >   media: hantro: Support color conversion via post-processing
> > > > > > >   media: vidioc-enum-fmt.rst: clarify format preference
> > > > > > >
> > > > > > >  .../media/uapi/v4l/vidioc-enum-fmt.rst        |   4 +-
> > > > > > >  drivers/staging/media/hantro/Makefile         |   1 +
> > > > > > >  drivers/staging/media/hantro/hantro.h         |  64 +++++++-
> > > > > > >  drivers/staging/media/hantro/hantro_drv.c     |   8 +-
> > > > > > >  .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
> > > > > > >  .../media/hantro/hantro_g1_mpeg2_dec.c        |   2 +-
> > > > > > >  drivers/staging/media/hantro/hantro_g1_regs.h |  53 +++++++
> > > > > > >  .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
> > > > > > >  drivers/staging/media/hantro/hantro_h264.c    |   6 +-
> > > > > > >  drivers/staging/media/hantro/hantro_hw.h      |  13 ++
> > > > > > >  .../staging/media/hantro/hantro_postproc.c    | 141
> > > > > > > ++++++++++++++++++
> > > > > > >  drivers/staging/media/hantro/hantro_v4l2.c    | 105 ++++++++-----
> > > > > > >  drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 ++
> > > > > > >  13 files changed, 366 insertions(+), 45 deletions(-)
> > > > > > >  create mode 100644 drivers/staging/media/hantro/hantro_postproc.c
> > > > > > >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-03-09 11:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13 17:56 [PATCH v3 0/3] Enable Hantro G1 post-processor Ezequiel Garcia
2019-11-13 17:56 ` [PATCH v3 1/3] media: hantro: Cleanup format negotiation helpers Ezequiel Garcia
2019-11-14  9:26   ` Philipp Zabel
2019-11-14  9:32   ` Boris Brezillon
2019-11-13 17:56 ` [PATCH v3 2/3] media: hantro: Support color conversion via post-processing Ezequiel Garcia
2019-11-14  9:46   ` Boris Brezillon
2019-11-14  9:48   ` Philipp Zabel
2019-11-15 15:44     ` Ezequiel Garcia
2019-11-15 16:10       ` Philipp Zabel
2019-12-05 14:33       ` Ezequiel Garcia
2019-12-05 14:46         ` Philipp Zabel
2019-12-05 16:02           ` Ezequiel Garcia
2019-11-13 17:56 ` [PATCH v3 3/3] media: vidioc-enum-fmt.rst: clarify format preference Ezequiel Garcia
2019-11-14  9:32   ` Boris Brezillon
2020-02-09 19:52 ` [PATCH v3 0/3] Enable Hantro G1 post-processor Nicolas Dufresne
2020-02-10  2:45   ` Tomasz Figa
2020-02-11  4:16     ` Nicolas Dufresne
2020-02-11 16:22       ` Nicolas Dufresne
2020-02-12  7:06         ` Tomasz Figa
2020-02-19 21:06           ` Nicolas Dufresne
2020-03-09 11:11             ` Tomasz Figa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).