linux-rockchip.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements
@ 2022-01-07  9:34 Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware Chen-Yu Tsai
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

Hi everyone,

Here are some improvements to the Hantro JPEG encoder driver. This
finishes two of the TODO items.

Changes since v1:

- New patch 8 cleaning up JPEG quantization table dimension magic
  numbers and sanity checking of various tables

- Add sanity checks for JPEG header size and alignment

- Drop linux/dma-mapping.h header now that the DMA bounce buffer is gone

- Make JPEG_ACTIVE_MARKER control read only

Patch 1 cleans up the final register write sequence in the JPEG encoder
driver. This particular bit was a bit confusing and hard to understand
given the lack of context around the original wmb(). Was it used to
force all the register writes to finish or to make sure memory writes
were completed? In the end I stuck with what the other hantro decoders
were doing.

Patch 2 fixes a misleading register name.

Patch 3 implements cropping on the output queue with the selection API
for the JPEG encoder. This allows specifying the visible area slightly
smaller than the macroblock-aligned coded size. This bit can be reused
by other stateless encoders once they are implemented.

Patch 4 adds a JFIF APP0 segment to the JPEG encoder output.

Patch 5 adds a COM segment to the JPEG encoder output. This is used to
align the SOS segment payload area.

Patch 6 implements the V4L2_CID_JPEG_ACTIVE_MARKER control. This is only
used to signal the segments added to userspace. The driver ignores any
changes requested.

Patch 7 lets the encoder output directly into the capture buffer,
getting rid of the DMA bounce buffer.

Patch 8 cleans up some of the magic number 64 instances around the
quantization table code. Sanity checks are also added.

Please have a look and test. I only tested this on the RK3399 with
gstreamer, and with Chromium's jpeg_encode_accelerator_unittest with the
patches backported to v5.10 on a RK3399 Kevin Chromebook. The H1 variant
is untested by me.

To test the selection API bits with gstreamer, the v4l2videoenc plugin
needs to be patched. A gst_v4l2_object_set_crop() call should be
inserted after gst_v4l2_object_set_format() in
gst_v4l2_video_enc_set_format().


Regards
ChenYu


Chen-Yu Tsai (8):
  media: hantro: jpeg: Relax register writes before write starting
    hardware
  media: hantro: Fix overfill bottom register field name
  media: hantro: Support cropping visible area for encoders
  media: hantro: jpeg: Add JFIF APP0 segment to JPEG encoder output
  media: hantro: jpeg: Add COM segment to JPEG header to align image
    scan
  media: hantro: Implement V4L2_CID_JPEG_ACTIVE_MARKER control
  media: hantro: output encoded JPEG content directly to capture buffers
  media: hantro: jpeg: Remove open-coded size in quantization table code

 drivers/staging/media/hantro/TODO             |  7 --
 drivers/staging/media/hantro/hantro.h         |  1 -
 drivers/staging/media/hantro/hantro_drv.c     | 41 ++++++---
 .../staging/media/hantro/hantro_h1_jpeg_enc.c | 44 ++++++----
 drivers/staging/media/hantro/hantro_h1_regs.h |  2 +-
 drivers/staging/media/hantro/hantro_hw.h      | 11 ---
 drivers/staging/media/hantro/hantro_jpeg.c    | 86 ++++++++++---------
 drivers/staging/media/hantro/hantro_jpeg.h    |  2 +-
 drivers/staging/media/hantro/hantro_v4l2.c    | 77 +++++++++++++++++
 .../media/hantro/rockchip_vpu2_hw_jpeg_enc.c  | 47 +++++-----
 .../staging/media/hantro/rockchip_vpu_hw.c    |  6 --
 11 files changed, 206 insertions(+), 118 deletions(-)

-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-18 21:02   ` Ezequiel Garcia
  2022-01-07  9:34 ` [PATCH RFT v2 2/8] media: hantro: Fix overfill bottom register field name Chen-Yu Tsai
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
wmb() was inserted before the final register write that starts the
encoder. In v11, it was removed and the second-to-last register write
was changed to a non-relaxed write, which has an implicit wmb() [1].
The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
is another writel_relaxed() following the non-relaxed one.

Turns out only the last writel() needs to be non-relaxed. Device I/O
mappings already guarantee strict ordering to the same endpoint, and
the writel() triggering the hardware would force all writes to memory
to be observed before the writel() to the hardware is observed.

[1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
 drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
index 1450013d3685..03db1c3444f8 100644
--- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
+++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
@@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
 		| H1_REG_AXI_CTRL_INPUT_SWAP32
 		| H1_REG_AXI_CTRL_OUTPUT_SWAP8
 		| H1_REG_AXI_CTRL_INPUT_SWAP8;
-	/* Make sure that all registers are written at this point. */
-	vepu_write(vpu, reg, H1_REG_AXI_CTRL);
+	vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
 
 	reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
 		| H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
index 4df16f59fb97..b931fc5fa1a9 100644
--- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
+++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
@@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
 		| VEPU_REG_INPUT_SWAP8
 		| VEPU_REG_INPUT_SWAP16
 		| VEPU_REG_INPUT_SWAP32;
-	/* Make sure that all registers are written at this point. */
-	vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
+	vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
 
 	reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
 	vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 2/8] media: hantro: Fix overfill bottom register field name
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 3/8] media: hantro: Support cropping visible area for encoders Chen-Yu Tsai
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

The Hantro H1 hardware can crop off pixels from the right and bottom of
the source frame. These are controlled with the H1_REG_IN_IMG_CTRL_OVRFLB
and H1_REG_IN_IMG_CTRL_OVRFLR in the H1_REG_IN_IMG_CTRL register.

The ChromeOS kernel driver that this was based on incorrectly added the
_D4 suffix H1_REG_IN_IMG_CTRL_OVRFLB. This field crops the bottom of the
input frame, and the number is _not_ divided by 4. [1]

Correct the name to avoid confusion when crop support with the selection
API is added.

[1] https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/ \
	heads/chromeos-4.19/drivers/staging/media/hantro/hantro_h1_vp8_enc.c#377

Fixes: 775fec69008d ("media: add Rockchip VPU JPEG encoder driver")
Fixes: a29add8c9bb2 ("media: rockchip/vpu: rename from rockchip to hantro")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_h1_jpeg_enc.c | 2 +-
 drivers/staging/media/hantro/hantro_h1_regs.h     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
index 03db1c3444f8..96153c755fb8 100644
--- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
+++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
@@ -23,7 +23,7 @@ static void hantro_h1_set_src_img_ctrl(struct hantro_dev *vpu,
 
 	reg = H1_REG_IN_IMG_CTRL_ROW_LEN(pix_fmt->width)
 		| H1_REG_IN_IMG_CTRL_OVRFLR_D4(0)
-		| H1_REG_IN_IMG_CTRL_OVRFLB_D4(0)
+		| H1_REG_IN_IMG_CTRL_OVRFLB(0)
 		| H1_REG_IN_IMG_CTRL_FMT(ctx->vpu_src_fmt->enc_fmt);
 	vepu_write_relaxed(vpu, reg, H1_REG_IN_IMG_CTRL);
 }
diff --git a/drivers/staging/media/hantro/hantro_h1_regs.h b/drivers/staging/media/hantro/hantro_h1_regs.h
index d6e9825bb5c7..30e7e7b920b5 100644
--- a/drivers/staging/media/hantro/hantro_h1_regs.h
+++ b/drivers/staging/media/hantro/hantro_h1_regs.h
@@ -47,7 +47,7 @@
 #define H1_REG_IN_IMG_CTRL				0x03c
 #define     H1_REG_IN_IMG_CTRL_ROW_LEN(x)		((x) << 12)
 #define     H1_REG_IN_IMG_CTRL_OVRFLR_D4(x)		((x) << 10)
-#define     H1_REG_IN_IMG_CTRL_OVRFLB_D4(x)		((x) << 6)
+#define     H1_REG_IN_IMG_CTRL_OVRFLB(x)		((x) << 6)
 #define     H1_REG_IN_IMG_CTRL_FMT(x)			((x) << 2)
 #define H1_REG_ENC_CTRL0				0x040
 #define    H1_REG_ENC_CTRL0_INIT_QP(x)			((x) << 26)
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 3/8] media: hantro: Support cropping visible area for encoders
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 2/8] media: hantro: Fix overfill bottom register field name Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 4/8] media: hantro: jpeg: Add JFIF APP0 segment to JPEG encoder output Chen-Yu Tsai
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

Encoders typically operate on macroblocks. Thus their output or coded
resolution is constrained to multiples of macroblocks. For frame sizes
not aligned to macroblocks, cropping is needed to limit the visible
area of the frame.

Add support for cropping on the output (source) side for encoders,
using the selection API.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/TODO             |  3 -
 .../staging/media/hantro/hantro_h1_jpeg_enc.c | 17 +++-
 drivers/staging/media/hantro/hantro_v4l2.c    | 77 +++++++++++++++++++
 .../media/hantro/rockchip_vpu2_hw_jpeg_enc.c  | 17 ++--
 4 files changed, 101 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/media/hantro/TODO b/drivers/staging/media/hantro/TODO
index fa0c94057007..da181dc93069 100644
--- a/drivers/staging/media/hantro/TODO
+++ b/drivers/staging/media/hantro/TODO
@@ -5,9 +5,6 @@
 
   For this reason, we are keeping this driver in staging for now.
 
-* Add support for the S_SELECTION API.
-  See the comment for VEPU_REG_ENC_OVER_FILL_STRM_OFFSET.
-
 * Instead of having a DMA bounce buffer, it could be possible to use a
   normal buffer and memmove() the payload to make space for the header.
   This might need to use extra JPEG markers for padding reasons.
diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
index 96153c755fb8..9104973af8df 100644
--- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
+++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
@@ -18,12 +18,21 @@
 static void hantro_h1_set_src_img_ctrl(struct hantro_dev *vpu,
 				       struct hantro_ctx *ctx)
 {
-	struct v4l2_pix_format_mplane *pix_fmt = &ctx->src_fmt;
+	u32 overfill_r, overfill_b;
 	u32 reg;
 
-	reg = H1_REG_IN_IMG_CTRL_ROW_LEN(pix_fmt->width)
-		| H1_REG_IN_IMG_CTRL_OVRFLR_D4(0)
-		| H1_REG_IN_IMG_CTRL_OVRFLB(0)
+	/*
+	 * The format width and height are already macroblock aligned
+	 * by .vidioc_s_fmt_vid_cap_mplane() callback. Destination
+	 * format width and height can be further modified by
+	 * .vidioc_s_selection(), and the width is 4-aligned.
+	 */
+	overfill_r = ctx->src_fmt.width - ctx->dst_fmt.width;
+	overfill_b = ctx->src_fmt.height - ctx->dst_fmt.height;
+
+	reg = H1_REG_IN_IMG_CTRL_ROW_LEN(ctx->src_fmt.width)
+		| H1_REG_IN_IMG_CTRL_OVRFLR_D4(overfill_r / 4)
+		| H1_REG_IN_IMG_CTRL_OVRFLB(overfill_b)
 		| H1_REG_IN_IMG_CTRL_FMT(ctx->vpu_src_fmt->enc_fmt);
 	vepu_write_relaxed(vpu, reg, H1_REG_IN_IMG_CTRL);
 }
diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
index e595905b3bd7..67148ba346f5 100644
--- a/drivers/staging/media/hantro/hantro_v4l2.c
+++ b/drivers/staging/media/hantro/hantro_v4l2.c
@@ -554,6 +554,80 @@ vidioc_s_fmt_cap_mplane(struct file *file, void *priv, struct v4l2_format *f)
 	return hantro_set_fmt_cap(fh_to_ctx(priv), &f->fmt.pix_mp);
 }
 
+static int vidioc_g_selection(struct file *file, void *priv,
+			      struct v4l2_selection *sel)
+{
+	struct hantro_ctx *ctx = fh_to_ctx(priv);
+
+	/* Crop only supported on source. */
+	if (!ctx->is_encoder ||
+	    sel->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		return -EINVAL;
+
+	switch (sel->target) {
+	case V4L2_SEL_TGT_CROP_DEFAULT:
+	case V4L2_SEL_TGT_CROP_BOUNDS:
+		sel->r.top = 0;
+		sel->r.left = 0;
+		sel->r.width = ctx->src_fmt.width;
+		sel->r.height = ctx->src_fmt.height;
+		break;
+	case V4L2_SEL_TGT_CROP:
+		sel->r.top = 0;
+		sel->r.left = 0;
+		sel->r.width = ctx->dst_fmt.width;
+		sel->r.height = ctx->dst_fmt.height;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int vidioc_s_selection(struct file *file, void *priv,
+			      struct v4l2_selection *sel)
+{
+	struct hantro_ctx *ctx = fh_to_ctx(priv);
+	struct v4l2_rect *rect = &sel->r;
+	struct vb2_queue *vq;
+
+	/* Crop only supported on source. */
+	if (!ctx->is_encoder ||
+	    sel->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		return -EINVAL;
+
+	/* Change not allowed if the queue is streaming. */
+	vq = v4l2_m2m_get_src_vq(ctx->fh.m2m_ctx);
+	if (vb2_is_streaming(vq))
+		return -EBUSY;
+
+	if (sel->target != V4L2_SEL_TGT_CROP)
+		return -EINVAL;
+
+	/*
+	 * We do not support offsets, and we can crop only inside
+	 * right-most or bottom-most macroblocks.
+	 */
+	if (rect->left != 0 || rect->top != 0 ||
+	    round_up(rect->width, MB_DIM) != ctx->src_fmt.width ||
+	    round_up(rect->height, MB_DIM) != ctx->src_fmt.height) {
+		/* Default to full frame for incorrect settings. */
+		rect->left = 0;
+		rect->top = 0;
+		rect->width = ctx->src_fmt.width;
+		rect->height = ctx->src_fmt.height;
+	} else {
+		/* We support widths aligned to 4 pixels and arbitrary heights. */
+		rect->width = round_up(rect->width, 4);
+	}
+
+	ctx->dst_fmt.width = rect->width;
+	ctx->dst_fmt.height = rect->height;
+
+	return 0;
+}
+
 const struct v4l2_ioctl_ops hantro_ioctl_ops = {
 	.vidioc_querycap = vidioc_querycap,
 	.vidioc_enum_framesizes = vidioc_enum_framesizes,
@@ -580,6 +654,9 @@ const struct v4l2_ioctl_ops hantro_ioctl_ops = {
 
 	.vidioc_streamon = v4l2_m2m_ioctl_streamon,
 	.vidioc_streamoff = v4l2_m2m_ioctl_streamoff,
+
+	.vidioc_g_selection = vidioc_g_selection,
+	.vidioc_s_selection = vidioc_s_selection,
 };
 
 static int
diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
index b931fc5fa1a9..da275568874a 100644
--- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
+++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
@@ -35,18 +35,23 @@
 static void rockchip_vpu2_set_src_img_ctrl(struct hantro_dev *vpu,
 					   struct hantro_ctx *ctx)
 {
-	struct v4l2_pix_format_mplane *pix_fmt = &ctx->src_fmt;
+	u32 overfill_r, overfill_b;
 	u32 reg;
 
 	/*
-	 * The pix fmt width/height are already macroblock aligned
-	 * by .vidioc_s_fmt_vid_cap_mplane() callback
+	 * The format width and height are already macroblock aligned
+	 * by .vidioc_s_fmt_vid_cap_mplane() callback. Destination
+	 * format width and height can be further modified by
+	 * .vidioc_s_selection(), and the width is 4-aligned.
 	 */
-	reg = VEPU_REG_IN_IMG_CTRL_ROW_LEN(pix_fmt->width);
+	overfill_r = ctx->src_fmt.width - ctx->dst_fmt.width;
+	overfill_b = ctx->src_fmt.height - ctx->dst_fmt.height;
+
+	reg = VEPU_REG_IN_IMG_CTRL_ROW_LEN(ctx->src_fmt.width);
 	vepu_write_relaxed(vpu, reg, VEPU_REG_INPUT_LUMA_INFO);
 
-	reg = VEPU_REG_IN_IMG_CTRL_OVRFLR_D4(0) |
-	      VEPU_REG_IN_IMG_CTRL_OVRFLB(0);
+	reg = VEPU_REG_IN_IMG_CTRL_OVRFLR_D4(overfill_r / 4) |
+	      VEPU_REG_IN_IMG_CTRL_OVRFLB(overfill_b);
 	/*
 	 * This register controls the input crop, as the offset
 	 * from the right/bottom within the last macroblock. The offset from the
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 4/8] media: hantro: jpeg: Add JFIF APP0 segment to JPEG encoder output
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
                   ` (2 preceding siblings ...)
  2022-01-07  9:34 ` [PATCH RFT v2 3/8] media: hantro: Support cropping visible area for encoders Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 5/8] media: hantro: jpeg: Add COM segment to JPEG header to align image scan Chen-Yu Tsai
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

While the V4L2_PIX_FMT_JPEG format doesn't specify any requirements for
the APP0 or APP1 segments, it would be nice if the output is JFIF
compliant. While some programs can read JPEG streams that aren't, some
guess work is involved.

Add the standard JFIF APP0 segment to the JPEG header, so that the JPEG
encoder output is JFIF compliant.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_jpeg.c | 21 +++++++++++++--------
 drivers/staging/media/hantro/hantro_jpeg.h |  2 +-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_jpeg.c b/drivers/staging/media/hantro/hantro_jpeg.c
index df62fbdff7c9..7d4018bd6876 100644
--- a/drivers/staging/media/hantro/hantro_jpeg.c
+++ b/drivers/staging/media/hantro/hantro_jpeg.c
@@ -12,15 +12,15 @@
 #include "hantro_jpeg.h"
 #include "hantro.h"
 
-#define LUMA_QUANT_OFF		7
-#define CHROMA_QUANT_OFF	72
-#define HEIGHT_OFF		141
-#define WIDTH_OFF		143
+#define LUMA_QUANT_OFF		25
+#define CHROMA_QUANT_OFF	90
+#define HEIGHT_OFF		159
+#define WIDTH_OFF		161
 
-#define HUFF_LUMA_DC_OFF	160
-#define HUFF_LUMA_AC_OFF	193
-#define HUFF_CHROMA_DC_OFF	376
-#define HUFF_CHROMA_AC_OFF	409
+#define HUFF_LUMA_DC_OFF	178
+#define HUFF_LUMA_AC_OFF	211
+#define HUFF_CHROMA_DC_OFF	394
+#define HUFF_CHROMA_AC_OFF	427
 
 /* Default tables from JPEG ITU-T.81
  * (ISO/IEC 10918-1) Annex K, tables K.1 and K.2
@@ -144,6 +144,11 @@ static const unsigned char hantro_jpeg_header[JPEG_HEADER_SIZE] = {
 	/* SOI */
 	0xff, 0xd8,
 
+	/* JFIF-APP0 */
+	0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46,
+	0x00, 0x01, 0x01, 0x00, 0x00, 0x01, 0x00, 0x01,
+	0x00, 0x00,
+
 	/* DQT */
 	0xff, 0xdb, 0x00, 0x84,
 
diff --git a/drivers/staging/media/hantro/hantro_jpeg.h b/drivers/staging/media/hantro/hantro_jpeg.h
index 035ab25b803f..f33c492134e4 100644
--- a/drivers/staging/media/hantro/hantro_jpeg.h
+++ b/drivers/staging/media/hantro/hantro_jpeg.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ */
 
-#define JPEG_HEADER_SIZE	601
+#define JPEG_HEADER_SIZE	619
 #define JPEG_QUANT_SIZE		64
 
 struct hantro_jpeg_ctx {
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 5/8] media: hantro: jpeg: Add COM segment to JPEG header to align image scan
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
                   ` (3 preceding siblings ...)
  2022-01-07  9:34 ` [PATCH RFT v2 4/8] media: hantro: jpeg: Add JFIF APP0 segment to JPEG encoder output Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 6/8] media: hantro: Implement V4L2_CID_JPEG_ACTIVE_MARKER control Chen-Yu Tsai
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

The JPEG header size is not 64-bit aligned. This makes the driver
require a bounce buffer for the encoded JPEG image scan output.

Add a COM (comment) segment to the JPEG header so that the header size
is a multiple of 64 bits. This will then allow dropping the use of the
bounce buffer, and instead have the hardware write out to the capture
buffer directly.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_jpeg.c | 23 +++++++++++++++++++++-
 drivers/staging/media/hantro/hantro_jpeg.h |  2 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_jpeg.c b/drivers/staging/media/hantro/hantro_jpeg.c
index 7d4018bd6876..e63eeef3952d 100644
--- a/drivers/staging/media/hantro/hantro_jpeg.c
+++ b/drivers/staging/media/hantro/hantro_jpeg.c
@@ -6,6 +6,9 @@
  * Copyright (C) Jean-Francois Moine (http://moinejf.free.fr)
  * Copyright (C) 2014 Philipp Zabel, Pengutronix
  */
+
+#include <linux/align.h>
+#include <linux/build_bug.h>
 #include <linux/dma-mapping.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -140,7 +143,7 @@ static const unsigned char chroma_ac_table[] = {
  * and we'll use fixed offsets to change the width, height
  * quantization tables, etc.
  */
-static const unsigned char hantro_jpeg_header[JPEG_HEADER_SIZE] = {
+static const unsigned char hantro_jpeg_header[] = {
 	/* SOI */
 	0xff, 0xd8,
 
@@ -247,11 +250,29 @@ static const unsigned char hantro_jpeg_header[JPEG_HEADER_SIZE] = {
 	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 
+	/* COM */
+	0xff, 0xfe, 0x00, 0x03, 0x00,
+
 	/* SOS */
 	0xff, 0xda, 0x00, 0x0c, 0x03, 0x01, 0x00, 0x02,
 	0x11, 0x03, 0x11, 0x00, 0x3f, 0x00,
 };
 
+/*
+ * JPEG_HEADER_SIZE is used in other parts of the driver in lieu of
+ * "sizeof(hantro_jpeg_header)". The two must be equal.
+ */
+static_assert(sizeof(hantro_jpeg_header) == JPEG_HEADER_SIZE);
+
+/*
+ * hantro_jpeg_header is padded with a COM segment, so that the payload
+ * of the SOS segment (the entropy-encoded image scan), which should
+ * trail the whole header, is 8-byte aligned for the hardware to write
+ * to directly.
+ */
+static_assert(IS_ALIGNED(sizeof(hantro_jpeg_header), 8),
+	      "Hantro JPEG header size needs to be 8-byte aligned.");
+
 static unsigned char jpeg_scale_qp(const unsigned char qp, int scale)
 {
 	unsigned int temp;
diff --git a/drivers/staging/media/hantro/hantro_jpeg.h b/drivers/staging/media/hantro/hantro_jpeg.h
index f33c492134e4..0b49d0b82caa 100644
--- a/drivers/staging/media/hantro/hantro_jpeg.h
+++ b/drivers/staging/media/hantro/hantro_jpeg.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ */
 
-#define JPEG_HEADER_SIZE	619
+#define JPEG_HEADER_SIZE	624
 #define JPEG_QUANT_SIZE		64
 
 struct hantro_jpeg_ctx {
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 6/8] media: hantro: Implement V4L2_CID_JPEG_ACTIVE_MARKER control
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
                   ` (4 preceding siblings ...)
  2022-01-07  9:34 ` [PATCH RFT v2 5/8] media: hantro: jpeg: Add COM segment to JPEG header to align image scan Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 7/8] media: hantro: output encoded JPEG content directly to capture buffers Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 8/8] media: hantro: jpeg: Remove open-coded size in quantization table code Chen-Yu Tsai
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

The Hantro JPEG encoder driver adds various segments to the JPEG header.
While it would be quite complicated to make these segments selectable
to userspace, given that the driver has to fill in various fields in
these segments, and also take care of alignment, it would be nice if
the driver could signal to userspace what segments are included.

Implement the V4L2_CID_JPEG_ACTIVE_MARKER control, and make it read
only so that it always returns the set of segments that the driver adds.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_drv.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index 6a51f39dde56..b376b9d94b01 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -332,6 +332,11 @@ static const struct v4l2_ctrl_ops hantro_hevc_ctrl_ops = {
 	.s_ctrl = hantro_hevc_s_ctrl,
 };
 
+#define HANTRO_JPEG_ACTIVE_MARKERS	(V4L2_JPEG_ACTIVE_MARKER_APP0 | \
+					 V4L2_JPEG_ACTIVE_MARKER_COM | \
+					 V4L2_JPEG_ACTIVE_MARKER_DQT | \
+					 V4L2_JPEG_ACTIVE_MARKER_DHT)
+
 static const struct hantro_ctrl controls[] = {
 	{
 		.codec = HANTRO_JPEG_ENCODER,
@@ -343,6 +348,22 @@ static const struct hantro_ctrl controls[] = {
 			.def = 50,
 			.ops = &hantro_jpeg_ctrl_ops,
 		},
+	}, {
+		.codec = HANTRO_JPEG_ENCODER,
+		.cfg = {
+			.id = V4L2_CID_JPEG_ACTIVE_MARKER,
+			.max = HANTRO_JPEG_ACTIVE_MARKERS,
+			.def = HANTRO_JPEG_ACTIVE_MARKERS,
+			/*
+			 * Changing the set of active markers/segments also
+			 * messes up the alignment of the JPEG header, which
+			 * is needed to allow the hardware to write directly
+			 * to the output buffer. Implementing this introduces
+			 * a lot of complexity for little gain, as the markers
+			 * enabled is already the minimum required set.
+			 */
+			.flags = V4L2_CTRL_FLAG_READ_ONLY,
+		},
 	}, {
 		.codec = HANTRO_MPEG2_DECODER,
 		.cfg = {
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 7/8] media: hantro: output encoded JPEG content directly to capture buffers
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
                   ` (5 preceding siblings ...)
  2022-01-07  9:34 ` [PATCH RFT v2 6/8] media: hantro: Implement V4L2_CID_JPEG_ACTIVE_MARKER control Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  2022-01-07  9:34 ` [PATCH RFT v2 8/8] media: hantro: jpeg: Remove open-coded size in quantization table code Chen-Yu Tsai
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

Now that the JPEG header length is aligned with bus access boundaries,
the JPEG encoder can output to the capture buffers directly without
going through a bounce buffer.

Do just that, and get rid of all the bounce buffer related code.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/TODO             |  4 ---
 drivers/staging/media/hantro/hantro.h         |  1 -
 drivers/staging/media/hantro/hantro_drv.c     | 20 +++++--------
 .../staging/media/hantro/hantro_h1_jpeg_enc.c | 24 ++++++++--------
 drivers/staging/media/hantro/hantro_hw.h      | 11 --------
 drivers/staging/media/hantro/hantro_jpeg.c    | 28 -------------------
 .../media/hantro/rockchip_vpu2_hw_jpeg_enc.c  | 27 ++++++++++--------
 .../staging/media/hantro/rockchip_vpu_hw.c    |  6 ----
 8 files changed, 34 insertions(+), 87 deletions(-)

diff --git a/drivers/staging/media/hantro/TODO b/drivers/staging/media/hantro/TODO
index da181dc93069..1d7fed936019 100644
--- a/drivers/staging/media/hantro/TODO
+++ b/drivers/staging/media/hantro/TODO
@@ -4,7 +4,3 @@
   the uABI, it will be required to have the driver in staging.
 
   For this reason, we are keeping this driver in staging for now.
-
-* Instead of having a DMA bounce buffer, it could be possible to use a
-  normal buffer and memmove() the payload to make space for the header.
-  This might need to use extra JPEG markers for padding reasons.
diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
index 06d0f3597694..357f83b86809 100644
--- a/drivers/staging/media/hantro/hantro.h
+++ b/drivers/staging/media/hantro/hantro.h
@@ -259,7 +259,6 @@ struct hantro_ctx {
 	/* Specific for particular codec modes. */
 	union {
 		struct hantro_h264_dec_hw_ctx h264_dec;
-		struct hantro_jpeg_enc_hw_ctx jpeg_enc;
 		struct hantro_mpeg2_dec_hw_ctx mpeg2_dec;
 		struct hantro_vp8_dec_hw_ctx vp8_dec;
 		struct hantro_hevc_dec_hw_ctx hevc_dec;
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index b376b9d94b01..bc9bcb4eaf46 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -219,21 +219,15 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
 	if (ret)
 		return ret;
 
+	dst_vq->bidirectional = true;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->dma_attrs = DMA_ATTR_ALLOC_SINGLE_PAGES;
 	/*
-	 * When encoding, the CAPTURE queue doesn't need dma memory,
-	 * as the CPU needs to create the JPEG frames, from the
-	 * hardware-produced JPEG payload.
-	 *
-	 * For the DMA destination buffer, we use a bounce buffer.
+	 * The Kernel needs access to the JPEG destination buffer for the
+	 * JPEG encoder to fill in the JPEG headers.
 	 */
-	if (ctx->is_encoder) {
-		dst_vq->mem_ops = &vb2_vmalloc_memops;
-	} else {
-		dst_vq->bidirectional = true;
-		dst_vq->mem_ops = &vb2_dma_contig_memops;
-		dst_vq->dma_attrs = DMA_ATTR_ALLOC_SINGLE_PAGES |
-				    DMA_ATTR_NO_KERNEL_MAPPING;
-	}
+	if (!ctx->is_encoder)
+		dst_vq->dma_attrs |= DMA_ATTR_NO_KERNEL_MAPPING;
 
 	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
 	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
index 9104973af8df..54d2ffbb2637 100644
--- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
+++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
@@ -39,17 +39,23 @@ static void hantro_h1_set_src_img_ctrl(struct hantro_dev *vpu,
 
 static void hantro_h1_jpeg_enc_set_buffers(struct hantro_dev *vpu,
 					   struct hantro_ctx *ctx,
-					   struct vb2_buffer *src_buf)
+					   struct vb2_buffer *src_buf,
+					   struct vb2_buffer *dst_buf)
 {
 	struct v4l2_pix_format_mplane *pix_fmt = &ctx->src_fmt;
 	dma_addr_t src[3];
+	u32 size_left;
+
+	size_left = vb2_plane_size(dst_buf, 0) - ctx->vpu_dst_fmt->header_size;
+	if (WARN_ON(vb2_plane_size(dst_buf, 0) < ctx->vpu_dst_fmt->header_size))
+		size_left = 0;
 
 	WARN_ON(pix_fmt->num_planes > 3);
 
-	vepu_write_relaxed(vpu, ctx->jpeg_enc.bounce_buffer.dma,
+	vepu_write_relaxed(vpu, vb2_dma_contig_plane_dma_addr(dst_buf, 0) +
+				ctx->vpu_dst_fmt->header_size,
 			   H1_REG_ADDR_OUTPUT_STREAM);
-	vepu_write_relaxed(vpu, ctx->jpeg_enc.bounce_buffer.size,
-			   H1_REG_STR_BUF_LIMIT);
+	vepu_write_relaxed(vpu, size_left, H1_REG_STR_BUF_LIMIT);
 
 	if (pix_fmt->num_planes == 1) {
 		src[0] = vb2_dma_contig_plane_dma_addr(src_buf, 0);
@@ -121,7 +127,8 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
 			   H1_REG_ENC_CTRL);
 
 	hantro_h1_set_src_img_ctrl(vpu, ctx);
-	hantro_h1_jpeg_enc_set_buffers(vpu, ctx, &src_buf->vb2_buf);
+	hantro_h1_jpeg_enc_set_buffers(vpu, ctx, &src_buf->vb2_buf,
+				       &dst_buf->vb2_buf);
 	hantro_h1_jpeg_enc_set_qtable(vpu, jpeg_ctx.hw_luma_qtable,
 				      jpeg_ctx.hw_chroma_qtable);
 
@@ -153,13 +160,6 @@ void hantro_h1_jpeg_enc_done(struct hantro_ctx *ctx)
 	u32 bytesused = vepu_read(vpu, H1_REG_STR_BUF_LIMIT) / 8;
 	struct vb2_v4l2_buffer *dst_buf = hantro_get_dst_buf(ctx);
 
-	/*
-	 * TODO: Rework the JPEG encoder to eliminate the need
-	 * for a bounce buffer.
-	 */
-	memcpy(vb2_plane_vaddr(&dst_buf->vb2_buf, 0) +
-	       ctx->vpu_dst_fmt->header_size,
-	       ctx->jpeg_enc.bounce_buffer.cpu, bytesused);
 	vb2_set_plane_payload(&dst_buf->vb2_buf, 0,
 			      ctx->vpu_dst_fmt->header_size + bytesused);
 }
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index 4a19ae8940b9..c1fd807bc090 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -43,15 +43,6 @@ struct hantro_aux_buf {
 	unsigned long attrs;
 };
 
-/**
- * struct hantro_jpeg_enc_hw_ctx
- *
- * @bounce_buffer:	Bounce buffer
- */
-struct hantro_jpeg_enc_hw_ctx {
-	struct hantro_aux_buf bounce_buffer;
-};
-
 /* Max. number of entries in the DPB (HW limitation). */
 #define HANTRO_H264_DPB_SIZE		16
 
@@ -327,8 +318,6 @@ void hantro_g1_reset(struct hantro_ctx *ctx);
 
 int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx);
 int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx);
-int hantro_jpeg_enc_init(struct hantro_ctx *ctx);
-void hantro_jpeg_enc_exit(struct hantro_ctx *ctx);
 void hantro_h1_jpeg_enc_done(struct hantro_ctx *ctx);
 void rockchip_vpu2_jpeg_enc_done(struct hantro_ctx *ctx);
 
diff --git a/drivers/staging/media/hantro/hantro_jpeg.c b/drivers/staging/media/hantro/hantro_jpeg.c
index e63eeef3952d..84d3f0bfff00 100644
--- a/drivers/staging/media/hantro/hantro_jpeg.c
+++ b/drivers/staging/media/hantro/hantro_jpeg.c
@@ -9,7 +9,6 @@
 
 #include <linux/align.h>
 #include <linux/build_bug.h>
-#include <linux/dma-mapping.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include "hantro_jpeg.h"
@@ -339,30 +338,3 @@ void hantro_jpeg_header_assemble(struct hantro_jpeg_ctx *ctx)
 
 	jpeg_set_quality(ctx);
 }
-
-int hantro_jpeg_enc_init(struct hantro_ctx *ctx)
-{
-	ctx->jpeg_enc.bounce_buffer.size =
-		ctx->dst_fmt.plane_fmt[0].sizeimage -
-		ctx->vpu_dst_fmt->header_size;
-
-	ctx->jpeg_enc.bounce_buffer.cpu =
-		dma_alloc_attrs(ctx->dev->dev,
-				ctx->jpeg_enc.bounce_buffer.size,
-				&ctx->jpeg_enc.bounce_buffer.dma,
-				GFP_KERNEL,
-				DMA_ATTR_ALLOC_SINGLE_PAGES);
-	if (!ctx->jpeg_enc.bounce_buffer.cpu)
-		return -ENOMEM;
-
-	return 0;
-}
-
-void hantro_jpeg_enc_exit(struct hantro_ctx *ctx)
-{
-	dma_free_attrs(ctx->dev->dev,
-		       ctx->jpeg_enc.bounce_buffer.size,
-		       ctx->jpeg_enc.bounce_buffer.cpu,
-		       ctx->jpeg_enc.bounce_buffer.dma,
-		       DMA_ATTR_ALLOC_SINGLE_PAGES);
-}
diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
index da275568874a..d0ae0f14b61e 100644
--- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
+++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
@@ -66,17 +66,23 @@ static void rockchip_vpu2_set_src_img_ctrl(struct hantro_dev *vpu,
 
 static void rockchip_vpu2_jpeg_enc_set_buffers(struct hantro_dev *vpu,
 					       struct hantro_ctx *ctx,
-					       struct vb2_buffer *src_buf)
+					       struct vb2_buffer *src_buf,
+					       struct vb2_buffer *dst_buf)
 {
 	struct v4l2_pix_format_mplane *pix_fmt = &ctx->src_fmt;
 	dma_addr_t src[3];
+	u32 size_left;
+
+	size_left = vb2_plane_size(dst_buf, 0) - ctx->vpu_dst_fmt->header_size;
+	if (WARN_ON(vb2_plane_size(dst_buf, 0) < ctx->vpu_dst_fmt->header_size))
+		size_left = 0;
 
 	WARN_ON(pix_fmt->num_planes > 3);
 
-	vepu_write_relaxed(vpu, ctx->jpeg_enc.bounce_buffer.dma,
+	vepu_write_relaxed(vpu, vb2_dma_contig_plane_dma_addr(dst_buf, 0) +
+				ctx->vpu_dst_fmt->header_size,
 			   VEPU_REG_ADDR_OUTPUT_STREAM);
-	vepu_write_relaxed(vpu, ctx->jpeg_enc.bounce_buffer.size,
-			   VEPU_REG_STR_BUF_LIMIT);
+	vepu_write_relaxed(vpu, size_left, VEPU_REG_STR_BUF_LIMIT);
 
 	if (pix_fmt->num_planes == 1) {
 		src[0] = vb2_dma_contig_plane_dma_addr(src_buf, 0);
@@ -137,6 +143,9 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
 
 	memset(&jpeg_ctx, 0, sizeof(jpeg_ctx));
 	jpeg_ctx.buffer = vb2_plane_vaddr(&dst_buf->vb2_buf, 0);
+	if (!jpeg_ctx.buffer)
+		return -ENOMEM;
+
 	jpeg_ctx.width = ctx->dst_fmt.width;
 	jpeg_ctx.height = ctx->dst_fmt.height;
 	jpeg_ctx.quality = ctx->jpeg_quality;
@@ -147,7 +156,8 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
 			   VEPU_REG_ENCODE_START);
 
 	rockchip_vpu2_set_src_img_ctrl(vpu, ctx);
-	rockchip_vpu2_jpeg_enc_set_buffers(vpu, ctx, &src_buf->vb2_buf);
+	rockchip_vpu2_jpeg_enc_set_buffers(vpu, ctx, &src_buf->vb2_buf,
+					   &dst_buf->vb2_buf);
 	rockchip_vpu2_jpeg_enc_set_qtable(vpu, jpeg_ctx.hw_luma_qtable,
 					  jpeg_ctx.hw_chroma_qtable);
 
@@ -181,13 +191,6 @@ void rockchip_vpu2_jpeg_enc_done(struct hantro_ctx *ctx)
 	u32 bytesused = vepu_read(vpu, VEPU_REG_STR_BUF_LIMIT) / 8;
 	struct vb2_v4l2_buffer *dst_buf = hantro_get_dst_buf(ctx);
 
-	/*
-	 * TODO: Rework the JPEG encoder to eliminate the need
-	 * for a bounce buffer.
-	 */
-	memcpy(vb2_plane_vaddr(&dst_buf->vb2_buf, 0) +
-	       ctx->vpu_dst_fmt->header_size,
-	       ctx->jpeg_enc.bounce_buffer.cpu, bytesused);
 	vb2_set_plane_payload(&dst_buf->vb2_buf, 0,
 			      ctx->vpu_dst_fmt->header_size + bytesused);
 }
diff --git a/drivers/staging/media/hantro/rockchip_vpu_hw.c b/drivers/staging/media/hantro/rockchip_vpu_hw.c
index c203b606e6e7..163cf92eafca 100644
--- a/drivers/staging/media/hantro/rockchip_vpu_hw.c
+++ b/drivers/staging/media/hantro/rockchip_vpu_hw.c
@@ -343,9 +343,7 @@ static const struct hantro_codec_ops rk3066_vpu_codec_ops[] = {
 	[HANTRO_MODE_JPEG_ENC] = {
 		.run = hantro_h1_jpeg_enc_run,
 		.reset = rockchip_vpu1_enc_reset,
-		.init = hantro_jpeg_enc_init,
 		.done = hantro_h1_jpeg_enc_done,
-		.exit = hantro_jpeg_enc_exit,
 	},
 	[HANTRO_MODE_H264_DEC] = {
 		.run = hantro_g1_h264_dec_run,
@@ -371,9 +369,7 @@ static const struct hantro_codec_ops rk3288_vpu_codec_ops[] = {
 	[HANTRO_MODE_JPEG_ENC] = {
 		.run = hantro_h1_jpeg_enc_run,
 		.reset = rockchip_vpu1_enc_reset,
-		.init = hantro_jpeg_enc_init,
 		.done = hantro_h1_jpeg_enc_done,
-		.exit = hantro_jpeg_enc_exit,
 	},
 	[HANTRO_MODE_H264_DEC] = {
 		.run = hantro_g1_h264_dec_run,
@@ -399,9 +395,7 @@ static const struct hantro_codec_ops rk3399_vpu_codec_ops[] = {
 	[HANTRO_MODE_JPEG_ENC] = {
 		.run = rockchip_vpu2_jpeg_enc_run,
 		.reset = rockchip_vpu2_enc_reset,
-		.init = hantro_jpeg_enc_init,
 		.done = rockchip_vpu2_jpeg_enc_done,
-		.exit = hantro_jpeg_enc_exit,
 	},
 	[HANTRO_MODE_H264_DEC] = {
 		.run = rockchip_vpu2_h264_dec_run,
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFT v2 8/8] media: hantro: jpeg: Remove open-coded size in quantization table code
  2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
                   ` (6 preceding siblings ...)
  2022-01-07  9:34 ` [PATCH RFT v2 7/8] media: hantro: output encoded JPEG content directly to capture buffers Chen-Yu Tsai
@ 2022-01-07  9:34 ` Chen-Yu Tsai
  7 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-07  9:34 UTC (permalink / raw)
  To: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Hans Verkuil, Greg Kroah-Hartman
  Cc: Chen-Yu Tsai, linux-media, linux-rockchip, linux-staging, linux-kernel

The quantization tables used in the Hantro JPEG encoder driver are
implicitly sized by the data they contain, but the loop that scales
the tables based on the compression quality hard codes the size to
64. No code exists to check whether the two actually match.

Commit 85bdcb7eaae7 ("media: hantro: Write the quantization tables in
proper order") introduced two new tables, with sizes hardcoded to 64,
but still no checking if all the sizes are the same.

Commit 41479adb5e52 ("media: hantro: Avoid global variable for jpeg
quantization tables") added the macro JPEG_QUANT_SIZE, but only the
newly added fields used this.

This has resulted in code scattered with magic numbers and array sizes
that happen to match up, without any sort of sanity checking to enforce
it.

Drop the hard-coded array sizes, replace the magic loop count with
a proper JPEG_QUANT_SIZE macro, and add BUILD_BUG_ON()s to check
that all the table sizes match up.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_jpeg.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_jpeg.c b/drivers/staging/media/hantro/hantro_jpeg.c
index 84d3f0bfff00..d07b1b449b61 100644
--- a/drivers/staging/media/hantro/hantro_jpeg.c
+++ b/drivers/staging/media/hantro/hantro_jpeg.c
@@ -49,7 +49,7 @@ static const unsigned char chroma_q_table[] = {
 	0x63, 0x63, 0x63, 0x63, 0x63, 0x63, 0x63, 0x63
 };
 
-static const unsigned char zigzag[64] = {
+static const unsigned char zigzag[] = {
 	 0,  1,  8, 16,  9,  2,  3, 10,
 	17, 24, 32, 25, 18, 11,  4,  5,
 	12, 19, 26, 33, 40, 48, 41, 34,
@@ -60,7 +60,7 @@ static const unsigned char zigzag[64] = {
 	53, 60, 61, 54, 47, 55, 62, 63
 };
 
-static const u32 hw_reorder[64] = {
+static const u32 hw_reorder[] = {
 	 0,  8, 16, 24,  1,  9, 17, 25,
 	32, 40, 48, 56, 33, 41, 49, 57,
 	 2, 10, 18, 26,  3, 11, 19, 27,
@@ -292,7 +292,10 @@ jpeg_scale_quant_table(unsigned char *file_q_tab,
 {
 	int i;
 
-	for (i = 0; i < 64; i++) {
+	BUILD_BUG_ON(ARRAY_SIZE(zigzag) != JPEG_QUANT_SIZE);
+	BUILD_BUG_ON(ARRAY_SIZE(hw_reorder) != JPEG_QUANT_SIZE);
+
+	for (i = 0; i < JPEG_QUANT_SIZE; i++) {
 		file_q_tab[i] = jpeg_scale_qp(tab[zigzag[i]], scale);
 		reordered_q_tab[i] = jpeg_scale_qp(tab[hw_reorder[i]], scale);
 	}
@@ -311,6 +314,11 @@ static void jpeg_set_quality(struct hantro_jpeg_ctx *ctx)
 	else
 		scale = 200 - 2 * ctx->quality;
 
+	BUILD_BUG_ON(ARRAY_SIZE(luma_q_table) != JPEG_QUANT_SIZE);
+	BUILD_BUG_ON(ARRAY_SIZE(chroma_q_table) != JPEG_QUANT_SIZE);
+	BUILD_BUG_ON(ARRAY_SIZE(ctx->hw_luma_qtable) != JPEG_QUANT_SIZE);
+	BUILD_BUG_ON(ARRAY_SIZE(ctx->hw_chroma_qtable) != JPEG_QUANT_SIZE);
+
 	jpeg_scale_quant_table(ctx->buffer + LUMA_QUANT_OFF,
 			       ctx->hw_luma_qtable, luma_q_table, scale);
 	jpeg_scale_quant_table(ctx->buffer + CHROMA_QUANT_OFF,
-- 
2.34.1.575.g55b058a8bb-goog


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware
  2022-01-07  9:34 ` [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware Chen-Yu Tsai
@ 2022-01-18 21:02   ` Ezequiel Garcia
  2022-01-19 10:08     ` Chen-Yu Tsai
  0 siblings, 1 reply; 14+ messages in thread
From: Ezequiel Garcia @ 2022-01-18 21:02 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Philipp Zabel, Mauro Carvalho Chehab, Hans Verkuil,
	Greg Kroah-Hartman, linux-media, linux-rockchip, linux-staging,
	linux-kernel

Hi Chen-Yu,

The series looks good, thanks for picking up this task.

Just a one comment.

On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> wmb() was inserted before the final register write that starts the
> encoder. In v11, it was removed and the second-to-last register write
> was changed to a non-relaxed write, which has an implicit wmb() [1].
> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> is another writel_relaxed() following the non-relaxed one.
> 
> Turns out only the last writel() needs to be non-relaxed. Device I/O
> mappings already guarantee strict ordering to the same endpoint, and
> the writel() triggering the hardware would force all writes to memory
> to be observed before the writel() to the hardware is observed.
> 
> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> 
> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> ---
>  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
>  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> index 1450013d3685..03db1c3444f8 100644
> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
>  		| H1_REG_AXI_CTRL_INPUT_SWAP32
>  		| H1_REG_AXI_CTRL_OUTPUT_SWAP8
>  		| H1_REG_AXI_CTRL_INPUT_SWAP8;
> -	/* Make sure that all registers are written at this point. */
> -	vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> +	vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
>  

As far as I can remember, this logic comes from really old Chromium Kernels.
You might be right, and this barrier isn't needed... but then OTOH the comment
is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).

I don't have RK3288 boards near me, but in any case, I'm not sure
we'd be able to test this easily (maybe there are issues that only
trigger under a certain load).

I'd personally avoid this one change, but if you are confident enough with it
that's fine too.

Thanks!
Ezequiel

>  	reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
>  		| H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> index 4df16f59fb97..b931fc5fa1a9 100644
> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
>  		| VEPU_REG_INPUT_SWAP8
>  		| VEPU_REG_INPUT_SWAP16
>  		| VEPU_REG_INPUT_SWAP32;
> -	/* Make sure that all registers are written at this point. */
> -	vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
> +	vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
>  
>  	reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
>  	vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
> -- 
> 2.34.1.575.g55b058a8bb-goog
> 

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware
  2022-01-18 21:02   ` Ezequiel Garcia
@ 2022-01-19 10:08     ` Chen-Yu Tsai
  2022-01-19 11:59       ` Ezequiel Garcia
  2022-01-20 14:20       ` Hans Verkuil
  0 siblings, 2 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-19 10:08 UTC (permalink / raw)
  To: Ezequiel Garcia, Hans Verkuil
  Cc: Philipp Zabel, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	linux-media, linux-rockchip, linux-staging, linux-kernel

Hi,

On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
<ezequiel@vanguardiasur.com.ar> wrote:
>
> Hi Chen-Yu,
>
> The series looks good, thanks for picking up this task.
>
> Just a one comment.
>
> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> > In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> > wmb() was inserted before the final register write that starts the
> > encoder. In v11, it was removed and the second-to-last register write
> > was changed to a non-relaxed write, which has an implicit wmb() [1].
> > The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> > is another writel_relaxed() following the non-relaxed one.
> >
> > Turns out only the last writel() needs to be non-relaxed. Device I/O
> > mappings already guarantee strict ordering to the same endpoint, and
> > the writel() triggering the hardware would force all writes to memory
> > to be observed before the writel() to the hardware is observed.
> >
> > [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> >
> > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> > ---
> >  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
> >  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > index 1450013d3685..03db1c3444f8 100644
> > --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
> >               | H1_REG_AXI_CTRL_INPUT_SWAP32
> >               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
> >               | H1_REG_AXI_CTRL_INPUT_SWAP8;
> > -     /* Make sure that all registers are written at this point. */
> > -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> > +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
> >
>
> As far as I can remember, this logic comes from really old Chromium Kernels.
> You might be right, and this barrier isn't needed... but then OTOH the comment
> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).

I just realized that my commit log is wrong.

" ... a wmb() was inserted before the final register write that starts the
encoder. ... " . It is actually before the second-to-last register write.

> I don't have RK3288 boards near me, but in any case, I'm not sure
> we'd be able to test this easily (maybe there are issues that only
> trigger under a certain load).

I see. I do have a Veyron around that I haven't used in awhile. But as you
said, it might not be an obvious hardware limitation.

> I'd personally avoid this one change, but if you are confident enough with it
> that's fine too.

Unfortunately they didn't leave a whole lot of clues around. For most hardware,
as I mentioned in the commit log, I think the final non-relaxed write should
suffice. I'd point to the decoder drivers not having any barriers or
non-relaxed writes except the final one, but IIUC they are actually two
distinct pieces of hardware.

I suspect we will never know. This JPEG encoder doesn't seem to get used
a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
extra barrier and get tested a lot, but that's only testing the RK3399.

Hans, would it be possible for you to skip this patch and pick the rest?
Or would you like me to resent without this one?


Thanks
ChenYu

> Thanks!
> Ezequiel
>
> >       reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
> >               | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
> > diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> > index 4df16f59fb97..b931fc5fa1a9 100644
> > --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> > +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> > @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
> >               | VEPU_REG_INPUT_SWAP8
> >               | VEPU_REG_INPUT_SWAP16
> >               | VEPU_REG_INPUT_SWAP32;
> > -     /* Make sure that all registers are written at this point. */
> > -     vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
> > +     vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
> >
> >       reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
> >       vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
> > --
> > 2.34.1.575.g55b058a8bb-goog
> >

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware
  2022-01-19 10:08     ` Chen-Yu Tsai
@ 2022-01-19 11:59       ` Ezequiel Garcia
  2022-01-20 14:20       ` Hans Verkuil
  1 sibling, 0 replies; 14+ messages in thread
From: Ezequiel Garcia @ 2022-01-19 11:59 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Hans Verkuil, Philipp Zabel, Mauro Carvalho Chehab,
	Greg Kroah-Hartman, linux-media, open list:ARM/Rockchip SoC...,
	open list:STAGING SUBSYSTEM, Linux Kernel Mailing List

On Wed, 19 Jan 2022 at 07:09, Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> Hi,
>
> On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
> <ezequiel@vanguardiasur.com.ar> wrote:
> >
> > Hi Chen-Yu,
> >
> > The series looks good, thanks for picking up this task.
> >
> > Just a one comment.
> >
> > On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> > > In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> > > wmb() was inserted before the final register write that starts the
> > > encoder. In v11, it was removed and the second-to-last register write
> > > was changed to a non-relaxed write, which has an implicit wmb() [1].
> > > The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> > > is another writel_relaxed() following the non-relaxed one.
> > >
> > > Turns out only the last writel() needs to be non-relaxed. Device I/O
> > > mappings already guarantee strict ordering to the same endpoint, and
> > > the writel() triggering the hardware would force all writes to memory
> > > to be observed before the writel() to the hardware is observed.
> > >
> > > [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> > >
> > > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> > > ---
> > >  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
> > >  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
> > >  2 files changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > > index 1450013d3685..03db1c3444f8 100644
> > > --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > > +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > > @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
> > >               | H1_REG_AXI_CTRL_INPUT_SWAP32
> > >               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
> > >               | H1_REG_AXI_CTRL_INPUT_SWAP8;
> > > -     /* Make sure that all registers are written at this point. */
> > > -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> > > +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
> > >
> >
> > As far as I can remember, this logic comes from really old Chromium Kernels.
> > You might be right, and this barrier isn't needed... but then OTOH the comment
> > is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).
>
> I just realized that my commit log is wrong.
>
> " ... a wmb() was inserted before the final register write that starts the
> encoder. ... " . It is actually before the second-to-last register write.
>
> > I don't have RK3288 boards near me, but in any case, I'm not sure
> > we'd be able to test this easily (maybe there are issues that only
> > trigger under a certain load).
>
> I see. I do have a Veyron around that I haven't used in awhile. But as you
> said, it might not be an obvious hardware limitation.
>
> > I'd personally avoid this one change, but if you are confident enough with it
> > that's fine too.
>
> Unfortunately they didn't leave a whole lot of clues around. For most hardware,
> as I mentioned in the commit log, I think the final non-relaxed write should
> suffice. I'd point to the decoder drivers not having any barriers or
> non-relaxed writes except the final one, but IIUC they are actually two
> distinct pieces of hardware.
>
> I suspect we will never know. This JPEG encoder doesn't seem to get used
> a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
> extra barrier and get tested a lot, but that's only testing the RK3399.
>
> Hans, would it be possible for you to skip this patch and pick the rest?
> Or would you like me to resent without this one?
>

If you decide to resend, feel free to add:

Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

to the whole series.

Thanks,
Ezequiel

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware
  2022-01-19 10:08     ` Chen-Yu Tsai
  2022-01-19 11:59       ` Ezequiel Garcia
@ 2022-01-20 14:20       ` Hans Verkuil
  2022-01-21  2:25         ` Chen-Yu Tsai
  1 sibling, 1 reply; 14+ messages in thread
From: Hans Verkuil @ 2022-01-20 14:20 UTC (permalink / raw)
  To: Chen-Yu Tsai, Ezequiel Garcia
  Cc: Philipp Zabel, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	linux-media, linux-rockchip, linux-staging, linux-kernel

Hi Chen-Yu,

I'll take patches 2-8.

So should I mark patch 1/8 as 'Rejected' or 'Changes Requested' in patchwork?

Regards,

	Hans

On 1/19/22 11:08, Chen-Yu Tsai wrote:
> Hi,
> 
> On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
> <ezequiel@vanguardiasur.com.ar> wrote:
>>
>> Hi Chen-Yu,
>>
>> The series looks good, thanks for picking up this task.
>>
>> Just a one comment.
>>
>> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
>>> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
>>> wmb() was inserted before the final register write that starts the
>>> encoder. In v11, it was removed and the second-to-last register write
>>> was changed to a non-relaxed write, which has an implicit wmb() [1].
>>> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
>>> is another writel_relaxed() following the non-relaxed one.
>>>
>>> Turns out only the last writel() needs to be non-relaxed. Device I/O
>>> mappings already guarantee strict ordering to the same endpoint, and
>>> the writel() triggering the hardware would force all writes to memory
>>> to be observed before the writel() to the hardware is observed.
>>>
>>> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
>>>
>>> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
>>> ---
>>>  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
>>>  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
>>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
>>> index 1450013d3685..03db1c3444f8 100644
>>> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
>>> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
>>> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
>>>               | H1_REG_AXI_CTRL_INPUT_SWAP32
>>>               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
>>>               | H1_REG_AXI_CTRL_INPUT_SWAP8;
>>> -     /* Make sure that all registers are written at this point. */
>>> -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
>>> +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
>>>
>>
>> As far as I can remember, this logic comes from really old Chromium Kernels.
>> You might be right, and this barrier isn't needed... but then OTOH the comment
>> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).
> 
> I just realized that my commit log is wrong.
> 
> " ... a wmb() was inserted before the final register write that starts the
> encoder. ... " . It is actually before the second-to-last register write.
> 
>> I don't have RK3288 boards near me, but in any case, I'm not sure
>> we'd be able to test this easily (maybe there are issues that only
>> trigger under a certain load).
> 
> I see. I do have a Veyron around that I haven't used in awhile. But as you
> said, it might not be an obvious hardware limitation.
> 
>> I'd personally avoid this one change, but if you are confident enough with it
>> that's fine too.
> 
> Unfortunately they didn't leave a whole lot of clues around. For most hardware,
> as I mentioned in the commit log, I think the final non-relaxed write should
> suffice. I'd point to the decoder drivers not having any barriers or
> non-relaxed writes except the final one, but IIUC they are actually two
> distinct pieces of hardware.
> 
> I suspect we will never know. This JPEG encoder doesn't seem to get used
> a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
> extra barrier and get tested a lot, but that's only testing the RK3399.
> 
> Hans, would it be possible for you to skip this patch and pick the rest?
> Or would you like me to resent without this one?
> 
> 
> Thanks
> ChenYu
> 
>> Thanks!
>> Ezequiel
>>
>>>       reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
>>>               | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
>>> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
>>> index 4df16f59fb97..b931fc5fa1a9 100644
>>> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
>>> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
>>> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
>>>               | VEPU_REG_INPUT_SWAP8
>>>               | VEPU_REG_INPUT_SWAP16
>>>               | VEPU_REG_INPUT_SWAP32;
>>> -     /* Make sure that all registers are written at this point. */
>>> -     vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
>>> +     vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
>>>
>>>       reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
>>>       vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
>>> --
>>> 2.34.1.575.g55b058a8bb-goog
>>>

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware
  2022-01-20 14:20       ` Hans Verkuil
@ 2022-01-21  2:25         ` Chen-Yu Tsai
  0 siblings, 0 replies; 14+ messages in thread
From: Chen-Yu Tsai @ 2022-01-21  2:25 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab,
	Greg Kroah-Hartman, linux-media, linux-rockchip, linux-staging,
	linux-kernel

On Thu, Jan 20, 2022 at 10:20 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> Hi Chen-Yu,
>
> I'll take patches 2-8.

Got it.

> So should I mark patch 1/8 as 'Rejected' or 'Changes Requested' in patchwork?

I'd say "Changes Requested", but it won't really progress unless someone
from Rockchip fills in the blanks about the wmb() though.

ChenYu

> Regards,
>
>         Hans
>
> On 1/19/22 11:08, Chen-Yu Tsai wrote:
> > Hi,
> >
> > On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
> > <ezequiel@vanguardiasur.com.ar> wrote:
> >>
> >> Hi Chen-Yu,
> >>
> >> The series looks good, thanks for picking up this task.
> >>
> >> Just a one comment.
> >>
> >> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> >>> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> >>> wmb() was inserted before the final register write that starts the
> >>> encoder. In v11, it was removed and the second-to-last register write
> >>> was changed to a non-relaxed write, which has an implicit wmb() [1].
> >>> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> >>> is another writel_relaxed() following the non-relaxed one.
> >>>
> >>> Turns out only the last writel() needs to be non-relaxed. Device I/O
> >>> mappings already guarantee strict ordering to the same endpoint, and
> >>> the writel() triggering the hardware would force all writes to memory
> >>> to be observed before the writel() to the hardware is observed.
> >>>
> >>> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> >>>
> >>> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> >>> ---
> >>>  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
> >>>  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
> >>>  2 files changed, 2 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> >>> index 1450013d3685..03db1c3444f8 100644
> >>> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> >>> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> >>> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
> >>>               | H1_REG_AXI_CTRL_INPUT_SWAP32
> >>>               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
> >>>               | H1_REG_AXI_CTRL_INPUT_SWAP8;
> >>> -     /* Make sure that all registers are written at this point. */
> >>> -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> >>> +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
> >>>
> >>
> >> As far as I can remember, this logic comes from really old Chromium Kernels.
> >> You might be right, and this barrier isn't needed... but then OTOH the comment
> >> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).
> >
> > I just realized that my commit log is wrong.
> >
> > " ... a wmb() was inserted before the final register write that starts the
> > encoder. ... " . It is actually before the second-to-last register write.
> >
> >> I don't have RK3288 boards near me, but in any case, I'm not sure
> >> we'd be able to test this easily (maybe there are issues that only
> >> trigger under a certain load).
> >
> > I see. I do have a Veyron around that I haven't used in awhile. But as you
> > said, it might not be an obvious hardware limitation.
> >
> >> I'd personally avoid this one change, but if you are confident enough with it
> >> that's fine too.
> >
> > Unfortunately they didn't leave a whole lot of clues around. For most hardware,
> > as I mentioned in the commit log, I think the final non-relaxed write should
> > suffice. I'd point to the decoder drivers not having any barriers or
> > non-relaxed writes except the final one, but IIUC they are actually two
> > distinct pieces of hardware.
> >
> > I suspect we will never know. This JPEG encoder doesn't seem to get used
> > a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
> > extra barrier and get tested a lot, but that's only testing the RK3399.
> >
> > Hans, would it be possible for you to skip this patch and pick the rest?
> > Or would you like me to resent without this one?
> >
> >
> > Thanks
> > ChenYu
> >
> >> Thanks!
> >> Ezequiel
> >>
> >>>       reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
> >>>               | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
> >>> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> >>> index 4df16f59fb97..b931fc5fa1a9 100644
> >>> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> >>> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> >>> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
> >>>               | VEPU_REG_INPUT_SWAP8
> >>>               | VEPU_REG_INPUT_SWAP16
> >>>               | VEPU_REG_INPUT_SWAP32;
> >>> -     /* Make sure that all registers are written at this point. */
> >>> -     vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
> >>> +     vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
> >>>
> >>>       reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
> >>>       vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
> >>> --
> >>> 2.34.1.575.g55b058a8bb-goog
> >>>

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-21  2:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-07  9:34 [PATCH RFT v2 0/8] media: hantro: jpeg: Various improvements Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 1/8] media: hantro: jpeg: Relax register writes before write starting hardware Chen-Yu Tsai
2022-01-18 21:02   ` Ezequiel Garcia
2022-01-19 10:08     ` Chen-Yu Tsai
2022-01-19 11:59       ` Ezequiel Garcia
2022-01-20 14:20       ` Hans Verkuil
2022-01-21  2:25         ` Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 2/8] media: hantro: Fix overfill bottom register field name Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 3/8] media: hantro: Support cropping visible area for encoders Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 4/8] media: hantro: jpeg: Add JFIF APP0 segment to JPEG encoder output Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 5/8] media: hantro: jpeg: Add COM segment to JPEG header to align image scan Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 6/8] media: hantro: Implement V4L2_CID_JPEG_ACTIVE_MARKER control Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 7/8] media: hantro: output encoded JPEG content directly to capture buffers Chen-Yu Tsai
2022-01-07  9:34 ` [PATCH RFT v2 8/8] media: hantro: jpeg: Remove open-coded size in quantization table code Chen-Yu Tsai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).