All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/16] i.MX media mem2mem scaler
@ 2018-09-18  9:34 Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 01/16] media: imx: add mem2mem device Philipp Zabel
                   ` (16 more replies)
  0 siblings, 17 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Hi,

this is the third version of the i.MX mem2mem scaler series.

The driver patch has been moved to the beginning, as Steve suggested.
I've added his warning patch to catch alignment bugs to the series,
seam position selection has been fixed for more corner cases, the
alignment restriction relaxation patches have been merged into one
patch, and support for tiling with three rows or columns has been added
to avoid unnecessary overhead.

Changes since v2:
 - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
   adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
   wrapper around the IPU image converter, and independent of the
   internal image converter implementation.
 - Remove the source and destination buffers on error in device_run().
   Otherwise the conversion is re-attempted apparently over and over
   again (with WARN() backtraces).
 - Allow subscribing to control changes.
 - Fix seam position selection for more corner cases:
    - Switch width/height properly and align tile top left positions to 8x8
      IRT block size when rotating.
    - Align input width to input burst length in case the scaling step
      flips horizontally.
    - Fix bottom edge calculation.

Changes since v1:
 - Fix inverted allow_overshoot logic
 - Correctly switch horizontal / vertical tile alignment when
   determining seam positions with the 90° rotator active.
 - Fix SPDX-License-Identifier and remove superfluous license
   text.
 - Fix uninitialized walign in try_fmt

Previous cover letter:

we have image conversion code for scaling and colorspace conversion in
the IPUv3 base driver for a while. Since the IC hardware can only write
up to 1024x1024 pixel buffers, it scales to larger output buffers by
splitting the input and output frame into similarly sized tiles.

This causes the issue that the bilinear interpolation resets at the tile
boundary: instead of smoothly interpolating across the seam, there is a
jump in the input sample position that is very apparent for high
upscaling factors. This can be avoided by slightly changing the scaling
coefficients to let the left/top tiles overshoot their input sampling
into the first pixel / line of their right / bottom neighbors. The error
can be further reduced by letting tiles be differently sized and by
selecting seam positions that minimize the input sampling position error
at tile boundaries.
This is complicated by different DMA start address, burst size, and
rotator block size alignment requirements, depending on the input and
output pixel formats, and the fact that flipping happens in different
places depending on the rotation.

This series implements optimal seam position selection and seam hiding
with per-tile resizing coefficients and adds a scaling mem2mem device
to the imx-media driver.

regards
Philipp

Philipp Zabel (15):
  media: imx: add mem2mem device
  gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  gpu: ipu-v3: image-convert: prepare for per-tile configuration
  gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  gpu: ipu-v3: image-convert: reconfigure IC per tile
  gpu: ipu-v3: image-convert: store tile top/left position
  gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
    outside fill_image
  gpu: ipu-v3: image-convert: move tile alignment helpers
  gpu: ipu-v3: image-convert: select optimal seam positions
  gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  gpu: ipu-v3: image-convert: relax alignment restrictions
  gpu: ipu-v3: image-convert: fix bytesperline adjustment
  gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  gpu: ipu-v3: image-convert: disable double buffering if necessary
  gpu: ipu-v3: image-convert: allow three rows or columns

Steve Longerbeam (1):
  gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers

 drivers/gpu/ipu-v3/ipu-cpmem.c                |   6 +
 drivers/gpu/ipu-v3/ipu-ic.c                   |  52 +-
 drivers/gpu/ipu-v3/ipu-image-convert.c        | 919 +++++++++++++++---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 873 +++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 include/video/imx-ipu-v3.h                    |   6 +
 9 files changed, 1727 insertions(+), 152 deletions(-)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

-- 
2.19.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 01/16] media: imx: add mem2mem device
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-10-18 22:53   ` Tim Harvey
  2018-09-18  9:34 ` [PATCH v3 02/16] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers Philipp Zabel
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Add a single imx-media mem2mem video device that uses the IPU IC PP
(image converter post processing) task for scaling and colorspace
conversion.
On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.

The hardware only supports writing to destination buffers up to
1024x1024 pixels in a single pass, arbitrary sizes can be achieved
by rendering multiple tiles per frame.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
[steve_longerbeam@mentor.com: use ipu_image_convert_adjust(), fix
 device_run() error handling]
Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
Changes since v2:
 - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
   adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
   wrapper around the IPU image converter, and independent of the
   internal image converter implementation.
 - Remove the source and destination buffers on error in device_run().
   Otherwise the conversion is re-attempted apparently over and over
   again (with WARN() backtraces).
 - Allow subscribing to control changes.
---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 873 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 5 files changed, 896 insertions(+)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

diff --git a/drivers/staging/media/imx/Kconfig b/drivers/staging/media/imx/Kconfig
index bfc17de56b17..07013cb3cb66 100644
--- a/drivers/staging/media/imx/Kconfig
+++ b/drivers/staging/media/imx/Kconfig
@@ -6,6 +6,7 @@ config VIDEO_IMX_MEDIA
 	depends on HAS_DMA
 	select VIDEOBUF2_DMA_CONTIG
 	select V4L2_FWNODE
+	select V4L2_MEM2MEM_DEV
 	---help---
 	  Say yes here to enable support for video4linux media controller
 	  driver for the i.MX5/6 SOC.
diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
index 698a4210316e..f2e722d0fa19 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -6,6 +6,7 @@ imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
+obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-mem2mem.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
 
diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
index 1931d1b038dc..9c59687612a0 100644
--- a/drivers/staging/media/imx/imx-media-dev.c
+++ b/drivers/staging/media/imx/imx-media-dev.c
@@ -359,6 +359,17 @@ static int imx_media_probe_complete(struct v4l2_async_notifier *notifier)
 		goto unlock;
 
 	ret = v4l2_device_register_subdev_nodes(&imxmd->v4l2_dev);
+	if (ret)
+		goto unlock;
+
+	/* TODO: check whether we have IC subdevices first */
+	imxmd->m2m_vdev = imx_media_mem2mem_device_init(imxmd);
+	if (IS_ERR(imxmd->m2m_vdev)) {
+		ret = PTR_ERR(imxmd->m2m_vdev);
+		goto unlock;
+	}
+
+	ret = imx_media_mem2mem_device_register(imxmd->m2m_vdev);
 unlock:
 	mutex_unlock(&imxmd->mutex);
 	if (ret)
diff --git a/drivers/staging/media/imx/imx-media-mem2mem.c b/drivers/staging/media/imx/imx-media-mem2mem.c
new file mode 100644
index 000000000000..a2a4dca017ce
--- /dev/null
+++ b/drivers/staging/media/imx/imx-media-mem2mem.c
@@ -0,0 +1,873 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * i.MX IPUv3 mem2mem Scaler/CSC driver
+ *
+ * Copyright (C) 2011 Pengutronix, Sascha Hauer
+ * Copyright (C) 2018 Pengutronix, Philipp Zabel
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/version.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <video/imx-ipu-v3.h>
+#include <video/imx-ipu-image-convert.h>
+
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-ioctl.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "imx-media.h"
+
+#define fh_to_ctx(__fh)	container_of(__fh, struct mem2mem_ctx, fh)
+
+enum {
+	V4L2_M2M_SRC = 0,
+	V4L2_M2M_DST = 1,
+};
+
+struct mem2mem_priv {
+	struct imx_media_video_dev vdev;
+
+	struct v4l2_m2m_dev   *m2m_dev;
+	struct device         *dev;
+
+	struct imx_media_dev  *md;
+
+	struct mutex          mutex;       /* mem2mem device mutex */
+
+	atomic_t              num_inst;
+};
+
+#define to_mem2mem_priv(v) container_of(v, struct mem2mem_priv, vdev)
+
+/* Per-queue, driver-specific private data */
+struct mem2mem_q_data {
+	struct v4l2_pix_format	cur_fmt;
+	struct v4l2_rect	rect;
+};
+
+struct mem2mem_ctx {
+	struct mem2mem_priv	*priv;
+
+	struct v4l2_fh		fh;
+	struct mem2mem_q_data	q_data[2];
+	int			error;
+	struct ipu_image_convert_ctx *icc;
+
+	struct v4l2_ctrl_handler ctrl_hdlr;
+	int rotate;
+	bool hflip;
+	bool vflip;
+	enum ipu_rotate_mode	rot_mode;
+};
+
+static struct mem2mem_q_data *get_q_data(struct mem2mem_ctx *ctx,
+					 enum v4l2_buf_type type)
+{
+	if (V4L2_TYPE_IS_OUTPUT(type))
+		return &ctx->q_data[V4L2_M2M_SRC];
+	else
+		return &ctx->q_data[V4L2_M2M_DST];
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+static void job_abort(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+
+	if (ctx->icc)
+		ipu_image_convert_abort(ctx->icc);
+}
+
+static void mem2mem_ic_complete(struct ipu_image_convert_run *run, void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+
+	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+
+	dst_buf->vb2_buf.timestamp = src_buf->vb2_buf.timestamp;
+	dst_buf->timecode = src_buf->timecode;
+
+	v4l2_m2m_buf_done(src_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(dst_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+	kfree(run);
+}
+
+static void device_run(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	struct ipu_image_convert_run *run;
+	int ret;
+
+	src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+
+	run = kzalloc(sizeof(*run), GFP_KERNEL);
+	if (!run)
+		goto err;
+
+	run->ctx = ctx->icc;
+	run->in_phys = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+	run->out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+	ret = ipu_image_convert_queue(run);
+	if (ret < 0) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev,
+			 "%s: failed to queue: %d\n", __func__, ret);
+		goto err;
+	}
+
+	return;
+
+err:
+	v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+}
+
+/*
+ * Video ioctls
+ */
+static int vidioc_querycap(struct file *file, void *priv,
+			   struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
+	strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) - 1);
+	strncpy(cap->bus_info, "platform:imx-media-mem2mem",
+		sizeof(cap->bus_info) - 1);
+	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+
+	return 0;
+}
+
+static int mem2mem_enum_fmt(struct file *file, void *fh,
+			    struct v4l2_fmtdesc *f)
+{
+	u32 fourcc;
+	int ret;
+
+	ret = imx_media_enum_format(&fourcc, f->index, CS_SEL_ANY);
+	if (ret)
+		return ret;
+
+	f->pixelformat = fourcc;
+
+	return 0;
+}
+
+static int mem2mem_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	q_data = get_q_data(ctx, f->type);
+
+	f->fmt.pix = q_data->cur_fmt;
+
+	return 0;
+}
+
+static int mem2mem_try_fmt(struct file *file, void *priv,
+			   struct v4l2_format *f)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data = get_q_data(ctx, f->type);
+	struct ipu_image test_in, test_out;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
+		struct mem2mem_q_data *q_data_in =
+			get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+
+		test_out.pix = f->fmt.pix;
+		test_in.pix = q_data_in->cur_fmt;
+	} else {
+		struct mem2mem_q_data *q_data_out =
+			get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+
+		test_in.pix = f->fmt.pix;
+		test_out.pix = q_data_out->cur_fmt;
+	}
+
+	ipu_image_convert_adjust(&test_in, &test_out, ctx->rot_mode);
+
+	f->fmt.pix = (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
+		test_out.pix : test_in.pix;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
+		f->fmt.pix.colorspace = q_data->cur_fmt.colorspace;
+		f->fmt.pix.ycbcr_enc = q_data->cur_fmt.ycbcr_enc;
+		f->fmt.pix.xfer_func = q_data->cur_fmt.xfer_func;
+		f->fmt.pix.quantization = q_data->cur_fmt.quantization;
+	} else if (f->fmt.pix.colorspace == V4L2_COLORSPACE_DEFAULT) {
+		f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
+		f->fmt.pix.ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
+		f->fmt.pix.xfer_func = V4L2_XFER_FUNC_DEFAULT;
+		f->fmt.pix.quantization = V4L2_QUANTIZATION_DEFAULT;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_q_data *q_data;
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct vb2_queue *vq;
+	int ret;
+
+	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+	if (vb2_is_busy(vq)) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s queue busy\n",
+			 __func__);
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+
+	ret = mem2mem_try_fmt(file, priv, f);
+	if (ret < 0)
+		return ret;
+
+	q_data->cur_fmt.width = f->fmt.pix.width;
+	q_data->cur_fmt.height = f->fmt.pix.height;
+	q_data->cur_fmt.pixelformat = f->fmt.pix.pixelformat;
+	q_data->cur_fmt.field = f->fmt.pix.field;
+	q_data->cur_fmt.bytesperline = f->fmt.pix.bytesperline;
+	q_data->cur_fmt.sizeimage = f->fmt.pix.sizeimage;
+
+	/* Reset cropping/composing rectangle */
+	q_data->rect.left = 0;
+	q_data->rect.top = 0;
+	q_data->rect.width = q_data->cur_fmt.width;
+	q_data->rect.height = q_data->cur_fmt.height;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		/* Set colorimetry on the output queue */
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+		/* Propagate colorimetry to the capture queue */
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+	}
+
+	/*
+	 * TODO: Setting colorimetry on the capture queue is currently not
+	 * supported by the V4L2 API
+	 */
+
+	return 0;
+}
+
+static int mem2mem_g_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+	case V4L2_SEL_TGT_CROP_DEFAULT:
+	case V4L2_SEL_TGT_CROP_BOUNDS:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+	case V4L2_SEL_TGT_COMPOSE_DEFAULT:
+	case V4L2_SEL_TGT_COMPOSE_BOUNDS:
+	case V4L2_SEL_TGT_COMPOSE_PADDED:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->target == V4L2_SEL_TGT_CROP ||
+	    s->target == V4L2_SEL_TGT_COMPOSE) {
+		s->r = q_data->rect;
+	} else {
+		s->r.left = 0;
+		s->r.top = 0;
+		s->r.width = q_data->cur_fmt.width;
+		s->r.height = q_data->cur_fmt.height;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
+	    s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, s->type);
+
+	/* The input's frame width to the IC must be a multiple of 8 pixels
+	 * When performing resizing the frame width must be multiple of burst
+	 * size - 8 or 16 pixels as defined by CB#_BURST_16 parameter.
+	 */
+	if (s->flags & V4L2_SEL_FLAG_GE)
+		s->r.width = round_up(s->r.width, 8);
+	if (s->flags & V4L2_SEL_FLAG_LE)
+		s->r.width = round_down(s->r.width, 8);
+	s->r.width = clamp_t(unsigned int, s->r.width, 8,
+			     round_down(q_data->cur_fmt.width, 8));
+	s->r.height = clamp_t(unsigned int, s->r.height, 1,
+			      q_data->cur_fmt.height);
+	s->r.left = clamp_t(unsigned int, s->r.left, 0,
+			    q_data->cur_fmt.width - s->r.width);
+	s->r.top = clamp_t(unsigned int, s->r.top, 0,
+			   q_data->cur_fmt.height - s->r.height);
+
+	/* V4L2_SEL_FLAG_KEEP_CONFIG is only valid for subdevices */
+	q_data->rect = s->r;
+
+	return 0;
+}
+
+static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
+	.vidioc_querycap	= vidioc_querycap,
+
+	.vidioc_enum_fmt_vid_cap = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_cap	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_cap	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_cap	= mem2mem_s_fmt,
+
+	.vidioc_enum_fmt_vid_out = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_out	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_out	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_out	= mem2mem_s_fmt,
+
+	.vidioc_g_selection	= mem2mem_g_selection,
+	.vidioc_s_selection	= mem2mem_s_selection,
+
+	.vidioc_reqbufs		= v4l2_m2m_ioctl_reqbufs,
+	.vidioc_querybuf	= v4l2_m2m_ioctl_querybuf,
+
+	.vidioc_qbuf		= v4l2_m2m_ioctl_qbuf,
+	.vidioc_expbuf		= v4l2_m2m_ioctl_expbuf,
+	.vidioc_dqbuf		= v4l2_m2m_ioctl_dqbuf,
+	.vidioc_create_bufs	= v4l2_m2m_ioctl_create_bufs,
+
+	.vidioc_streamon	= v4l2_m2m_ioctl_streamon,
+	.vidioc_streamoff	= v4l2_m2m_ioctl_streamoff,
+
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+
+static int mem2mem_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
+			       unsigned int *nplanes, unsigned int sizes[],
+			       struct device *alloc_devs[])
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vq);
+	struct mem2mem_q_data *q_data;
+	unsigned int count = *nbuffers;
+	struct v4l2_pix_format *pix;
+
+	q_data = get_q_data(ctx, vq->type);
+	pix = &q_data->cur_fmt;
+
+	*nplanes = 1;
+	*nbuffers = count;
+	sizes[0] = pix->sizeimage;
+
+	dev_dbg(ctx->priv->dev, "get %d buffer(s) of size %d each.\n",
+		count, pix->sizeimage);
+
+	return 0;
+}
+
+static int mem2mem_buf_prepare(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct mem2mem_q_data *q_data;
+	struct v4l2_pix_format *pix;
+	unsigned int plane_size, payload;
+
+	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	pix = &q_data->cur_fmt;
+	plane_size = pix->sizeimage;
+
+	if (vb2_plane_size(vb, 0) < plane_size) {
+		dev_dbg(ctx->priv->dev,
+			"%s data will not fit into plane (%lu < %lu)\n",
+			__func__, vb2_plane_size(vb, 0), (long)plane_size);
+		return -EINVAL;
+	}
+
+	payload = pix->bytesperline * pix->height;
+	if (pix->pixelformat == V4L2_PIX_FMT_YUV420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_YVU420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_NV12)
+		payload = payload * 3 / 2;
+	else if (pix->pixelformat == V4L2_PIX_FMT_YUV422P ||
+		 pix->pixelformat == V4L2_PIX_FMT_NV16)
+		payload *= 2;
+
+	vb2_set_plane_payload(vb, 0, payload);
+
+	return 0;
+}
+
+static void mem2mem_buf_queue(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
+}
+
+static void ipu_image_from_q_data(struct ipu_image *im,
+				  struct mem2mem_q_data *q_data)
+{
+	im->pix.width = q_data->cur_fmt.width;
+	im->pix.height = q_data->cur_fmt.height;
+	im->pix.bytesperline = q_data->cur_fmt.bytesperline;
+	im->pix.pixelformat = q_data->cur_fmt.pixelformat;
+	im->rect = q_data->rect;
+}
+
+static int mem2mem_start_streaming(struct vb2_queue *q, unsigned int count)
+{
+	const enum ipu_ic_task ic_task = IC_TASK_POST_PROCESSOR;
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct mem2mem_priv *priv = ctx->priv;
+	struct ipu_soc *ipu = priv->md->ipu[0];
+	struct mem2mem_q_data *q_data;
+	struct vb2_queue *other_q;
+	struct ipu_image in, out;
+
+	other_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+				  (q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
+				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
+				  V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	if (!vb2_is_streaming(other_q))
+		return 0;
+
+	if (ctx->icc) {
+		v4l2_warn(ctx->priv->vdev.vfd->v4l2_dev, "removing old ICC\n");
+		ipu_image_convert_unprepare(ctx->icc);
+	}
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	ipu_image_from_q_data(&in, q_data);
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	ipu_image_from_q_data(&out, q_data);
+
+	ctx->icc = ipu_image_convert_prepare(ipu, ic_task, &in, &out,
+					     ctx->rot_mode,
+					     mem2mem_ic_complete, ctx);
+	if (IS_ERR(ctx->icc)) {
+		struct vb2_v4l2_buffer *buf;
+		int ret = PTR_ERR(ctx->icc);
+
+		ctx->icc = NULL;
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s: error %d\n",
+			 __func__, ret);
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void mem2mem_stop_streaming(struct vb2_queue *q)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct vb2_v4l2_buffer *buf;
+
+	if (ctx->icc) {
+		ipu_image_convert_unprepare(ctx->icc);
+		ctx->icc = NULL;
+	}
+
+	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	} else {
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	}
+}
+
+static const struct vb2_ops mem2mem_qops = {
+	.queue_setup	= mem2mem_queue_setup,
+	.buf_prepare	= mem2mem_buf_prepare,
+	.buf_queue	= mem2mem_buf_queue,
+	.wait_prepare	= vb2_ops_wait_prepare,
+	.wait_finish	= vb2_ops_wait_finish,
+	.start_streaming = mem2mem_start_streaming,
+	.stop_streaming = mem2mem_stop_streaming,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct mem2mem_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
+	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &mem2mem_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	src_vq->lock = &ctx->priv->mutex;
+	src_vq->dev = ctx->priv->dev;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &mem2mem_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	dst_vq->lock = &ctx->priv->mutex;
+	dst_vq->dev = ctx->priv->dev;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static int mem2mem_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct mem2mem_ctx *ctx = container_of(ctrl->handler,
+					       struct mem2mem_ctx, ctrl_hdlr);
+	enum ipu_rotate_mode rot_mode;
+	int rotate;
+	bool hflip, vflip;
+	int ret = 0;
+
+	rotate = ctx->rotate;
+	hflip = ctx->hflip;
+	vflip = ctx->vflip;
+
+	switch (ctrl->id) {
+	case V4L2_CID_HFLIP:
+		hflip = ctrl->val;
+		break;
+	case V4L2_CID_VFLIP:
+		vflip = ctrl->val;
+		break;
+	case V4L2_CID_ROTATE:
+		rotate = ctrl->val;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = ipu_degrees_to_rot_mode(&rot_mode, rotate, hflip, vflip);
+	if (ret)
+		return ret;
+
+	if (rot_mode != ctx->rot_mode) {
+		struct vb2_queue *cap_q;
+
+		cap_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+					V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		if (vb2_is_streaming(cap_q))
+			return -EBUSY;
+
+		ctx->rot_mode = rot_mode;
+		ctx->rotate = rotate;
+		ctx->hflip = hflip;
+		ctx->vflip = vflip;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops mem2mem_ctrl_ops = {
+	.s_ctrl = mem2mem_s_ctrl,
+};
+
+static int mem2mem_init_controls(struct mem2mem_ctx *ctx)
+{
+	struct v4l2_ctrl_handler *hdlr = &ctx->ctrl_hdlr;
+	int ret;
+
+	v4l2_ctrl_handler_init(hdlr, 3);
+
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_HFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_VFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_ROTATE,
+			  0, 270, 90, 0);
+
+	if (hdlr->error) {
+		ret = hdlr->error;
+		goto out_free;
+	}
+
+	v4l2_ctrl_handler_setup(hdlr);
+	return 0;
+
+out_free:
+	v4l2_ctrl_handler_free(hdlr);
+	return ret;
+}
+
+#define DEFAULT_WIDTH	720
+#define DEFAULT_HEIGHT	576
+static const struct mem2mem_q_data mem2mem_q_data_default = {
+	.cur_fmt = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+		.pixelformat = V4L2_PIX_FMT_YUV420,
+		.field = V4L2_FIELD_NONE,
+		.bytesperline = DEFAULT_WIDTH,
+		.sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
+		.colorspace = V4L2_COLORSPACE_SRGB,
+	},
+	.rect = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+	},
+};
+
+/*
+ * File operations
+ */
+static int mem2mem_open(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = NULL;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->rot_mode = IPU_ROTATE_NONE;
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+	v4l2_fh_add(&ctx->fh);
+	ctx->priv = priv;
+
+	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
+					    &queue_init);
+	if (IS_ERR(ctx->fh.m2m_ctx)) {
+		ret = PTR_ERR(ctx->fh.m2m_ctx);
+		goto err_ctx;
+	}
+
+	ret = mem2mem_init_controls(ctx);
+	if (ret)
+		goto err_ctrls;
+
+	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;
+
+	ctx->q_data[V4L2_M2M_SRC] = mem2mem_q_data_default;
+	ctx->q_data[V4L2_M2M_DST] = mem2mem_q_data_default;
+
+	atomic_inc(&priv->num_inst);
+
+	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n", ctx,
+		ctx->fh.m2m_ctx);
+
+	return 0;
+
+err_ctrls:
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+err_ctx:
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+	return ret;
+}
+
+static int mem2mem_release(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = fh_to_ctx(file->private_data);
+
+	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
+
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+
+	atomic_dec(&priv->num_inst);
+
+	return 0;
+}
+
+static const struct v4l2_file_operations mem2mem_fops = {
+	.owner		= THIS_MODULE,
+	.open		= mem2mem_open,
+	.release	= mem2mem_release,
+	.poll		= v4l2_m2m_fop_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= v4l2_m2m_fop_mmap,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_abort	= job_abort,
+};
+
+static const struct video_device mem2mem_videodev_template = {
+	.name		= "ipu0_ic_pp mem2mem",
+	.fops		= &mem2mem_fops,
+	.ioctl_ops	= &mem2mem_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
+	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
+};
+
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = vdev->vfd;
+	int ret;
+
+	vfd->v4l2_dev = &priv->md->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
+	if (ret) {
+		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
+		return ret;
+	}
+
+	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
+		  video_device_node_name(vfd));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_register);
+
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = priv->vdev.vfd;
+
+	mutex_lock(&priv->mutex);
+
+	if (video_is_registered(vfd)) {
+		video_unregister_device(vfd);
+		media_entity_cleanup(&vfd->entity);
+	}
+
+	mutex_unlock(&priv->mutex);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_unregister);
+
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *md)
+{
+	struct mem2mem_priv *priv;
+	struct video_device *vfd;
+	int ret;
+
+	priv = devm_kzalloc(md->md.dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ERR_PTR(-ENOMEM);
+
+	priv->md = md;
+	priv->dev = md->md.dev;
+
+	mutex_init(&priv->mutex);
+	atomic_set(&priv->num_inst, 0);
+
+	vfd = video_device_alloc();
+	if (!vfd)
+		return ERR_PTR(-ENOMEM);
+
+	*vfd = mem2mem_videodev_template;
+	snprintf(vfd->name, sizeof(vfd->name), "ipu_ic_pp mem2mem");
+	vfd->lock = &priv->mutex;
+	priv->vdev.vfd = vfd;
+
+	INIT_LIST_HEAD(&priv->vdev.list);
+
+	video_set_drvdata(vfd, priv);
+
+	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(priv->m2m_dev)) {
+		ret = PTR_ERR(priv->m2m_dev);
+		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
+			 ret);
+		return ERR_PTR(ret);
+	}
+
+	return &priv->vdev;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_init);
+
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+
+	v4l2_m2m_release(priv->m2m_dev);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_remove);
+
+MODULE_DESCRIPTION("i.MX IPUv3 mem2mem scaler/CSC driver");
+MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
index 57bd094cf765..ce8b15ca401c 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -151,6 +151,9 @@ struct imx_media_dev {
 	/* for async subdev registration */
 	struct list_head asd_list;
 	struct v4l2_async_notifier subdev_notifier;
+
+	/* IC scaler/CSC mem2mem video device */
+	struct imx_media_video_dev *m2m_vdev;
 };
 
 enum codespace_sel {
@@ -264,6 +267,13 @@ void imx_media_capture_device_set_format(struct imx_media_video_dev *vdev,
 					 struct v4l2_pix_format *pix);
 void imx_media_capture_device_error(struct imx_media_video_dev *vdev);
 
+/* imx-media-mem2mem.c */
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *dev);
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev);
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev);
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev);
+
 /* subdev group ids */
 #define IMX_MEDIA_GRP_ID_CSI2      BIT(8)
 #define IMX_MEDIA_GRP_ID_CSI_BIT   9
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 02/16] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 01/16] media: imx: add mem2mem device Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 03/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

From: Steve Longerbeam <steve_longerbeam@mentor.com>

Add a WARN_ON_ONCE() if either the Y/packed buffer, or the U/V offsets,
are not aligned on 8-byte boundaries. This will catch alignment
bugs in DRM, V4L2.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
New since v2.
---
 drivers/gpu/ipu-v3/ipu-cpmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/ipu-v3/ipu-cpmem.c b/drivers/gpu/ipu-v3/ipu-cpmem.c
index a9d2501500a1..7e65954f13c2 100644
--- a/drivers/gpu/ipu-v3/ipu-cpmem.c
+++ b/drivers/gpu/ipu-v3/ipu-cpmem.c
@@ -259,6 +259,8 @@ EXPORT_SYMBOL_GPL(ipu_cpmem_set_high_priority);
 
 void ipu_cpmem_set_buffer(struct ipuv3_channel *ch, int bufnum, dma_addr_t buf)
 {
+	WARN_ON_ONCE(buf & 0x7);
+
 	if (bufnum)
 		ipu_ch_param_write_field(ch, IPU_FIELD_EBA1, buf >> 3);
 	else
@@ -268,6 +270,8 @@ EXPORT_SYMBOL_GPL(ipu_cpmem_set_buffer);
 
 void ipu_cpmem_set_uv_offset(struct ipuv3_channel *ch, u32 u_off, u32 v_off)
 {
+	WARN_ON_ONCE((u_off & 0x7) || (v_off & 0x7));
+
 	ipu_ch_param_write_field(ch, IPU_FIELD_UBO, u_off / 8);
 	ipu_ch_param_write_field(ch, IPU_FIELD_VBO, v_off / 8);
 }
@@ -435,6 +439,8 @@ void ipu_cpmem_set_yuv_planar_full(struct ipuv3_channel *ch,
 				   unsigned int uv_stride,
 				   unsigned int u_offset, unsigned int v_offset)
 {
+	WARN_ON_ONCE((u_offset & 0x7) || (v_offset & 0x7));
+
 	ipu_ch_param_write_field(ch, IPU_FIELD_SLUV, uv_stride - 1);
 	ipu_ch_param_write_field(ch, IPU_FIELD_UBO, u_offset / 8);
 	ipu_ch_param_write_field(ch, IPU_FIELD_VBO, v_offset / 8);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 03/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 01/16] media: imx: add mem2mem device Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 02/16] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 04/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

For tiled scaling, we want to compute the scaling coefficients
externally in such a way that the interpolation overshoots tile
boundaries and samples up to the first pixel of the next tile.
Prepare to override the resizing coefficients from the image
conversion code.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-ic.c | 52 +++++++++++++++++++++++--------------
 include/video/imx-ipu-v3.h  |  6 +++++
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 67cc820253a9..594c3cbc8291 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -442,36 +442,40 @@ int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 }
 EXPORT_SYMBOL_GPL(ipu_ic_task_graphics_init);
 
-int ipu_ic_task_init(struct ipu_ic *ic,
-		     int in_width, int in_height,
-		     int out_width, int out_height,
-		     enum ipu_color_space in_cs,
-		     enum ipu_color_space out_cs)
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc)
 {
 	struct ipu_ic_priv *priv = ic->priv;
-	u32 reg, downsize_coeff, resize_coeff;
+	u32 downsize_coeff, resize_coeff;
 	unsigned long flags;
 	int ret = 0;
 
-	/* Setup vertical resizing */
-	ret = calc_resize_coeffs(ic, in_height, out_height,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+	if (!rsc) {
+		/* Setup vertical resizing */
 
-	reg = (downsize_coeff << 30) | (resize_coeff << 16);
+		ret = calc_resize_coeffs(ic, in_height, out_height,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
+
+		rsc = (downsize_coeff << 30) | (resize_coeff << 16);
 
-	/* Setup horizontal resizing */
-	ret = calc_resize_coeffs(ic, in_width, out_width,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+		/* Setup horizontal resizing */
+		ret = calc_resize_coeffs(ic, in_width, out_width,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
 
-	reg |= (downsize_coeff << 14) | resize_coeff;
+		rsc |= (downsize_coeff << 14) | resize_coeff;
+	}
 
 	spin_lock_irqsave(&priv->lock, flags);
 
-	ipu_ic_write(ic, reg, ic->reg->rsc);
+	ipu_ic_write(ic, rsc, ic->reg->rsc);
 
 	/* Setup color space conversion */
 	ic->in_cs = in_cs;
@@ -487,6 +491,16 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return ret;
 }
+
+int ipu_ic_task_init(struct ipu_ic *ic,
+		     int in_width, int in_height,
+		     int out_width, int out_height,
+		     enum ipu_color_space in_cs,
+		     enum ipu_color_space out_cs)
+{
+	return ipu_ic_task_init_rsc(ic, in_width, in_height, out_width,
+				    out_height, in_cs, out_cs, 0);
+}
 EXPORT_SYMBOL_GPL(ipu_ic_task_init);
 
 int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index abbad94e14a1..94f0eec821c8 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -387,6 +387,12 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 		     int out_width, int out_height,
 		     enum ipu_color_space in_cs,
 		     enum ipu_color_space out_cs);
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc);
 int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 			      enum ipu_color_space in_g_cs,
 			      bool galpha_en, u32 galpha,
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 04/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (2 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 03/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 05/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Let convert_start start from a given tile index, allocate intermediate
tile with maximum tile size.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 60 +++++++++++++++-----------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index f4081962784c..f4db1553d23a 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -611,7 +611,8 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 			       struct ipuv3_channel *channel,
 			       struct ipu_image_convert_image *image,
 			       enum ipu_rotate_mode rot_mode,
-			       bool rot_swap_width_height)
+			       bool rot_swap_width_height,
+			       unsigned int tile)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	unsigned int burst_size;
@@ -621,23 +622,23 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	unsigned int tile_idx[2];
 
 	if (image->type == IMAGE_CONVERT_OUT) {
-		tile_idx[0] = ctx->out_tile_map[0];
+		tile_idx[0] = ctx->out_tile_map[tile];
 		tile_idx[1] = ctx->out_tile_map[1];
 	} else {
-		tile_idx[0] = 0;
+		tile_idx[0] = tile;
 		tile_idx[1] = 1;
 	}
 
 	if (rot_swap_width_height) {
-		width = image->tile[0].height;
-		height = image->tile[0].width;
-		stride = image->tile[0].rot_stride;
+		width = image->tile[tile_idx[0]].height;
+		height = image->tile[tile_idx[0]].width;
+		stride = image->tile[tile_idx[0]].rot_stride;
 		addr0 = ctx->rot_intermediate[0].phys;
 		if (ctx->double_buffering)
 			addr1 = ctx->rot_intermediate[1].phys;
 	} else {
-		width = image->tile[0].width;
-		height = image->tile[0].height;
+		width = image->tile[tile_idx[0]].width;
+		height = image->tile[tile_idx[0]].height;
 		stride = image->stride;
 		addr0 = image->base.phys0 +
 			image->tile[tile_idx[0]].offset;
@@ -687,7 +688,7 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
 }
 
-static int convert_start(struct ipu_image_convert_run *run)
+static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 {
 	struct ipu_image_convert_ctx *ctx = run->ctx;
 	struct ipu_image_convert_chan *chan = ctx->chan;
@@ -695,28 +696,29 @@ static int convert_start(struct ipu_image_convert_run *run)
 	struct ipu_image_convert_image *s_image = &ctx->in;
 	struct ipu_image_convert_image *d_image = &ctx->out;
 	enum ipu_color_space src_cs, dest_cs;
+	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
 	int ret;
 
-	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p\n",
-		__func__, chan->ic_task, ctx, run);
+	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
+		__func__, chan->ic_task, ctx, run, tile, dst_tile);
 
 	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
 	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* swap width/height for resizer */
-		dest_width = d_image->tile[0].height;
-		dest_height = d_image->tile[0].width;
+		dest_width = d_image->tile[dst_tile].height;
+		dest_height = d_image->tile[dst_tile].width;
 	} else {
-		dest_width = d_image->tile[0].width;
-		dest_height = d_image->tile[0].height;
+		dest_width = d_image->tile[dst_tile].width;
+		dest_height = d_image->tile[dst_tile].height;
 	}
 
 	/* setup the IC resizer and CSC */
 	ret = ipu_ic_task_init(chan->ic,
-			       s_image->tile[0].width,
-			       s_image->tile[0].height,
+			       s_image->tile[tile].width,
+			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
 			       src_cs, dest_cs);
@@ -727,27 +729,27 @@ static int convert_start(struct ipu_image_convert_run *run)
 
 	/* init the source MEM-->IC PP IDMAC channel */
 	init_idmac_channel(ctx, chan->in_chan, s_image,
-			   IPU_ROTATE_NONE, false);
+			   IPU_ROTATE_NONE, false, tile);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* init the IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   IPU_ROTATE_NONE, true);
+				   IPU_ROTATE_NONE, true, tile);
 
 		/* init the MEM-->IC PP ROT IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_in_chan, d_image,
-				   ctx->rot_mode, true);
+				   ctx->rot_mode, true, tile);
 
 		/* init the destination IC PP ROT-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_out_chan, d_image,
-				   IPU_ROTATE_NONE, false);
+				   IPU_ROTATE_NONE, false, tile);
 
 		/* now link IC PP-->MEM to MEM-->IC PP ROT */
 		ipu_idmac_link(chan->out_chan, chan->rotation_in_chan);
 	} else {
 		/* init the destination IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   ctx->rot_mode, false);
+				   ctx->rot_mode, false, tile);
 	}
 
 	/* enable the IC */
@@ -805,7 +807,7 @@ static int do_run(struct ipu_image_convert_run *run)
 	list_del(&run->list);
 	chan->current_run = run;
 
-	return convert_start(run);
+	return convert_start(run, 0);
 }
 
 /* hold irqlock when calling */
@@ -1436,14 +1438,22 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 				 !d_image->fmt->planar);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		unsigned long intermediate_size = d_image->tile[0].size;
+		unsigned int i;
+
+		for (i = 1; i < ctx->num_tiles; i++) {
+			if (d_image->tile[i].size > intermediate_size)
+				intermediate_size = d_image->tile[i].size;
+		}
+
 		ret = alloc_dma_buf(priv, &ctx->rot_intermediate[0],
-				    d_image->tile[0].size);
+				    intermediate_size);
 		if (ret)
 			goto out_free;
 		if (ctx->double_buffering) {
 			ret = alloc_dma_buf(priv,
 					    &ctx->rot_intermediate[1],
-					    d_image->tile[0].size);
+					    intermediate_size);
 			if (ret)
 				goto out_free_dmabuf0;
 		}
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 05/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (3 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 04/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 06/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Slightly modifying resize coefficients per-tile allows to completely
hide the seams between tiles and to sample the correct input pixels at
the bottom and right edges of the image.

Tiling requires a bilinear interpolator reset at each tile start, which
causes the image to be slightly shifted if the starting pixel should not
have been sampled from an integer pixel position in the source image
according to the full image resizing ratio. To work around this
hardware limitation, calculate per-tile resizing coefficients that make
sure that the correct input pixels are sampled at the tile end.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 236 ++++++++++++++++++++++++-
 1 file changed, 234 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index f4db1553d23a..01a63eb8ccaf 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -135,6 +135,12 @@ struct ipu_image_convert_ctx {
 	struct ipu_image_convert_image in;
 	struct ipu_image_convert_image out;
 	enum ipu_rotate_mode rot_mode;
+	u32 downsize_coeff_h;
+	u32 downsize_coeff_v;
+	u32 image_resize_coeff_h;
+	u32 image_resize_coeff_v;
+	u32 resize_coeffs_h[MAX_STRIPES_W];
+	u32 resize_coeffs_v[MAX_STRIPES_H];
 
 	/* intermediate buffer for rotation */
 	struct ipu_image_convert_dma_buf rot_intermediate[2];
@@ -361,6 +367,69 @@ static inline int num_stripes(int dim)
 		return 4;
 }
 
+/*
+ * Calculate downsizing coefficients, which are the same for all tiles,
+ * and bilinear resizing coefficients, which are used to find the best
+ * seam positions.
+ */
+static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
+					  struct ipu_image *in,
+					  struct ipu_image *out)
+{
+	u32 downsized_width = in->rect.width;
+	u32 downsized_height = in->rect.height;
+	u32 downsize_coeff_v = 0;
+	u32 downsize_coeff_h = 0;
+	u32 resized_width = out->rect.width;
+	u32 resized_height = out->rect.height;
+	u32 resize_coeff_h;
+	u32 resize_coeff_v;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		resized_width = out->rect.height;
+		resized_height = out->rect.width;
+	}
+
+	/* Do not let invalid input lead to an endless loop below */
+	if (WARN_ON(resized_width == 0 || resized_height == 0))
+		return -EINVAL;
+
+	while (downsized_width >= resized_width * 2) {
+		downsized_width >>= 1;
+		downsize_coeff_h++;
+	}
+
+	while (downsized_height >= resized_height * 2) {
+		downsized_height >>= 1;
+		downsize_coeff_v++;
+	}
+
+	/*
+	 * Calculate the bilinear resizing coefficients that could be used if
+	 * we were converting with a single tile. The bottom right output pixel
+	 * should sample as close as possible to the bottom right input pixel
+	 * out of the decimator, but not overshoot it:
+	 */
+	resize_coeff_h = 8192 * (downsized_width - 1) / (resized_width - 1);
+	resize_coeff_v = 8192 * (downsized_height - 1) / (resized_height - 1);
+
+	dev_dbg(ctx->chan->priv->ipu->dev,
+		"%s: hscale: >>%u, *8192/%u vscale: >>%u, *8192/%u, %ux%u tiles\n",
+		__func__, downsize_coeff_h, resize_coeff_h, downsize_coeff_v,
+		resize_coeff_v, ctx->in.num_cols, ctx->in.num_rows);
+
+	if (downsize_coeff_h > 2 || downsize_coeff_v  > 2 ||
+	    resize_coeff_h > 0x3fff || resize_coeff_v > 0x3fff)
+		return -EINVAL;
+
+	ctx->downsize_coeff_h = downsize_coeff_h;
+	ctx->downsize_coeff_v = downsize_coeff_v;
+	ctx->image_resize_coeff_h = resize_coeff_h;
+	ctx->image_resize_coeff_v = resize_coeff_v;
+
+	return 0;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -564,6 +633,149 @@ static void calc_tile_offsets(struct ipu_image_convert_ctx *ctx,
 		calc_tile_offsets_packed(ctx, image);
 }
 
+/*
+ * Calculate the resizing ratio for the IC main processing section given input
+ * size, fixed downsizing coefficient, and output size.
+ * Either round to closest for the next tile's first pixel to minimize seams
+ * and distortion (for all but right column / bottom row), or round down to
+ * avoid sampling beyond the edges of the input image for this tile's last
+ * pixel.
+ * Returns the resizing coefficient, resizing ratio is 8192.0 / resize_coeff.
+ */
+static u32 calc_resize_coeff(u32 input_size, u32 downsize_coeff,
+			     u32 output_size, bool allow_overshoot)
+{
+	u32 downsized = input_size >> downsize_coeff;
+
+	if (allow_overshoot)
+		return DIV_ROUND_CLOSEST(8192 * downsized, output_size);
+	else
+		return 8192 * (downsized - 1) / (output_size - 1);
+}
+
+/*
+ * Slightly modify resize coefficients per tile to hide the bilinear
+ * interpolator reset at tile borders, shifting the right / bottom edge
+ * by up to a half input pixel. This removes noticeable seams between
+ * tiles at higher upscaling factors.
+ */
+static void calc_tile_resize_coefficients(struct ipu_image_convert_ctx *ctx)
+{
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
+	struct ipu_image_tile *in_tile, *out_tile;
+	unsigned int col, row, tile_idx;
+	unsigned int last_output;
+
+	for (col = 0; col < ctx->in.num_cols; col++) {
+		bool closest = (col < ctx->in.num_cols - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		u32 resized_width;
+		u32 resize_coeff_h;
+
+		tile_idx = col;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_width = out_tile->height;
+		else
+			resized_width = out_tile->width;
+
+		resize_coeff_h = calc_resize_coeff(in_tile->width,
+						   ctx->downsize_coeff_h,
+						   resized_width, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: column %u hscale: *8192/%u\n",
+			__func__, col, resize_coeff_h);
+
+
+		for (row = 0; row < ctx->in.num_rows; row++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the horizontal scaling factor known, round up
+			 * resized width (output width or height) to burst size.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->height = round_up(resized_width, 8);
+			else
+				out_tile->width = round_up(resized_width, 8);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized width and scaling coefficients.
+			 * Round up to burst size.
+			 */
+			last_output = round_up(resized_width, 8) - 1;
+			if (closest)
+				last_output++;
+			in_tile->width = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_h,
+					      8192) + 1)
+				<< ctx->downsize_coeff_h, 8);
+		}
+
+		ctx->resize_coeffs_h[col] = resize_coeff_h;
+	}
+
+	for (row = 0; row < ctx->in.num_rows; row++) {
+		bool closest = (row < ctx->in.num_rows - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_VFLIP);
+		u32 resized_height;
+		u32 resize_coeff_v;
+
+		tile_idx = row * ctx->in.num_cols;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_height = out_tile->width;
+		else
+			resized_height = out_tile->height;
+
+		resize_coeff_v = calc_resize_coeff(in_tile->height,
+						   ctx->downsize_coeff_v,
+						   resized_height, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: row %u vscale: *8192/%u\n",
+			__func__, row, resize_coeff_v);
+
+		for (col = 0; col < ctx->in.num_cols; col++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the vertical scaling factor known, round up
+			 * resized height (output width or height) to IDMAC
+			 * limitations.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->width = round_up(resized_height, 2);
+			else
+				out_tile->height = round_up(resized_height, 2);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized height and scaling coefficients.
+			 * Align to IDMAC restrictions.
+			 */
+			last_output = round_up(resized_height, 2) - 1;
+			if (closest)
+				last_output++;
+			in_tile->height = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_v,
+					      8192) + 1)
+				<< ctx->downsize_coeff_v, 2);
+		}
+
+		ctx->resize_coeffs_v[row] = resize_coeff_v;
+	}
+}
+
 /*
  * return the number of runs in given queue (pending_q or done_q)
  * for this context. hold irqlock when calling.
@@ -698,6 +910,8 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 	enum ipu_color_space src_cs, dest_cs;
 	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
+	unsigned int col, row;
+	u32 rsc;
 	int ret;
 
 	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
@@ -715,13 +929,26 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 		dest_height = d_image->tile[dst_tile].height;
 	}
 
+	row = tile / s_image->num_cols;
+	col = tile % s_image->num_cols;
+
+	rsc =  (ctx->downsize_coeff_v << 30) |
+	       (ctx->resize_coeffs_v[row] << 16) |
+	       (ctx->downsize_coeff_h << 14) |
+	       (ctx->resize_coeffs_h[col]);
+
+	dev_dbg(priv->ipu->dev, "%s: %ux%u -> %ux%u (rsc = 0x%x)\n",
+		__func__, s_image->tile[tile].width,
+		s_image->tile[tile].height, dest_width, dest_height, rsc);
+
 	/* setup the IC resizer and CSC */
-	ret = ipu_ic_task_init(chan->ic,
+	ret = ipu_ic_task_init_rsc(chan->ic,
 			       s_image->tile[tile].width,
 			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
-			       src_cs, dest_cs);
+			       src_cs, dest_cs,
+			       rsc);
 	if (ret) {
 		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
 		return ret;
@@ -1407,6 +1634,10 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1415,6 +1646,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 		goto out_free;
 
 	calc_out_tile_map(ctx);
+	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
 	dump_format(ctx, d_image);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 06/16] gpu: ipu-v3: image-convert: reconfigure IC per tile
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (4 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 05/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 07/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

For differently sized tiles or if the resizing coefficients change,
we have to stop, reconfigure, and restart the IC between tiles.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 65 +++++++++++++++++---------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 01a63eb8ccaf..65f0321a7971 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1137,6 +1137,24 @@ static irqreturn_t do_bh(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static bool ic_settings_changed(struct ipu_image_convert_ctx *ctx)
+{
+	unsigned int cur_tile = ctx->next_tile - 1;
+	unsigned int next_tile = ctx->next_tile;
+
+	if (ctx->resize_coeffs_h[cur_tile % ctx->in.num_cols] !=
+	    ctx->resize_coeffs_h[next_tile % ctx->in.num_cols] ||
+	    ctx->resize_coeffs_v[cur_tile / ctx->in.num_cols] !=
+	    ctx->resize_coeffs_v[next_tile / ctx->in.num_cols] ||
+	    ctx->in.tile[cur_tile].width != ctx->in.tile[next_tile].width ||
+	    ctx->in.tile[cur_tile].height != ctx->in.tile[next_tile].height ||
+	    ctx->out.tile[cur_tile].width != ctx->out.tile[next_tile].width ||
+	    ctx->out.tile[cur_tile].height != ctx->out.tile[next_tile].height)
+		return true;
+
+	return false;
+}
+
 /* hold irqlock when calling */
 static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 {
@@ -1180,27 +1198,32 @@ static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 	 * not done, place the next tile buffers.
 	 */
 	if (!ctx->double_buffering) {
-
-		src_tile = &s_image->tile[ctx->next_tile];
-		dst_idx = ctx->out_tile_map[ctx->next_tile];
-		dst_tile = &d_image->tile[dst_idx];
-
-		ipu_cpmem_set_buffer(chan->in_chan, 0,
-				     s_image->base.phys0 + src_tile->offset);
-		ipu_cpmem_set_buffer(outch, 0,
-				     d_image->base.phys0 + dst_tile->offset);
-		if (s_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(chan->in_chan,
-						src_tile->u_off,
-						src_tile->v_off);
-		if (d_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(outch,
-						dst_tile->u_off,
-						dst_tile->v_off);
-
-		ipu_idmac_select_buffer(chan->in_chan, 0);
-		ipu_idmac_select_buffer(outch, 0);
-
+		if (ic_settings_changed(ctx)) {
+			convert_stop(run);
+			convert_start(run, ctx->next_tile);
+		} else {
+			src_tile = &s_image->tile[ctx->next_tile];
+			dst_idx = ctx->out_tile_map[ctx->next_tile];
+			dst_tile = &d_image->tile[dst_idx];
+
+			ipu_cpmem_set_buffer(chan->in_chan, 0,
+					     s_image->base.phys0 +
+					     src_tile->offset);
+			ipu_cpmem_set_buffer(outch, 0,
+					     d_image->base.phys0 +
+					     dst_tile->offset);
+			if (s_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(chan->in_chan,
+							src_tile->u_off,
+							src_tile->v_off);
+			if (d_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(outch,
+							dst_tile->u_off,
+							dst_tile->v_off);
+
+			ipu_idmac_select_buffer(chan->in_chan, 0);
+			ipu_idmac_select_buffer(outch, 0);
+		}
 	} else if (ctx->next_tile < ctx->num_tiles - 1) {
 
 		src_tile = &s_image->tile[ctx->next_tile + 1];
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 07/16] gpu: ipu-v3: image-convert: store tile top/left position
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (5 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 06/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 08/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Store tile top/left position in pixels in the tile structure.
This will allow overlapping tiles with different sizes later.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 65f0321a7971..e4b198777d0f 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -84,6 +84,8 @@ struct ipu_image_convert_dma_chan {
 struct ipu_image_tile {
 	u32 width;
 	u32 height;
+	u32 left;
+	u32 top;
 	/* size and strides are in bytes */
 	u32 size;
 	u32 stride;
@@ -433,13 +435,17 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
-	int i;
+	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
 		struct ipu_image_tile *tile = &image->tile[i];
+		const unsigned int row = i / image->num_cols;
+		const unsigned int col = i % image->num_cols;
 
 		tile->height = image->base.pix.height / image->num_rows;
 		tile->width = image->base.pix.width / image->num_cols;
+		tile->left = col * tile->width;
+		tile->top = row * tile->height;
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -535,7 +541,7 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 H, w, h, y_stride, uv_stride;
+	u32 H, top, y_stride, uv_stride;
 	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
 	u32 y_row_off, y_col_off, y_off;
 	u32 y_size, uv_size;
@@ -552,13 +558,12 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		y_row_off = row * h * y_stride;
-		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
+		top = image->tile[tile].top;
+		y_row_off = top * y_stride;
+		uv_row_off = (top * uv_stride) / fmt->uv_height_dec;
 
 		for (col = 0; col < image->num_cols; col++) {
-			y_col_off = col * w;
+			y_col_off = image->tile[tile].left;
 			uv_col_off = y_col_off / fmt->uv_width_dec;
 			if (fmt->uv_packed)
 				uv_col_off *= 2;
@@ -595,7 +600,7 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 w, h, bpp, stride;
+	u32 bpp, stride;
 	u32 row_off, col_off;
 
 	/* setup some convenience vars */
@@ -603,12 +608,10 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	bpp = fmt->bpp;
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		row_off = row * h * stride;
+		row_off = image->tile[tile].top * stride;
 
 		for (col = 0; col < image->num_cols; col++) {
-			col_off = (col * w * bpp) >> 3;
+			col_off = (image->tile[tile].left * bpp) >> 3;
 
 			image->tile[tile].offset = row_off + col_off;
 			image->tile[tile].u_off = 0;
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 08/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (6 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 07/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 09/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

This will allow to calculate seam positions after initializing the
ipu_image base structure but before calculating tile dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index e4b198777d0f..830622277588 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1453,9 +1453,6 @@ static int fill_image(struct ipu_image_convert_ctx *ctx,
 	else
 		ic_image->stride  = ic_image->base.pix.bytesperline;
 
-	calc_tile_dimensions(ctx, ic_image);
-	calc_tile_offsets(ctx, ic_image);
-
 	return 0;
 }
 
@@ -1660,10 +1657,6 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
-	ret = calc_image_resize_coefficients(ctx, in, out);
-	if (ret)
-		goto out_free;
-
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1671,6 +1664,16 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
+	calc_tile_dimensions(ctx, s_image);
+	calc_tile_offsets(ctx, s_image);
+
+	calc_tile_dimensions(ctx, d_image);
+	calc_tile_offsets(ctx, d_image);
+
 	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 09/16] gpu: ipu-v3: image-convert: move tile alignment helpers
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (7 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 08/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Move tile_width_align and tile_height_align up so they
can be used by the tile edge position calculation code.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 54 +++++++++++++-------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 830622277588..97061049a9d2 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -432,6 +432,33 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+/*
+ * We have to adjust the tile width such that the tile physaddrs and
+ * U and V plane offsets are multiples of 8 bytes as required by
+ * the IPU DMA Controller. For the planar formats, this corresponds
+ * to a pixel alignment of 16 (but use a more formal equation since
+ * the variables are available). For all the packed formats, 8 is
+ * good enough.
+ */
+static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+}
+
+/*
+ * For tile height alignment, we have to ensure that the output tile
+ * heights are multiples of 8 lines if the IRT is required by the
+ * given rotation mode (the IRT performs rotations on 8x8 blocks
+ * at a time). If the IRT is not used, or for input image tiles,
+ * 2 lines are good enough.
+ */
+static inline u32 tile_height_align(enum ipu_image_convert_type type,
+				    enum ipu_rotate_mode rot_mode)
+{
+	return (type == IMAGE_CONVERT_OUT &&
+		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -1473,33 +1500,6 @@ static unsigned int clamp_align(unsigned int x, unsigned int min,
 	return x;
 }
 
-/*
- * We have to adjust the tile width such that the tile physaddrs and
- * U and V plane offsets are multiples of 8 bytes as required by
- * the IPU DMA Controller. For the planar formats, this corresponds
- * to a pixel alignment of 16 (but use a more formal equation since
- * the variables are available). For all the packed formats, 8 is
- * good enough.
- */
-static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
-{
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
-}
-
-/*
- * For tile height alignment, we have to ensure that the output tile
- * heights are multiples of 8 lines if the IRT is required by the
- * given rotation mode (the IRT performs rotations on 8x8 blocks
- * at a time). If the IRT is not used, or for input image tiles,
- * 2 lines are good enough.
- */
-static inline u32 tile_height_align(enum ipu_image_convert_type type,
-				    enum ipu_rotate_mode rot_mode)
-{
-	return (type == IMAGE_CONVERT_OUT &&
-		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
-}
-
 /* Adjusts input/output images to IPU restrictions */
 void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 			      enum ipu_rotate_mode rot_mode)
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (8 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 09/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-10-13  0:33   ` Steve Longerbeam
  2018-09-18  9:34 ` [PATCH v3 11/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Select seam positions that minimize distortions during seam hiding while
satifying input and output IDMAC, rotator, and image format constraints.

This code looks for aligned output seam positions that minimize the
difference between the fractional corresponding ideal input positions
and the input positions rounded to alignment requirements.

Since now tiles can be sized differently, alignment restrictions of the
complete image can be relaxed in the next step.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
Changes since v2:
 - Switch width/height properly and align tile top left positions to 8x8
   IRT block size when rotating.
 - Align input width to input burst length in case the scaling step
   flips horizontally.
 - Fix bottom edge calculation.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 340 ++++++++++++++++++++++++-
 1 file changed, 334 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 97061049a9d2..4a513dea7913 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -432,6 +432,123 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+#define round_closest(x, y) round_down((x) + (y)/2, (y))
+
+/*
+ * Find the best aligned seam position in the inverval [out_start, out_end].
+ * Rotation and image offsets are out of scope.
+ *
+ * @out_start: start of inverval, must be within 1024 pixels / lines
+ *             of out_end
+ * @out_end: end of interval, smaller than or equal to out_edge
+ * @in_edge: input right / bottom edge
+ * @out_edge: output right / bottom edge
+ * @in_align: input alignment, either horizontal 8-byte line start address
+ *            alignment, or pixel alignment due to image format
+ * @out_align: output alignment, either horizontal 8-byte line start address
+ *             alignment, or pixel alignment due to image format or rotator
+ *             block size
+ * @in_burst: horizontal input burst size in case of horizontal flip
+ * @out_burst: horizontal output burst size or rotator block size
+ * @downsize_coeff: downsizing section coefficient
+ * @resize_coeff: main processing section resizing coefficient
+ * @_in_seam: aligned input seam position return value
+ * @_out_seam: aligned output seam position return value
+ */
+static void find_best_seam(struct ipu_image_convert_ctx *ctx,
+			   unsigned int out_start,
+			   unsigned int out_end,
+			   unsigned int in_edge,
+			   unsigned int out_edge,
+			   unsigned int in_align,
+			   unsigned int out_align,
+			   unsigned int in_burst,
+			   unsigned int out_burst,
+			   unsigned int downsize_coeff,
+			   unsigned int resize_coeff,
+			   u32 *_in_seam,
+			   u32 *_out_seam)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int out_pos;
+	/* Input / output seam position candidates */
+	unsigned int out_seam = 0;
+	unsigned int in_seam = 0;
+	unsigned int min_diff = UINT_MAX;
+
+	/*
+	 * Output tiles must start at a multiple of 8 bytes horizontally and
+	 * possibly at an even line horizontally depending on the pixel format.
+	 * Only consider output aligned positions for the seam.
+	 */
+	out_start = round_up(out_start, out_align);
+	for (out_pos = out_start; out_pos < out_end; out_pos += out_align) {
+		unsigned int in_pos;
+		unsigned int in_pos_aligned;
+		unsigned int abs_diff;
+
+		/*
+		 * Tiles in the right row / bottom column may not be allowed to
+		 * overshoot horizontally / vertically. out_burst may be the
+		 * actual DMA burst size, or the rotator block size.
+		 */
+		if ((out_burst > 1) && (out_edge - out_pos) % out_burst)
+			continue;
+
+		/*
+		 * Input sample position, corresponding to out_pos, 19.13 fixed
+		 * point.
+		 */
+		in_pos = (out_pos * resize_coeff) << downsize_coeff;
+		/*
+		 * The closest input sample position that we could actually
+		 * start the input tile at, 19.13 fixed point.
+		 */
+		in_pos_aligned = round_closest(in_pos, 8192U * in_align);
+
+		if ((in_burst > 1) &&
+		    (in_edge - in_pos_aligned / 8192U) % in_burst)
+			continue;
+
+		if (in_pos < in_pos_aligned)
+			abs_diff = in_pos_aligned - in_pos;
+		else
+			abs_diff = in_pos - in_pos_aligned;
+
+		if (abs_diff < min_diff) {
+			in_seam = in_pos_aligned;
+			out_seam = out_pos;
+			min_diff = abs_diff;
+		}
+	}
+
+	*_out_seam = out_seam;
+	/* Convert 19.13 fixed point to integer seam position */
+	*_in_seam = DIV_ROUND_CLOSEST(in_seam, 8192U);
+
+	dev_dbg(dev, "%s: out_seam %u(%u) in [%u, %u], in_seam %u(%u) diff %u.%03u\n",
+		__func__, out_seam, out_align, out_start, out_end,
+		*_in_seam, in_align, min_diff / 8192,
+		DIV_ROUND_CLOSEST(min_diff % 8192 * 1000, 8192));
+}
+
+/*
+ * Tile left edges are required to be aligned to multiples of 8 bytes
+ * by the IDMAC.
+ */
+static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 64 / fmt->bpp;
+}
+
+/*
+ * Tile top edge alignment is only limited by chroma subsampling.
+ */
+static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->uv_height_dec > 1 ? 2 : 1;
+}
+
 /*
  * We have to adjust the tile width such that the tile physaddrs and
  * U and V plane offsets are multiples of 8 bytes as required by
@@ -459,20 +576,228 @@ static inline u32 tile_height_align(enum ipu_image_convert_type type,
 		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
 }
 
+/*
+ * Fill in left position and width and for all tiles in an input column, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a row, and output tile top position and height are set.
+ */
+static void fill_tile_column(struct ipu_image_convert_ctx *ctx,
+			     unsigned int col,
+			     struct ipu_image_convert_image *in,
+			     unsigned int in_left, unsigned int in_width,
+			     struct ipu_image_convert_image *out,
+			     unsigned int out_left, unsigned int out_width)
+{
+	unsigned int row, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (row = 0; row < in->num_rows; row++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->left = in_left;
+		in_tile->width = in_width;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->top = out_left;
+			out_tile->height = out_width;
+		} else {
+			out_tile->left = out_left;
+			out_tile->width = out_width;
+		}
+	}
+}
+
+/*
+ * Fill in top position and height and for all tiles in an input row, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a column, and output tile left position and width are set.
+ */
+static void fill_tile_row(struct ipu_image_convert_ctx *ctx, unsigned int row,
+			  struct ipu_image_convert_image *in,
+			  unsigned int in_top, unsigned int in_height,
+			  struct ipu_image_convert_image *out,
+			  unsigned int out_top, unsigned int out_height)
+{
+	unsigned int col, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (col = 0; col < in->num_cols; col++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->top = in_top;
+		in_tile->height = in_height;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->left = out_top;
+			out_tile->width = out_height;
+		} else {
+			out_tile->top = out_top;
+			out_tile->height = out_height;
+		}
+	}
+}
+
+/*
+ * Find the best horizontal and vertical seam positions to split into tiles.
+ * Minimize the fractional part of the input sampling position for the
+ * top / left pixels of each tile.
+ */
+static void find_seams(struct ipu_image_convert_ctx *ctx,
+		       struct ipu_image_convert_image *in,
+		       struct ipu_image_convert_image *out)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int resized_width = out->base.rect.width;
+	unsigned int resized_height = out->base.rect.height;
+	unsigned int col;
+	unsigned int row;
+	unsigned int in_left_align = tile_left_align(in->fmt);
+	unsigned int in_top_align = tile_top_align(in->fmt);
+	unsigned int out_left_align = tile_left_align(out->fmt);
+	unsigned int out_top_align = tile_top_align(out->fmt);
+	unsigned int out_width_align = tile_width_align(out->fmt);
+	unsigned int out_height_align = tile_height_align(out->type,
+							  ctx->rot_mode);
+	unsigned int in_right = in->base.rect.width;
+	unsigned int in_bottom = in->base.rect.height;
+	unsigned int out_right = out->base.rect.width;
+	unsigned int out_bottom = out->base.rect.height;
+	unsigned int flipped_out_left;
+	unsigned int flipped_out_top;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* Switch width/height and align top left to IRT block size */
+		resized_width = out->base.rect.height;
+		resized_height = out->base.rect.width;
+		out_left_align = out_height_align;
+		out_top_align = out_width_align;
+		out_width_align = out_left_align;
+		out_height_align = out_top_align;
+		out_right = out->base.rect.height;
+		out_bottom = out->base.rect.width;
+	}
+
+	for (col = in->num_cols - 1; col > 0; col--) {
+		bool allow_in_overshoot = ipu_rot_mode_is_irt(ctx->rot_mode) ||
+					  !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		bool allow_out_overshoot = (col < in->num_cols - 1) &&
+					   !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_left;
+		unsigned int out_left;
+
+		/*
+		 * Align input width to burst length if the scaling step flips
+		 * horizontally.
+		 */
+
+		/* Start within 1024 pixels of the right edge */
+		out_start = max_t(int, 0, out_right - 1024);
+		/* End before having to add more columns to the left */
+		out_end = min_t(unsigned int, out_right, col * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_right, out_right,
+			       in_left_align, out_left_align,
+			       allow_in_overshoot ? 1 : 8 /* burst length */,
+			       allow_out_overshoot ? 1 : out_width_align,
+			       ctx->downsize_coeff_h, ctx->image_resize_coeff_h,
+			       &in_left, &out_left);
+
+		if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+			flipped_out_left = resized_width - out_right;
+		else
+			flipped_out_left = out_left;
+
+		fill_tile_column(ctx, col, in, in_left, in_right - in_left,
+				 out, flipped_out_left, out_right - out_left);
+
+		dev_dbg(dev, "%s: col %u: %u, %u -> %u, %u\n", __func__, col,
+			in_left, in_right - in_left,
+			flipped_out_left, out_right - out_left);
+
+		in_right = in_left;
+		out_right = out_left;
+	}
+
+	flipped_out_left = (ctx->rot_mode & IPU_ROT_BIT_HFLIP) ?
+			   resized_width - out_right : 0;
+
+	fill_tile_column(ctx, 0, in, 0, in_right,
+			 out, flipped_out_left, out_right);
+
+	dev_dbg(dev, "%s: col 0: 0, %u -> %u, %u\n", __func__,
+		in_right, flipped_out_left, out_right);
+
+	for (row = in->num_rows - 1; row > 0; row--) {
+		bool allow_overshoot = row < in->num_rows - 1;
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_top;
+		unsigned int out_top;
+
+		/* Start within 1024 lines of the bottom edge */
+		out_start = max_t(int, 0, out_bottom - 1024);
+		/* End before having to add more rows above */
+		out_end = min_t(unsigned int, out_bottom, row * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_bottom, out_bottom,
+			       in_top_align, out_top_align,
+			       1, allow_overshoot ? 1 : out_height_align,
+			       ctx->downsize_coeff_v, ctx->image_resize_coeff_v,
+			       &in_top, &out_top);
+
+		if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+		    ipu_rot_mode_is_irt(ctx->rot_mode))
+			flipped_out_top = resized_height - out_bottom;
+		else
+			flipped_out_top = out_top;
+
+		fill_tile_row(ctx, row, in, in_top, in_bottom - in_top,
+			      out, flipped_out_top, out_bottom - out_top);
+
+		dev_dbg(dev, "%s: row %u: %u, %u -> %u, %u\n", __func__, row,
+			in_top, in_bottom - in_top,
+			flipped_out_top, out_bottom - out_top);
+
+		in_bottom = in_top;
+		out_bottom = out_top;
+	}
+
+	if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+	    ipu_rot_mode_is_irt(ctx->rot_mode))
+		flipped_out_top = resized_height - out_bottom;
+	else
+		flipped_out_top = 0;
+
+	fill_tile_row(ctx, 0, in, 0, in_bottom,
+		      out, flipped_out_top, out_bottom);
+
+	dev_dbg(dev, "%s: row 0: 0, %u -> %u, %u\n", __func__,
+		in_bottom, flipped_out_top, out_bottom);
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
-		struct ipu_image_tile *tile = &image->tile[i];
+		struct ipu_image_tile *tile;
 		const unsigned int row = i / image->num_cols;
 		const unsigned int col = i % image->num_cols;
 
-		tile->height = image->base.pix.height / image->num_rows;
-		tile->width = image->base.pix.width / image->num_cols;
-		tile->left = col * tile->width;
-		tile->top = row * tile->height;
+		if (image->type == IMAGE_CONVERT_OUT)
+			tile = &image->tile[ctx->out_tile_map[i]];
+		else
+			tile = &image->tile[i];
+
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -1668,13 +1993,16 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	calc_out_tile_map(ctx);
+
+	find_seams(ctx, s_image, d_image);
+
 	calc_tile_dimensions(ctx, s_image);
 	calc_tile_offsets(ctx, s_image);
 
 	calc_tile_dimensions(ctx, d_image);
 	calc_tile_offsets(ctx, d_image);
 
-	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 11/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (9 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 12/16] gpu: ipu-v3: image-convert: relax alignment restrictions Philipp Zabel
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Since tile dimensions now vary between tiles, add debug output for each
tile's position and dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 4a513dea7913..aba973aedb75 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -308,12 +308,11 @@ static void dump_format(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 
 	dev_dbg(priv->ipu->dev,
-		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
+		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles), %c%c%c%c\n",
 		chan->ic_task, ctx,
 		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
 		ic_image->base.pix.width, ic_image->base.pix.height,
 		ic_image->num_cols, ic_image->num_rows,
-		ic_image->tile[0].width, ic_image->tile[0].height,
 		ic_image->fmt->fourcc & 0xff,
 		(ic_image->fmt->fourcc >> 8) & 0xff,
 		(ic_image->fmt->fourcc >> 16) & 0xff,
@@ -786,6 +785,8 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
@@ -810,6 +811,13 @@ static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 			tile->rot_stride =
 				(image->fmt->bpp * tile->height) >> 3;
 		}
+
+		dev_dbg(priv->ipu->dev,
+			"task %u: ctx %p: %s@[%u,%u]: %ux%u@%u,%u\n",
+			chan->ic_task, ctx,
+			image->type == IMAGE_CONVERT_IN ? "Input" : "Output",
+			row, col,
+			tile->width, tile->height, tile->left, tile->top);
 	}
 }
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 12/16] gpu: ipu-v3: image-convert: relax alignment restrictions
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (10 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 11/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

For the planar but U/V-packed formats NV12 and NV16, 8 pixel width
alignment is good enough to fulfill the 8 byte stride requirement.
If we allow the input 8-pixel DMA bursts to overshoot the end of the
line, the only input alignment restrictions are dictated by the pixel
format and 8-byte aligned line start address.
Since different tile sizes are allowed, the output tile with / height
alignment doesn't need to be multiplied by number of columns / rows.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
[steve_longerbeam@mentor.com: Bring in the fixes to format width and
 height alignment restrictions from imx-media-mem2mem.c.]
Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
Changes since v2:
 - merge "relax tile width alignment for NV12 and NV16", "relax input
   alignment restrictions", and "relax output alignment restrictions"
 - bring in fixes to format width and height alignment restrictions
   from imx-media-mem2mem.c
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 81 +++++++++++++-------------
 1 file changed, 41 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index aba973aedb75..abf02b9d4b66 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -548,31 +548,46 @@ static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
 	return fmt->uv_height_dec > 1 ? 2 : 1;
 }
 
-/*
- * We have to adjust the tile width such that the tile physaddrs and
- * U and V plane offsets are multiples of 8 bytes as required by
- * the IPU DMA Controller. For the planar formats, this corresponds
- * to a pixel alignment of 16 (but use a more formal equation since
- * the variables are available). For all the packed formats, 8 is
- * good enough.
- */
-static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
+static inline u32 tile_width_align(enum ipu_image_convert_type type,
+				   const struct ipu_image_pixfmt *fmt,
+				   enum ipu_rotate_mode rot_mode)
 {
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+	if (type == IMAGE_CONVERT_IN) {
+		/*
+		 * The IC burst reads 8 pixels at a time. Reading beyond the
+		 * end of the line is usually acceptable. Those pixels are
+		 * ignored, unless the IC has to write the scaled line in
+		 * reverse.
+		 */
+		return (!ipu_rot_mode_is_irt(rot_mode) &&
+			(rot_mode & IPU_ROT_BIT_HFLIP)) ? 8 : 2;
+	}
+
+	/*
+	 * Align to 16x16 pixel blocks for planar 4:2:0 chroma subsampled
+	 * formats to guarantee 8-byte aligned line start addresses in the
+	 * chroma planes when IRT is used. Align to 8x8 pixel IRT block size
+	 * for all other formats.
+	 */
+	return (ipu_rot_mode_is_irt(rot_mode) &&
+		fmt->planar && !fmt->uv_packed) ?
+		8 * fmt->uv_width_dec : 8;
 }
 
-/*
- * For tile height alignment, we have to ensure that the output tile
- * heights are multiples of 8 lines if the IRT is required by the
- * given rotation mode (the IRT performs rotations on 8x8 blocks
- * at a time). If the IRT is not used, or for input image tiles,
- * 2 lines are good enough.
- */
 static inline u32 tile_height_align(enum ipu_image_convert_type type,
+				    const struct ipu_image_pixfmt *fmt,
 				    enum ipu_rotate_mode rot_mode)
 {
-	return (type == IMAGE_CONVERT_OUT &&
-		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+	if (type == IMAGE_CONVERT_IN || !ipu_rot_mode_is_irt(rot_mode))
+		return 2;
+
+	/*
+	 * Align to 16x16 pixel blocks for planar 4:2:0 chroma subsampled
+	 * formats to guarantee 8-byte aligned line start addresses in the
+	 * chroma planes when IRT is used. Align to 8x8 pixel IRT block size
+	 * for all other formats.
+	 */
+	return (fmt->planar && !fmt->uv_packed) ? 8 * fmt->uv_width_dec : 8;
 }
 
 /*
@@ -658,8 +673,9 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 	unsigned int in_top_align = tile_top_align(in->fmt);
 	unsigned int out_left_align = tile_left_align(out->fmt);
 	unsigned int out_top_align = tile_top_align(out->fmt);
-	unsigned int out_width_align = tile_width_align(out->fmt);
-	unsigned int out_height_align = tile_height_align(out->type,
+	unsigned int out_width_align = tile_width_align(out->type, out->fmt,
+							ctx->rot_mode);
+	unsigned int out_height_align = tile_height_align(out->type, out->fmt,
 							  ctx->rot_mode);
 	unsigned int in_right = in->base.rect.width;
 	unsigned int in_bottom = in->base.rect.height;
@@ -1838,8 +1854,6 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 			      enum ipu_rotate_mode rot_mode)
 {
 	const struct ipu_image_pixfmt *infmt, *outfmt;
-	unsigned int num_in_rows, num_in_cols;
-	unsigned int num_out_rows, num_out_cols;
 	u32 w_align, h_align;
 
 	infmt = get_format(in->pix.pixelformat);
@@ -1871,28 +1885,15 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 					in->pix.height / 4);
 	}
 
-	/* get tiling rows/cols from output format */
-	num_out_rows = num_stripes(out->pix.height);
-	num_out_cols = num_stripes(out->pix.width);
-	if (ipu_rot_mode_is_irt(rot_mode)) {
-		num_in_rows = num_out_cols;
-		num_in_cols = num_out_rows;
-	} else {
-		num_in_rows = num_out_rows;
-		num_in_cols = num_out_cols;
-	}
-
 	/* align input width/height */
-	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
-			num_in_rows);
+	w_align = ilog2(tile_width_align(IMAGE_CONVERT_IN, infmt, rot_mode));
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, infmt, rot_mode));
 	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
 	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
 
 	/* align output width/height */
-	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
-			num_out_rows);
+	w_align = ilog2(tile_width_align(IMAGE_CONVERT_OUT, outfmt, rot_mode));
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, outfmt, rot_mode));
 	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (11 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 12/16] gpu: ipu-v3: image-convert: relax alignment restrictions Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

For planar formats, bytesperline does not depend on BPP. It must always
be larger than width and aligned to tile width alignment restrictions.

The input bytesperline to ipu_image_convert_adjust() may be
uninitialized, so don't rely on input bytesperline as the
minimum value for clamp_align(). Use 2 << w_align as the minimum
instead.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
[steve_longerbeam@mentor.com: clamp input bytesperline]
Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
Changes since v2:
 - clamp uninitialized input bytesperline
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index abf02b9d4b66..16d400b2b3d2 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1898,10 +1898,18 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
 	/* set input/output strides and image sizes */
-	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
-	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
-	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
-	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
+	in->pix.bytesperline = infmt->planar ?
+		clamp_align(in->pix.width, 2 << w_align, MAX_W, w_align) :
+		clamp_align((in->pix.width * infmt->bpp) >> 3,
+			    2 << w_align, MAX_W, w_align);
+	in->pix.sizeimage = infmt->planar ?
+		(in->pix.height * in->pix.bytesperline * infmt->bpp) >> 3 :
+		in->pix.height * in->pix.bytesperline;
+	out->pix.bytesperline = outfmt->planar ? out->pix.width :
+		(out->pix.width * outfmt->bpp) >> 3;
+	out->pix.sizeimage = outfmt->planar ?
+		(out->pix.height * out->pix.bytesperline * outfmt->bpp) >> 3 :
+		out->pix.height * out->pix.bytesperline;
 }
 EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (12 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Visualize the scaling and rotation pipeline with some ASCII art
diagrams. Remove the FIXME comment about missing seam prevention.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 39 +++++++++++++++++++-------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 16d400b2b3d2..6179d8bd123c 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -37,17 +37,36 @@
  * when double_buffering boolean is set).
  *
  * Note that the input frame must be split up into the same number
- * of tiles as the output frame.
+ * of tiles as the output frame:
  *
- * FIXME: at this point there is no attempt to deal with visible seams
- * at the tile boundaries when upscaling. The seams are caused by a reset
- * of the bilinear upscale interpolation when starting a new tile. The
- * seams are barely visible for small upscale factors, but become
- * increasingly visible as the upscale factor gets larger, since more
- * interpolated pixels get thrown out at the tile boundaries. A possilble
- * fix might be to overlap tiles of different sizes, but this must be done
- * while also maintaining the IDMAC dma buffer address alignment and 8x8 IRT
- * alignment restrictions of each tile.
+ *                       +---------+-----+
+ *   +-----+---+         |  A      | B   |
+ *   | A   | B |         |         |     |
+ *   +-----+---+   -->   +---------+-----+
+ *   | C   | D |         |  C      | D   |
+ *   +-----+---+         |         |     |
+ *                       +---------+-----+
+ *
+ * Clockwise 90° rotations are handled by first rescaling into a
+ * reusable temporary tile buffer and then rotating with the 8x8
+ * block rotator, writing to the correct destination:
+ *
+ *                                         +-----+-----+
+ *                                         |     |     |
+ *   +-----+---+         +---------+       | C   | A   |
+ *   | A   | B |         | A,B, |  |       |     |     |
+ *   +-----+---+   -->   | C,D  |  |  -->  |     |     |
+ *   | C   | D |         +---------+       +-----+-----+
+ *   +-----+---+                           | D   | B   |
+ *                                         |     |     |
+ *                                         +-----+-----+
+ *
+ * If the 8x8 block rotator is used, horizontal or vertical flipping
+ * is done during the rotation step, otherwise flipping is done
+ * during the scaling step.
+ * With rotation or flipping, tile order changes between input and
+ * output image. Tiles are numbered row major from top left to bottom
+ * right for both input and output image.
  */
 
 #define MAX_STRIPES_W    4
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (13 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-09-18  9:34 ` [PATCH v3 16/16] gpu: ipu-v3: image-convert: allow three rows or columns Philipp Zabel
  2018-10-13  0:29 ` [PATCH v3 00/16] i.MX media mem2mem scaler Steve Longerbeam
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Double-buffering only works if tile sizes are the same and the resizing
coefficient does not change between tiles, even for non-planar formats.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 6179d8bd123c..6ab880416919 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1973,6 +1973,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	struct ipu_image_convert_chan *chan;
 	struct ipu_image_convert_ctx *ctx;
 	unsigned long flags;
+	unsigned int i;
 	bool get_res;
 	int ret;
 
@@ -2056,15 +2057,37 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	 * for every tile, and therefore would have to be updated for
 	 * each buffer which is not possible. So double-buffering is
 	 * impossible when either the source or destination images are
-	 * a planar format (YUV420, YUV422P, etc.).
+	 * a planar format (YUV420, YUV422P, etc.). Further, differently
+	 * sized tiles or different resizing coefficients per tile
+	 * prevent double-buffering as well.
 	 */
 	ctx->double_buffering = (ctx->num_tiles > 1 &&
 				 !s_image->fmt->planar &&
 				 !d_image->fmt->planar);
+	for (i = 1; i < ctx->num_tiles; i++) {
+		if (ctx->in.tile[i].width != ctx->in.tile[0].width ||
+		    ctx->in.tile[i].height != ctx->in.tile[0].height ||
+		    ctx->out.tile[i].width != ctx->out.tile[0].width ||
+		    ctx->out.tile[i].height != ctx->out.tile[0].height) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_cols; i++) {
+		if (ctx->resize_coeffs_h[i] != ctx->resize_coeffs_h[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_rows; i++) {
+		if (ctx->resize_coeffs_v[i] != ctx->resize_coeffs_v[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		unsigned long intermediate_size = d_image->tile[0].size;
-		unsigned int i;
 
 		for (i = 1; i < ctx->num_tiles; i++) {
 			if (d_image->tile[i].size > intermediate_size)
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 16/16] gpu: ipu-v3: image-convert: allow three rows or columns
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (14 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
@ 2018-09-18  9:34 ` Philipp Zabel
  2018-10-13  0:29 ` [PATCH v3 00/16] i.MX media mem2mem scaler Steve Longerbeam
  16 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-09-18  9:34 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

If width or height are in the [2049, 3072] range, allow to
use just three tiles in this dimension, instead of four.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
New since v2.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 6ab880416919..5bce03b73502 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -379,12 +379,7 @@ static int alloc_dma_buf(struct ipu_image_convert_priv *priv,
 
 static inline int num_stripes(int dim)
 {
-	if (dim <= 1024)
-		return 1;
-	else if (dim <= 2048)
-		return 2;
-	else
-		return 4;
+	return (dim - 1) / 1024 + 1;
 }
 
 /*
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 00/16] i.MX media mem2mem scaler
  2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (15 preceding siblings ...)
  2018-09-18  9:34 ` [PATCH v3 16/16] gpu: ipu-v3: image-convert: allow three rows or columns Philipp Zabel
@ 2018-10-13  0:29 ` Steve Longerbeam
  2018-10-17 23:46   ` Steve Longerbeam
  16 siblings, 1 reply; 29+ messages in thread
From: Steve Longerbeam @ 2018-10-13  0:29 UTC (permalink / raw)
  To: Philipp Zabel, linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel

Hi Philipp,

I finished some fairly comprehensive testing on this series,
scaling to and from the following common resolutions:

640x480,
720x480,
720x576,
800x480,
800x600,
1024x768,
1152x720,
1280x720,
1280x768,
1280x800,
1280x1024,
1920x1080,
1920x1200,
2560x1440,
2560x1600,
2592x1944

and within each scaling conversion, I tested converting to and from
all the pixel formats supported by the mem2mem device:

422p, nv12, rgb4, yu12, bgr3, nv16,
rgbp, yuyv, bgr4, rgb3, uyvy, yv12

What I found is that most of these conversions are working, producing
no mis-aligned tile buffer addresses, except for one exception: converting
to or from a packed 24-bit pixel format (rgb3 or bgr3). For example,
try 640x480.rgb3 -> 1920x1080.yu12.

I found the cause of that, I added the following patch to my fork
git@github.com:slongerbeam/mediatree.git, branch imx-mem2mem.3:

"gpu: ipu-v3: image-convert: Fix tile_left_align()"

Feel free to squash it with

"gpu: ipu-v3: image-convert: select optimal seam positions"

I've also added a few other patches to my branch:

"gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image()"

which fixes a false warning in ipu_cpmem_set_yuv_planar_full().

Also added:

"gpu: ipu-v3: image-convert: Prevent race between run and unprepare"
"gpu: ipu-v3: image-convert: Only wait for abort completion if active run"
"gpu: ipu-v3: image-convert: Allow reentrancy into abort"
"gpu: ipu-v3: image-convert: Remove need_abort flag"
"gpu: ipu-v3: image-convert: Catch unaligned tile offsets"

With the fix to tile_left_align(), all conversions for the above scaling
resolutions are not producing mis-aligned tile buffers anymore, for all
pixel formats.

But one last thing. Conversions to and from YV12 are producing images
with wrong colors, it looks like the .uv_swapped boolean needs to be checked
additionally somewhere. Any ideas?


Steve


On 09/18/2018 02:34 AM, Philipp Zabel wrote:
> Hi,
>
> this is the third version of the i.MX mem2mem scaler series.
>
> The driver patch has been moved to the beginning, as Steve suggested.
> I've added his warning patch to catch alignment bugs to the series,
> seam position selection has been fixed for more corner cases, the
> alignment restriction relaxation patches have been merged into one
> patch, and support for tiling with three rows or columns has been added
> to avoid unnecessary overhead.
>
> Changes since v2:
>   - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
>     adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
>     wrapper around the IPU image converter, and independent of the
>     internal image converter implementation.
>   - Remove the source and destination buffers on error in device_run().
>     Otherwise the conversion is re-attempted apparently over and over
>     again (with WARN() backtraces).
>   - Allow subscribing to control changes.
>   - Fix seam position selection for more corner cases:
>      - Switch width/height properly and align tile top left positions to 8x8
>        IRT block size when rotating.
>      - Align input width to input burst length in case the scaling step
>        flips horizontally.
>      - Fix bottom edge calculation.
>
> Changes since v1:
>   - Fix inverted allow_overshoot logic
>   - Correctly switch horizontal / vertical tile alignment when
>     determining seam positions with the 90° rotator active.
>   - Fix SPDX-License-Identifier and remove superfluous license
>     text.
>   - Fix uninitialized walign in try_fmt
>
> Previous cover letter:
>
> we have image conversion code for scaling and colorspace conversion in
> the IPUv3 base driver for a while. Since the IC hardware can only write
> up to 1024x1024 pixel buffers, it scales to larger output buffers by
> splitting the input and output frame into similarly sized tiles.
>
> This causes the issue that the bilinear interpolation resets at the tile
> boundary: instead of smoothly interpolating across the seam, there is a
> jump in the input sample position that is very apparent for high
> upscaling factors. This can be avoided by slightly changing the scaling
> coefficients to let the left/top tiles overshoot their input sampling
> into the first pixel / line of their right / bottom neighbors. The error
> can be further reduced by letting tiles be differently sized and by
> selecting seam positions that minimize the input sampling position error
> at tile boundaries.
> This is complicated by different DMA start address, burst size, and
> rotator block size alignment requirements, depending on the input and
> output pixel formats, and the fact that flipping happens in different
> places depending on the rotation.
>
> This series implements optimal seam position selection and seam hiding
> with per-tile resizing coefficients and adds a scaling mem2mem device
> to the imx-media driver.
>
> regards
> Philipp
>
> Philipp Zabel (15):
>    media: imx: add mem2mem device
>    gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
>    gpu: ipu-v3: image-convert: prepare for per-tile configuration
>    gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
>    gpu: ipu-v3: image-convert: reconfigure IC per tile
>    gpu: ipu-v3: image-convert: store tile top/left position
>    gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
>      outside fill_image
>    gpu: ipu-v3: image-convert: move tile alignment helpers
>    gpu: ipu-v3: image-convert: select optimal seam positions
>    gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
>    gpu: ipu-v3: image-convert: relax alignment restrictions
>    gpu: ipu-v3: image-convert: fix bytesperline adjustment
>    gpu: ipu-v3: image-convert: add some ASCII art to the exposition
>    gpu: ipu-v3: image-convert: disable double buffering if necessary
>    gpu: ipu-v3: image-convert: allow three rows or columns
>
> Steve Longerbeam (1):
>    gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers
>
>   drivers/gpu/ipu-v3/ipu-cpmem.c                |   6 +
>   drivers/gpu/ipu-v3/ipu-ic.c                   |  52 +-
>   drivers/gpu/ipu-v3/ipu-image-convert.c        | 919 +++++++++++++++---
>   drivers/staging/media/imx/Kconfig             |   1 +
>   drivers/staging/media/imx/Makefile            |   1 +
>   drivers/staging/media/imx/imx-media-dev.c     |  11 +
>   drivers/staging/media/imx/imx-media-mem2mem.c | 873 +++++++++++++++++
>   drivers/staging/media/imx/imx-media.h         |  10 +
>   include/video/imx-ipu-v3.h                    |   6 +
>   9 files changed, 1727 insertions(+), 152 deletions(-)
>   create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-09-18  9:34 ` [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
@ 2018-10-13  0:33   ` Steve Longerbeam
  2018-10-17 11:10     ` Philipp Zabel
  0 siblings, 1 reply; 29+ messages in thread
From: Steve Longerbeam @ 2018-10-13  0:33 UTC (permalink / raw)
  To: Philipp Zabel, linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, kernel



On 09/18/2018 02:34 AM, Philipp Zabel wrote:

<snip>
> +/*
> + * Tile left edges are required to be aligned to multiples of 8 bytes
> + * by the IDMAC.
> + */
> +static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
> +{
> +	return fmt->planar ? 8 * fmt->uv_width_dec : 64 / fmt->bpp;
> +}
<snip>

As I indicated, shouldn't this be

return fmt->planar ? 8 * fmt->uv_width_dec : 8;

?

Just from a unit analysis perspective, "64 / fmt->bp" has
units of pixels / 8-bytes, it should have units of bytes.

Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-10-13  0:33   ` Steve Longerbeam
@ 2018-10-17 11:10     ` Philipp Zabel
  2018-10-17 23:44       ` Steve Longerbeam
  0 siblings, 1 reply; 29+ messages in thread
From: Philipp Zabel @ 2018-10-17 11:10 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media; +Cc: Nicolas Dufresne, kernel

On Fri, 2018-10-12 at 17:33 -0700, Steve Longerbeam wrote:
> 
> On 09/18/2018 02:34 AM, Philipp Zabel wrote:
> 
> <snip>
> > +/*
> > + * Tile left edges are required to be aligned to multiples of 8 bytes
> > + * by the IDMAC.
> > + */
> > +static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
> > +{
> > +	return fmt->planar ? 8 * fmt->uv_width_dec : 64 / fmt->bpp;
> > +}
> 
> <snip>
> 
> As I indicated, shouldn't this be
> 
> return fmt->planar ? 8 * fmt->uv_width_dec : 8;
> 
> ?
>
> Just from a unit analysis perspective, "64 / fmt->bp" has
> units of pixels / 8-bytes, it should have units of bytes.

The tile alignment is in pixels, not in bytes. For 16-bit and 32-bit
packed formats, we only need to align to 4 or 2 pixels, respectively,
as the LCM of 8-byte alignment and 2-byte or 4-byte pixel size is
always 8 bytes.

But now that you pointed it out, it is quite obvious that this can't
work for 24-bit packed formats. Here the LCM of 8-byte alignment and 3-
byte pixels is 24 bytes, or 8 pixels.

How about:

	if (fmt->planar)
		return fmt->uv_packed ? 8 : 8 * fmt->uv_width_dec;
	else
		return fmt->bpp == 32 ? 2 : fmt->bpp == 16 ? 4 : 8;

regards
Philipp

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-10-17 11:10     ` Philipp Zabel
@ 2018-10-17 23:44       ` Steve Longerbeam
  0 siblings, 0 replies; 29+ messages in thread
From: Steve Longerbeam @ 2018-10-17 23:44 UTC (permalink / raw)
  To: Philipp Zabel, linux-media; +Cc: Nicolas Dufresne, kernel


On 10/17/18 4:10 AM, Philipp Zabel wrote:
> On Fri, 2018-10-12 at 17:33 -0700, Steve Longerbeam wrote:
>> On 09/18/2018 02:34 AM, Philipp Zabel wrote:
>>
>> <snip>
>>> +/*
>>> + * Tile left edges are required to be aligned to multiples of 8 bytes
>>> + * by the IDMAC.
>>> + */
>>> +static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
>>> +{
>>> +	return fmt->planar ? 8 * fmt->uv_width_dec : 64 / fmt->bpp;
>>> +}
>> <snip>
>>
>> As I indicated, shouldn't this be
>>
>> return fmt->planar ? 8 * fmt->uv_width_dec : 8;
>>
>> ?
>>
>> Just from a unit analysis perspective, "64 / fmt->bp" has
>> units of pixels / 8-bytes, it should have units of bytes.
> The tile alignment is in pixels, not in bytes.


Ah, yes of course you are right, I used to know this :) I am
loosing track of this code.


>   For 16-bit and 32-bit
> packed formats, we only need to align to 4 or 2 pixels, respectively,
> as the LCM of 8-byte alignment and 2-byte or 4-byte pixel size is
> always 8 bytes.


Yes I agree, the LCM of 8-byte alignment and bytes-per-pixel should
be the tile left edge alignment in pixels.


> But now that you pointed it out, it is quite obvious that this can't
> work for 24-bit packed formats. Here the LCM of 8-byte alignment and 3-
> byte pixels is 24 bytes, or 8 pixels.
>
> How about:
>
> 	if (fmt->planar)
> 		return fmt->uv_packed ? 8 : 8 * fmt->uv_width_dec;
> 	else
> 		return fmt->bpp == 32 ? 2 : fmt->bpp == 16 ? 4 : 8;


Yep, that looks better. I tested this and it works fine.

Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 00/16] i.MX media mem2mem scaler
  2018-10-13  0:29 ` [PATCH v3 00/16] i.MX media mem2mem scaler Steve Longerbeam
@ 2018-10-17 23:46   ` Steve Longerbeam
  2018-10-19 12:18     ` Philipp Zabel
  0 siblings, 1 reply; 29+ messages in thread
From: Steve Longerbeam @ 2018-10-17 23:46 UTC (permalink / raw)
  To: Steve Longerbeam, Philipp Zabel, linux-media; +Cc: Nicolas Dufresne, kernel

Hi Philipp,

On 10/12/18 5:29 PM, Steve Longerbeam wrote:
> <snip>
>
> But one last thing. Conversions to and from YV12 are producing images
> with wrong colors, it looks like the .uv_swapped boolean needs to be 
> checked
> additionally somewhere. Any ideas?


Sorry, this was my fault. I fixed this in

"gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image()"

in my fork git@github.com:slongerbeam/mediatree.git, branch imx-mem2mem.3.

Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/16] media: imx: add mem2mem device
  2018-09-18  9:34 ` [PATCH v3 01/16] media: imx: add mem2mem device Philipp Zabel
@ 2018-10-18 22:53   ` Tim Harvey
  2018-10-19  9:53     ` Philipp Zabel
  0 siblings, 1 reply; 29+ messages in thread
From: Tim Harvey @ 2018-10-18 22:53 UTC (permalink / raw)
  To: Philipp Zabel; +Cc: linux-media, Steve Longerbeam, nicolas, Sascha Hauer

On Tue, Sep 18, 2018 at 2:34 AM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> Add a single imx-media mem2mem video device that uses the IPU IC PP
> (image converter post processing) task for scaling and colorspace
> conversion.
> On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.
>
> The hardware only supports writing to destination buffers up to
> 1024x1024 pixels in a single pass, arbitrary sizes can be achieved
> by rendering multiple tiles per frame.
>
> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
> [steve_longerbeam@mentor.com: use ipu_image_convert_adjust(), fix
>  device_run() error handling]
> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
> ---
> Changes since v2:
>  - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
>    adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
>    wrapper around the IPU image converter, and independent of the
>    internal image converter implementation.
>  - Remove the source and destination buffers on error in device_run().
>    Otherwise the conversion is re-attempted apparently over and over
>    again (with WARN() backtraces).
>  - Allow subscribing to control changes.
> ---
>  drivers/staging/media/imx/Kconfig             |   1 +
>  drivers/staging/media/imx/Makefile            |   1 +
>  drivers/staging/media/imx/imx-media-dev.c     |  11 +
>  drivers/staging/media/imx/imx-media-mem2mem.c | 873 ++++++++++++++++++
>  drivers/staging/media/imx/imx-media.h         |  10 +
>  5 files changed, 896 insertions(+)
>  create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>

Philipp,

Thanks for submitting this!

I'm hoping this lets us use non-IMX capture devices along with the IMX
media controller entities to so we can use hardware
CSC,scaling,pixel-format-conversions and ultimately coda based encode.

I've built this on top of linux-media and see that it registers as
/dev/video8 but I'm not clear how to use it? I don't see it within the
media controller graph.

Regards,

Tim

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/16] media: imx: add mem2mem device
  2018-10-18 22:53   ` Tim Harvey
@ 2018-10-19  9:53     ` Philipp Zabel
  2018-10-19 20:19       ` Steve Longerbeam
  0 siblings, 1 reply; 29+ messages in thread
From: Philipp Zabel @ 2018-10-19  9:53 UTC (permalink / raw)
  To: Tim Harvey; +Cc: linux-media, Steve Longerbeam, nicolas, Sascha Hauer

Hi Tim,

On Thu, 2018-10-18 at 15:53 -0700, Tim Harvey wrote:
[...]
> Philipp,
> 
> Thanks for submitting this!
> 
> I'm hoping this lets us use non-IMX capture devices along with the IMX
> media controller entities to so we can use hardware
> CSC,scaling,pixel-format-conversions and ultimately coda based encode.
> 
> I've built this on top of linux-media and see that it registers as
> /dev/video8 but I'm not clear how to use it? I don't see it within the
> media controller graph.

It's a V4L2 mem2mem device that can be handled by the GstV4l2Transform
element, for example. GStreamer should create a v4l2video8convert
element of that type.

The mem2mem device is not part of the media controller graph on purpose.
There is no interaction with any of the entities in the media controller
graph apart from the fact that the IC PP task we are using for mem2mem
scaling is sharing hardware resources with the IC PRP tasks used for the
media controller scaler entitites.

regards
Philipp

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 00/16] i.MX media mem2mem scaler
  2018-10-17 23:46   ` Steve Longerbeam
@ 2018-10-19 12:18     ` Philipp Zabel
  0 siblings, 0 replies; 29+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:18 UTC (permalink / raw)
  To: Steve Longerbeam, Steve Longerbeam, linux-media; +Cc: Nicolas Dufresne, kernel

Hi Steve,

On Wed, 2018-10-17 at 16:46 -0700, Steve Longerbeam wrote:
> Hi Philipp,
> 
> On 10/12/18 5:29 PM, Steve Longerbeam wrote:
> > <snip>
> > 
> > But one last thing. Conversions to and from YV12 are producing images
> > with wrong colors, it looks like the .uv_swapped boolean needs to be 
> > checked
> > additionally somewhere. Any ideas?
> 
> 
> Sorry, this was my fault. I fixed this in
> 
> "gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image()"
> 
> in my fork git@github.com:slongerbeam/mediatree.git, branch imx-mem2mem.3.
> 
> Steve

Thanks a lot for testing, fixes, and integration. Basically I've just
resubmitted that branch as v4.
After this round I'll pick up all non-controversial ipu-v3 / image-
convert patches.

regards
Philipp

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/16] media: imx: add mem2mem device
  2018-10-19  9:53     ` Philipp Zabel
@ 2018-10-19 20:19       ` Steve Longerbeam
  2018-10-19 23:54         ` Tim Harvey
  2018-10-21 17:43         ` Philipp Zabel
  0 siblings, 2 replies; 29+ messages in thread
From: Steve Longerbeam @ 2018-10-19 20:19 UTC (permalink / raw)
  To: Philipp Zabel, Tim Harvey
  Cc: linux-media, Steve Longerbeam, nicolas, Sascha Hauer


On 10/19/18 2:53 AM, Philipp Zabel wrote:
> Hi Tim,
>
> On Thu, 2018-10-18 at 15:53 -0700, Tim Harvey wrote:
> [...]
>> Philipp,
>>
>> Thanks for submitting this!
>>
>> I'm hoping this lets us use non-IMX capture devices along with the IMX
>> media controller entities to so we can use hardware
>> CSC,scaling,pixel-format-conversions and ultimately coda based encode.
>>
>> I've built this on top of linux-media and see that it registers as
>> /dev/video8 but I'm not clear how to use it? I don't see it within the
>> media controller graph.
> It's a V4L2 mem2mem device that can be handled by the GstV4l2Transform
> element, for example. GStreamer should create a v4l2video8convert
> element of that type.
>
> The mem2mem device is not part of the media controller graph on purpose.
> There is no interaction with any of the entities in the media controller
> graph apart from the fact that the IC PP task we are using for mem2mem
> scaling is sharing hardware resources with the IC PRP tasks used for the
> media controller scaler entitites.


It would be nice in the future to link mem2mem output-side to the ipu_vdic:1
pad, to make use of h/w VDIC de-interlace as part of mem2mem operations.
The progressive output from a new ipu_vdic:3 pad can then be sent to the
image_convert APIs by the mem2mem driver for further tiled scaling, CSC,
and rotation by the IC PP task. The ipu_vdic:1 pad makes use of pure 
DMA-based
de-interlace, that is, all input frames (N-1, N, N+1) to the VDIC are sent
from DMA buffers, and this VDIC mode of operation is well understood
and produces clean de-interlace output. The risk is that this would require
iDMAC channel 5 for ipu_vdic:3, which IFAIK is not verified to work yet.


The other problem with that currently is that mem2mem would have to be 
split into
separate device nodes: a /dev/videoN for output-side (linked to 
ipu_vdic:1), and
a /dev/videoM for capture-side (linked from ipu_vdic:3). And then it no 
longer
presents to userspace as a mem2mem device with a single device node for both
output and capture sides.


Or is there another way? I recall work to integrate mem2mem with media 
control.
There is v4l2_m2m_register_media_controller(), but that create three 
entities:
source, processing, and sink. The VDIC entity would be part of mem2mem
processing but this entity already exists for the current graph. This 
function
could however be used as a guide to incorporate the VDIC entity into m2m
device.


Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/16] media: imx: add mem2mem device
  2018-10-19 20:19       ` Steve Longerbeam
@ 2018-10-19 23:54         ` Tim Harvey
  2018-10-21 17:43         ` Philipp Zabel
  1 sibling, 0 replies; 29+ messages in thread
From: Tim Harvey @ 2018-10-19 23:54 UTC (permalink / raw)
  To: Steve Longerbeam; +Cc: Philipp Zabel, linux-media, nicolas, Sascha Hauer

On Fri, Oct 19, 2018 at 1:19 PM Steve Longerbeam <slongerbeam@gmail.com> wrote:
>
>
> On 10/19/18 2:53 AM, Philipp Zabel wrote:
> > Hi Tim,
> >
> > On Thu, 2018-10-18 at 15:53 -0700, Tim Harvey wrote:
> > [...]
> >> Philipp,
> >>
> >> Thanks for submitting this!
> >>
> >> I'm hoping this lets us use non-IMX capture devices along with the IMX
> >> media controller entities to so we can use hardware
> >> CSC,scaling,pixel-format-conversions and ultimately coda based encode.
> >>
> >> I've built this on top of linux-media and see that it registers as
> >> /dev/video8 but I'm not clear how to use it? I don't see it within the
> >> media controller graph.
> > It's a V4L2 mem2mem device that can be handled by the GstV4l2Transform
> > element, for example. GStreamer should create a v4l2video8convert
> > element of that type.
> >
> > The mem2mem device is not part of the media controller graph on purpose.
> > There is no interaction with any of the entities in the media controller
> > graph apart from the fact that the IC PP task we are using for mem2mem
> > scaling is sharing hardware resources with the IC PRP tasks used for the
> > media controller scaler entitites.
>
>
> It would be nice in the future to link mem2mem output-side to the ipu_vdic:1
> pad, to make use of h/w VDIC de-interlace as part of mem2mem operations.
> The progressive output from a new ipu_vdic:3 pad can then be sent to the
> image_convert APIs by the mem2mem driver for further tiled scaling, CSC,
> and rotation by the IC PP task. The ipu_vdic:1 pad makes use of pure
> DMA-based
> de-interlace, that is, all input frames (N-1, N, N+1) to the VDIC are sent
> from DMA buffers, and this VDIC mode of operation is well understood
> and produces clean de-interlace output. The risk is that this would require
> iDMAC channel 5 for ipu_vdic:3, which IFAIK is not verified to work yet.
>
>
> The other problem with that currently is that mem2mem would have to be
> split into
> separate device nodes: a /dev/videoN for output-side (linked to
> ipu_vdic:1), and
> a /dev/videoM for capture-side (linked from ipu_vdic:3). And then it no
> longer
> presents to userspace as a mem2mem device with a single device node for both
> output and capture sides.
>
>
> Or is there another way? I recall work to integrate mem2mem with media
> control.
> There is v4l2_m2m_register_media_controller(), but that create three
> entities:
> source, processing, and sink. The VDIC entity would be part of mem2mem
> processing but this entity already exists for the current graph. This
> function
> could however be used as a guide to incorporate the VDIC entity into m2m
> device.
>

I agree - without being able to utilize de-interlace,csc,scaling and
rotation it seems fairly limited today (but a great start!).

Also, if it were in the media graph wouldn't we be able to use the
compose selection subdev API?

I've got an AVC8000 minPCIe card here with a TW6869 with 8x analog
capture inputs that I'm hoping to someday soon be able to capture,
compose into a single frame, and encode.

Regards,

Tim

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/16] media: imx: add mem2mem device
  2018-10-19 20:19       ` Steve Longerbeam
  2018-10-19 23:54         ` Tim Harvey
@ 2018-10-21 17:43         ` Philipp Zabel
  2018-10-23 23:23           ` Steve Longerbeam
  1 sibling, 1 reply; 29+ messages in thread
From: Philipp Zabel @ 2018-10-21 17:43 UTC (permalink / raw)
  To: Steve Longerbeam; +Cc: Tim Harvey, nicolas, Sascha Hauer, linux-media

On Fri, Oct 19, 2018 at 01:19:10PM -0700, Steve Longerbeam wrote:
> 
> On 10/19/18 2:53 AM, Philipp Zabel wrote:
> > Hi Tim,
> > 
> > On Thu, 2018-10-18 at 15:53 -0700, Tim Harvey wrote:
> > [...]
> > > Philipp,
> > > 
> > > Thanks for submitting this!
> > > 
> > > I'm hoping this lets us use non-IMX capture devices along with the IMX
> > > media controller entities to so we can use hardware
> > > CSC,scaling,pixel-format-conversions and ultimately coda based encode.
> > > 
> > > I've built this on top of linux-media and see that it registers as
> > > /dev/video8 but I'm not clear how to use it? I don't see it within the
> > > media controller graph.
> > It's a V4L2 mem2mem device that can be handled by the GstV4l2Transform
> > element, for example. GStreamer should create a v4l2video8convert
> > element of that type.
> > 
> > The mem2mem device is not part of the media controller graph on purpose.
> > There is no interaction with any of the entities in the media controller
> > graph apart from the fact that the IC PP task we are using for mem2mem
> > scaling is sharing hardware resources with the IC PRP tasks used for the
> > media controller scaler entitites.
> 
> It would be nice in the future to link mem2mem output-side to the ipu_vdic:1
> pad, to make use of h/w VDIC de-interlace as part of mem2mem operations.
> The progressive output from a new ipu_vdic:3 pad can then be sent to the
> image_convert APIs by the mem2mem driver for further tiled scaling, CSC,
> and rotation by the IC PP task. The ipu_vdic:1 pad makes use of pure
> DMA-based de-interlace, that is, all input frames (N-1, N, N+1) to the
> VDIC are sent from DMA buffers, and this VDIC mode of operation is
> well understood and produces clean de-interlace output. The risk is
> that this would require iDMAC channel 5 for ipu_vdic:3, which IFAIK is
> not verified to work yet.

Tiled mem2mem deinterlacing support would be nice, I'm not sure yet how
though. I'd limit media controller links to marking VDIC as unavailable
for the capture pipeline. The V4L2 subdev API is too lowlevel for tiling
mem2mem purposes, as we'd need to change the subdev format multiple
times per frame.
Also I'd like to keep the option of scheduling tile jobs to both IPUs on
i.MX6Q, which will become difficult to describe via MC, as both IPUs'
ipu_vdics would have to be involved.

> The other problem with that currently is that mem2mem would have to be split
> into separate device nodes: a /dev/videoN for output-side (linked to
> ipu_vdic:1), and a /dev/videoM for capture-side (linked from
> ipu_vdic:3). And then it no longer presents to userspace as a mem2mem
> device with a single device node for both output and capture sides.

I don't understand why we'd need separate video devices for output and
capture, deinterlacing is still single input single (double rate)
output. As soon as we begin tiling, we are one layer of abstraction
away from the hardware pads anyway. Now if we want to support combining
on the other hand...

> Or is there another way? I recall work to integrate mem2mem with media
> control. There is v4l2_m2m_register_media_controller(), but that
> create three
> entities:
> source, processing, and sink. The VDIC entity would be part of mem2mem
> processing but this entity already exists for the current graph. This
> function could however be used as a guide to incorporate the VDIC
> entity into m2m device.

I'm not sure if this is the right abstraction. Without tiling or
multi-IPU scheduling, sure. But the mem2mem driver does not directly
describe hardware operation anyway.

regards
Philipp

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/16] media: imx: add mem2mem device
  2018-10-21 17:43         ` Philipp Zabel
@ 2018-10-23 23:23           ` Steve Longerbeam
  0 siblings, 0 replies; 29+ messages in thread
From: Steve Longerbeam @ 2018-10-23 23:23 UTC (permalink / raw)
  To: Philipp Zabel, Steve Longerbeam
  Cc: Tim Harvey, nicolas, Sascha Hauer, linux-media

(resending as plain text)


On 10/21/18 10:43 AM, Philipp Zabel wrote:
> On Fri, Oct 19, 2018 at 01:19:10PM -0700, Steve Longerbeam wrote:
>> On 10/19/18 2:53 AM, Philipp Zabel wrote:
>>> Hi Tim,
>>>
>>> On Thu, 2018-10-18 at 15:53 -0700, Tim Harvey wrote:
>>> [...]
>>>> Philipp,
>>>>
>>>> Thanks for submitting this!
>>>>
>>>> I'm hoping this lets us use non-IMX capture devices along with the IMX
>>>> media controller entities to so we can use hardware
>>>> CSC,scaling,pixel-format-conversions and ultimately coda based encode.
>>>>
>>>> I've built this on top of linux-media and see that it registers as
>>>> /dev/video8 but I'm not clear how to use it? I don't see it within the
>>>> media controller graph.
>>> It's a V4L2 mem2mem device that can be handled by the GstV4l2Transform
>>> element, for example. GStreamer should create a v4l2video8convert
>>> element of that type.
>>>
>>> The mem2mem device is not part of the media controller graph on purpose.
>>> There is no interaction with any of the entities in the media controller
>>> graph apart from the fact that the IC PP task we are using for mem2mem
>>> scaling is sharing hardware resources with the IC PRP tasks used for the
>>> media controller scaler entitites.
>> It would be nice in the future to link mem2mem output-side to the ipu_vdic:1
>> pad, to make use of h/w VDIC de-interlace as part of mem2mem operations.
>> The progressive output from a new ipu_vdic:3 pad can then be sent to the
>> image_convert APIs by the mem2mem driver for further tiled scaling, CSC,
>> and rotation by the IC PP task. The ipu_vdic:1 pad makes use of pure
>> DMA-based de-interlace, that is, all input frames (N-1, N, N+1) to the
>> VDIC are sent from DMA buffers, and this VDIC mode of operation is
>> well understood and produces clean de-interlace output. The risk is
>> that this would require iDMAC channel 5 for ipu_vdic:3, which IFAIK is
>> not verified to work yet.
> Tiled mem2mem deinterlacing support would be nice, I'm not sure yet how
> though. I'd limit media controller links to marking VDIC as unavailable
> for the capture pipeline. The V4L2 subdev API is too lowlevel for tiling
> mem2mem purposes, as we'd need to change the subdev format multiple
> times per frame.


I wasn't considering tiled deinterlacing, only deinterlacing at the 
hardware limited input frame size to the VDIC.


> Also I'd like to keep the option of scheduling tile jobs to both IPUs on
> i.MX6Q, which will become difficult to describe via MC, as both IPUs'
> ipu_vdics would have to be involved.

Agreed, it would be good to add to the mem2mem driver the ability to 
schedule jobs to whichever IPU is least busy.


>> The other problem with that currently is that mem2mem would have to be split
>> into separate device nodes: a /dev/videoN for output-side (linked to
>> ipu_vdic:1), and a /dev/videoM for capture-side (linked from
>> ipu_vdic:3). And then it no longer presents to userspace as a mem2mem
>> device with a single device node for both output and capture sides.
> I don't understand why we'd need separate video devices for output and
> capture, deinterlacing is still single input single (double rate)
> output. As soon as we begin tiling, we are one layer of abstraction
> away from the hardware pads anyway. Now if we want to support combining
> on the other hand...


Again I wasn't thinking of doing tiled deinterlace.


>> Or is there another way? I recall work to integrate mem2mem with media
>> control. There is v4l2_m2m_register_media_controller(), but that
>> create three
>> entities:
>> source, processing, and sink. The VDIC entity would be part of mem2mem
>> processing but this entity already exists for the current graph. This
>> function could however be used as a guide to incorporate the VDIC
>> entity into m2m device.
> I'm not sure if this is the right abstraction. Without tiling or
> multi-IPU scheduling, sure. But the mem2mem driver does not directly
> describe hardware operation anyway.


What I'm thinking is separate mem2mem devices from this proposed 
tiled-scaling post-processing mem2mem, that solely do motion compensated 
deinterlace using the VDIC as the processing entity.

[1] is the dot graph that demonstrates the idea, on imx6q SabreSD. Two 
new mem2mem devices are attached to VDIC sink and source IDMAC pads. 
Also the PP tiled scaling mem2mem is shown in the graph.

As of now, the VDIC is only available for video capture pipelines, so 
the advantage would be to allow the use of hardware deinterlace 
gstreamer pipelines from other types of interlaced sources like file or 
network streams.

So something like the following example which would do hardware 
deinterlace, followed by tiled scaling/CSC/rotation, then h.264 encode, 
the VDIC mem2mem device is at /dev/video0, and the PP mem2mem device is 
at /dev/video12 in this example:

gst-launch-1.0 \
v4l2src ! \
v4l2video0convert output-io-mode=dmabuf-import ! \
v4l2video12convert output-io-mode=dmabuf-import ! \
v4l2h264enc output-io-mode=dmabuf-import ! \
h264parse ! \
matroskamux ! \
filesink

I'm probably missing some stream properties in that example, but you get 
the idea.

Steve

[1]

digraph board {
     rankdir=TB
     n00000001 [label="{{<port0> 0 | <port1> 1} | 
ipu1_vdic\n/dev/v4l-subdev0 | {<port2> 2 | <port3> 3}}", shape=Mrecord, 
style=filled, fillcolor=green]
     n00000001:port3 -> n00000008 [style=dashed]
     n00000001:port2 -> n00000021:port0 [style=dashed]
     n00000006 [label="ipu_vdic mem2mem-source\n/dev/video0", shape=box, 
style=filled, fillcolor=yellow]
     n00000006 -> n00000001:port1 [style=dashed]
     n00000008 [label="ipu_vdic mem2mem-sink\n/dev/video0", shape=box, 
style=filled, fillcolor=yellow]
     n00000011 [label="{{<port0> 0 | <port1> 1} | 
ipu2_vdic\n/dev/v4l-subdev1 | {<port2> 2 | <port3> 3}}", shape=Mrecord, 
style=filled, fillcolor=green]
     n00000011:port3 -> n00000018 [style=dashed]
     n00000011:port2 -> n00000037:port0 [style=dashed]
     n00000016 [label="ipu_vdic mem2mem-source\n/dev/video1", shape=box, 
style=filled, fillcolor=yellow]
     n00000016 -> n00000011:port1 [style=dashed]
     n00000018 [label="ipu_vdic mem2mem-sink\n/dev/video1", shape=box, 
style=filled, fillcolor=yellow]
     n00000021 [label="{{<port0> 0} | ipu1_ic_prp\n/dev/v4l-subdev2 | 
{<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000021:port1 -> n00000025:port0 [style=dashed]
     n00000021:port2 -> n0000002e:port0 [style=dashed]
     n00000025 [label="{{<port0> 0} | ipu1_ic_prpenc\n/dev/v4l-subdev3 | 
{<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000025:port1 -> n00000028 [style=dashed]
     n00000028 [label="ipu1_ic_prpenc capture\n/dev/video2", shape=box, 
style=filled, fillcolor=yellow]
     n0000002e [label="{{<port0> 0} | ipu1_ic_prpvf\n/dev/v4l-subdev4 | 
{<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
     n0000002e:port1 -> n00000031 [style=dashed]
     n00000031 [label="ipu1_ic_prpvf capture\n/dev/video3", shape=box, 
style=filled, fillcolor=yellow]
     n00000037 [label="{{<port0> 0} | ipu2_ic_prp\n/dev/v4l-subdev5 | 
{<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000037:port1 -> n0000003b:port0 [style=dashed]
     n00000037:port2 -> n00000044:port0 [style=dashed]
     n0000003b [label="{{<port0> 0} | ipu2_ic_prpenc\n/dev/v4l-subdev6 | 
{<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
     n0000003b:port1 -> n0000003e [style=dashed]
     n0000003e [label="ipu2_ic_prpenc capture\n/dev/video4", shape=box, 
style=filled, fillcolor=yellow]
     n00000044 [label="{{<port0> 0} | ipu2_ic_prpvf\n/dev/v4l-subdev7 | 
{<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000044:port1 -> n00000047 [style=dashed]
     n00000047 [label="ipu2_ic_prpvf capture\n/dev/video5", shape=box, 
style=filled, fillcolor=yellow]
     n0000004d [label="{{<port0> 0} | ipu1_csi0\n/dev/v4l-subdev8 | 
{<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
     n0000004d:port2 -> n00000051 [style=dashed]
     n0000004d:port1 -> n00000021:port0 [style=dashed]
     n0000004d:port1 -> n00000001:port0 [style=dashed]
     n00000051 [label="ipu1_csi0 capture\n/dev/video6", shape=box, 
style=filled, fillcolor=yellow]
     n00000057 [label="{{<port0> 0} | ipu1_csi1\n/dev/v4l-subdev9 | 
{<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000057:port2 -> n0000005b [style=dashed]
     n00000057:port1 -> n00000021:port0 [style=dashed]
     n00000057:port1 -> n00000001:port0 [style=dashed]
     n0000005b [label="ipu1_csi1 capture\n/dev/video7", shape=box, 
style=filled, fillcolor=yellow]
     n00000061 [label="{{<port0> 0} | ipu2_csi0\n/dev/v4l-subdev10 | 
{<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000061:port2 -> n00000065 [style=dashed]
     n00000061:port1 -> n00000037:port0 [style=dashed]
     n00000061:port1 -> n00000011:port0 [style=dashed]
     n00000065 [label="ipu2_csi0 capture\n/dev/video8", shape=box, 
style=filled, fillcolor=yellow]
     n0000006b [label="{{<port0> 0} | ipu2_csi1\n/dev/v4l-subdev11 | 
{<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
     n0000006b:port2 -> n0000006f [style=dashed]
     n0000006b:port1 -> n00000037:port0 [style=dashed]
     n0000006b:port1 -> n00000011:port0 [style=dashed]
     n0000006f [label="ipu2_csi1 capture\n/dev/video9", shape=box, 
style=filled, fillcolor=yellow]
     n00000075 [label="{{<port0> 0} | imx6-mipi-csi2\n/dev/v4l-subdev12 
| {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, 
style=filled, fillcolor=green]
     n00000075:port2 -> n00000057:port0 [style=dashed]
     n00000075:port3 -> n00000061:port0 [style=dashed]
     n00000075:port1 -> n0000007b:port0 [style=dashed]
     n00000075:port4 -> n0000007f:port0 [style=dashed]
     n0000007b [label="{{<port0> 0 | <port1> 1} | 
ipu1_csi0_mux\n/dev/v4l-subdev13 | {<port2> 2}}", shape=Mrecord, 
style=filled, fillcolor=green]
     n0000007b:port2 -> n0000004d:port0 [style=dashed]
     n0000007f [label="{{<port0> 0 | <port1> 1} | 
ipu2_csi1_mux\n/dev/v4l-subdev14 | {<port2> 2}}", shape=Mrecord, 
style=filled, fillcolor=green]
     n0000007f:port2 -> n0000006b:port0 [style=dashed]
     n00000083 [label="{{} | ov5640 1-003c\n/dev/v4l-subdev15 | {<port0> 
0}}", shape=Mrecord, style=filled, fillcolor=green]
     n00000083:port0 -> n00000075:port0 [style=dashed]
     n000000cf [label="ipu_ic_pp mem2mem-proc\n", shape=box, 
style=filled, fillcolor=yellow]
     n000000cf -> n000000d4 [style=bold]
     n000000d2 [label="ipu_ic_pp mem2mem-source\n/dev/video12", 
shape=box, style=filled, fillcolor=yellow]
     n000000d2 -> n000000cf [style=bold]
     n000000d4 [label="ipu_ic_pp mem2mem-sink\n/dev/video12", shape=box, 
style=filled, fillcolor=yellow]
}

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-10-24  7:52 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-18  9:34 [PATCH v3 00/16] i.MX media mem2mem scaler Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 01/16] media: imx: add mem2mem device Philipp Zabel
2018-10-18 22:53   ` Tim Harvey
2018-10-19  9:53     ` Philipp Zabel
2018-10-19 20:19       ` Steve Longerbeam
2018-10-19 23:54         ` Tim Harvey
2018-10-21 17:43         ` Philipp Zabel
2018-10-23 23:23           ` Steve Longerbeam
2018-09-18  9:34 ` [PATCH v3 02/16] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 03/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 04/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 05/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 06/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 07/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 08/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 09/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 10/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
2018-10-13  0:33   ` Steve Longerbeam
2018-10-17 11:10     ` Philipp Zabel
2018-10-17 23:44       ` Steve Longerbeam
2018-09-18  9:34 ` [PATCH v3 11/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 12/16] gpu: ipu-v3: image-convert: relax alignment restrictions Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
2018-09-18  9:34 ` [PATCH v3 16/16] gpu: ipu-v3: image-convert: allow three rows or columns Philipp Zabel
2018-10-13  0:29 ` [PATCH v3 00/16] i.MX media mem2mem scaler Steve Longerbeam
2018-10-17 23:46   ` Steve Longerbeam
2018-10-19 12:18     ` Philipp Zabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.