All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 00/22] i.MX media mem2mem scaler
@ 2018-10-19 12:15 Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 01/22] media: imx: add mem2mem device Philipp Zabel
                   ` (23 more replies)
  0 siblings, 24 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Hi,

this is the fourth version of the i.MX mem2mem scaler series.

An alignment issue with 24-bit RGB formats has been corrected in the
seam position selection patch and a few new fixes by Steve have been
added. If there are no more issues, I'll pick up the ipu-v3 patches
via imx-drm/next. The first patch could be merged via the media tree
independently.

Changes since v3:
 - Fix tile_left_align for 24-bit RGB formats and reduce alignment
   restrictions for U/V packed planar YUV formats
 - Catch unaligned tile offsets in image-convert
 - Add chroma plane offset overrides to ipu_cpmem_set_image() to
   prevent a false positive warning in some cases
 - Fix a race between run and unprepare and make abort reentrant.


Changes since v2:
 - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
   adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
   wrapper around the IPU image converter, and independent of the
   internal image converter implementation.
 - Remove the source and destination buffers on error in device_run().
   Otherwise the conversion is re-attempted apparently over and over
   again (with WARN() backtraces).
 - Allow subscribing to control changes.
 - Fix seam position selection for more corner cases:
    - Switch width/height properly and align tile top left positions to 8x8
      IRT block size when rotating.
    - Align input width to input burst length in case the scaling step
      flips horizontally.
    - Fix bottom edge calculation.

Changes since v1:
 - Fix inverted allow_overshoot logic
 - Correctly switch horizontal / vertical tile alignment when
   determining seam positions with the 90° rotator active.
 - Fix SPDX-License-Identifier and remove superfluous license
   text.
 - Fix uninitialized walign in try_fmt

Previous cover letter:

we have image conversion code for scaling and colorspace conversion in
the IPUv3 base driver for a while. Since the IC hardware can only write
up to 1024x1024 pixel buffers, it scales to larger output buffers by
splitting the input and output frame into similarly sized tiles.

This causes the issue that the bilinear interpolation resets at the tile
boundary: instead of smoothly interpolating across the seam, there is a
jump in the input sample position that is very apparent for high
upscaling factors. This can be avoided by slightly changing the scaling
coefficients to let the left/top tiles overshoot their input sampling
into the first pixel / line of their right / bottom neighbors. The error
can be further reduced by letting tiles be differently sized and by
selecting seam positions that minimize the input sampling position error
at tile boundaries.
This is complicated by different DMA start address, burst size, and
rotator block size alignment requirements, depending on the input and
output pixel formats, and the fact that flipping happens in different
places depending on the rotation.

This series implements optimal seam position selection and seam hiding
with per-tile resizing coefficients and adds a scaling mem2mem device
to the imx-media driver.

regards
Philipp

Philipp Zabel (15):
  media: imx: add mem2mem device
  gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  gpu: ipu-v3: image-convert: prepare for per-tile configuration
  gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  gpu: ipu-v3: image-convert: reconfigure IC per tile
  gpu: ipu-v3: image-convert: store tile top/left position
  gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
    outside fill_image
  gpu: ipu-v3: image-convert: move tile alignment helpers
  gpu: ipu-v3: image-convert: select optimal seam positions
  gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  gpu: ipu-v3: image-convert: relax alignment restrictions
  gpu: ipu-v3: image-convert: fix bytesperline adjustment
  gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  gpu: ipu-v3: image-convert: disable double buffering if necessary
  gpu: ipu-v3: image-convert: allow three rows or columns

Steve Longerbeam (7):
  gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers
  gpu: ipu-v3: Add chroma plane offset overrides to
    ipu_cpmem_set_image()
  gpu: ipu-v3: image-convert: Prevent race between run and unprepare
  gpu: ipu-v3: image-convert: Only wait for abort completion if active
    run
  gpu: ipu-v3: image-convert: Allow reentrancy into abort
  gpu: ipu-v3: image-convert: Remove need_abort flag
  gpu: ipu-v3: image-convert: Catch unaligned tile offsets

 drivers/gpu/ipu-v3/ipu-cpmem.c                |   52 +-
 drivers/gpu/ipu-v3/ipu-ic.c                   |   52 +-
 drivers/gpu/ipu-v3/ipu-image-convert.c        | 1019 ++++++++++++++---
 drivers/staging/media/imx/Kconfig             |    1 +
 drivers/staging/media/imx/Makefile            |    1 +
 drivers/staging/media/imx/imx-media-dev.c     |   11 +
 drivers/staging/media/imx/imx-media-mem2mem.c |  873 ++++++++++++++
 drivers/staging/media/imx/imx-media.h         |   10 +
 include/video/imx-ipu-v3.h                    |    9 +
 9 files changed, 1821 insertions(+), 207 deletions(-)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

-- 
2.19.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 01/22] media: imx: add mem2mem device
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 02/22] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers Philipp Zabel
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Add a single imx-media mem2mem video device that uses the IPU IC PP
(image converter post processing) task for scaling and colorspace
conversion.
On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.

The hardware only supports writing to destination buffers up to
1024x1024 pixels in a single pass, arbitrary sizes can be achieved
by rendering multiple tiles per frame.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
[slongerbeam@gmail.com: use ipu_image_convert_adjust(), fix
 device_run() error handling]
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
No changes since v3.
---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 873 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 5 files changed, 896 insertions(+)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

diff --git a/drivers/staging/media/imx/Kconfig b/drivers/staging/media/imx/Kconfig
index bfc17de56b17..07013cb3cb66 100644
--- a/drivers/staging/media/imx/Kconfig
+++ b/drivers/staging/media/imx/Kconfig
@@ -6,6 +6,7 @@ config VIDEO_IMX_MEDIA
 	depends on HAS_DMA
 	select VIDEOBUF2_DMA_CONTIG
 	select V4L2_FWNODE
+	select V4L2_MEM2MEM_DEV
 	---help---
 	  Say yes here to enable support for video4linux media controller
 	  driver for the i.MX5/6 SOC.
diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
index 698a4210316e..f2e722d0fa19 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -6,6 +6,7 @@ imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
+obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-mem2mem.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
 
diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
index 481840195071..c4324df54c2e 100644
--- a/drivers/staging/media/imx/imx-media-dev.c
+++ b/drivers/staging/media/imx/imx-media-dev.c
@@ -309,6 +309,17 @@ static int imx_media_probe_complete(struct v4l2_async_notifier *notifier)
 		goto unlock;
 
 	ret = v4l2_device_register_subdev_nodes(&imxmd->v4l2_dev);
+	if (ret)
+		goto unlock;
+
+	/* TODO: check whether we have IC subdevices first */
+	imxmd->m2m_vdev = imx_media_mem2mem_device_init(imxmd);
+	if (IS_ERR(imxmd->m2m_vdev)) {
+		ret = PTR_ERR(imxmd->m2m_vdev);
+		goto unlock;
+	}
+
+	ret = imx_media_mem2mem_device_register(imxmd->m2m_vdev);
 unlock:
 	mutex_unlock(&imxmd->mutex);
 	if (ret)
diff --git a/drivers/staging/media/imx/imx-media-mem2mem.c b/drivers/staging/media/imx/imx-media-mem2mem.c
new file mode 100644
index 000000000000..a2a4dca017ce
--- /dev/null
+++ b/drivers/staging/media/imx/imx-media-mem2mem.c
@@ -0,0 +1,873 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * i.MX IPUv3 mem2mem Scaler/CSC driver
+ *
+ * Copyright (C) 2011 Pengutronix, Sascha Hauer
+ * Copyright (C) 2018 Pengutronix, Philipp Zabel
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/version.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <video/imx-ipu-v3.h>
+#include <video/imx-ipu-image-convert.h>
+
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-ioctl.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "imx-media.h"
+
+#define fh_to_ctx(__fh)	container_of(__fh, struct mem2mem_ctx, fh)
+
+enum {
+	V4L2_M2M_SRC = 0,
+	V4L2_M2M_DST = 1,
+};
+
+struct mem2mem_priv {
+	struct imx_media_video_dev vdev;
+
+	struct v4l2_m2m_dev   *m2m_dev;
+	struct device         *dev;
+
+	struct imx_media_dev  *md;
+
+	struct mutex          mutex;       /* mem2mem device mutex */
+
+	atomic_t              num_inst;
+};
+
+#define to_mem2mem_priv(v) container_of(v, struct mem2mem_priv, vdev)
+
+/* Per-queue, driver-specific private data */
+struct mem2mem_q_data {
+	struct v4l2_pix_format	cur_fmt;
+	struct v4l2_rect	rect;
+};
+
+struct mem2mem_ctx {
+	struct mem2mem_priv	*priv;
+
+	struct v4l2_fh		fh;
+	struct mem2mem_q_data	q_data[2];
+	int			error;
+	struct ipu_image_convert_ctx *icc;
+
+	struct v4l2_ctrl_handler ctrl_hdlr;
+	int rotate;
+	bool hflip;
+	bool vflip;
+	enum ipu_rotate_mode	rot_mode;
+};
+
+static struct mem2mem_q_data *get_q_data(struct mem2mem_ctx *ctx,
+					 enum v4l2_buf_type type)
+{
+	if (V4L2_TYPE_IS_OUTPUT(type))
+		return &ctx->q_data[V4L2_M2M_SRC];
+	else
+		return &ctx->q_data[V4L2_M2M_DST];
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+static void job_abort(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+
+	if (ctx->icc)
+		ipu_image_convert_abort(ctx->icc);
+}
+
+static void mem2mem_ic_complete(struct ipu_image_convert_run *run, void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+
+	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+
+	dst_buf->vb2_buf.timestamp = src_buf->vb2_buf.timestamp;
+	dst_buf->timecode = src_buf->timecode;
+
+	v4l2_m2m_buf_done(src_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(dst_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+	kfree(run);
+}
+
+static void device_run(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	struct ipu_image_convert_run *run;
+	int ret;
+
+	src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+
+	run = kzalloc(sizeof(*run), GFP_KERNEL);
+	if (!run)
+		goto err;
+
+	run->ctx = ctx->icc;
+	run->in_phys = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+	run->out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+	ret = ipu_image_convert_queue(run);
+	if (ret < 0) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev,
+			 "%s: failed to queue: %d\n", __func__, ret);
+		goto err;
+	}
+
+	return;
+
+err:
+	v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+}
+
+/*
+ * Video ioctls
+ */
+static int vidioc_querycap(struct file *file, void *priv,
+			   struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
+	strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) - 1);
+	strncpy(cap->bus_info, "platform:imx-media-mem2mem",
+		sizeof(cap->bus_info) - 1);
+	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+
+	return 0;
+}
+
+static int mem2mem_enum_fmt(struct file *file, void *fh,
+			    struct v4l2_fmtdesc *f)
+{
+	u32 fourcc;
+	int ret;
+
+	ret = imx_media_enum_format(&fourcc, f->index, CS_SEL_ANY);
+	if (ret)
+		return ret;
+
+	f->pixelformat = fourcc;
+
+	return 0;
+}
+
+static int mem2mem_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	q_data = get_q_data(ctx, f->type);
+
+	f->fmt.pix = q_data->cur_fmt;
+
+	return 0;
+}
+
+static int mem2mem_try_fmt(struct file *file, void *priv,
+			   struct v4l2_format *f)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data = get_q_data(ctx, f->type);
+	struct ipu_image test_in, test_out;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
+		struct mem2mem_q_data *q_data_in =
+			get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+
+		test_out.pix = f->fmt.pix;
+		test_in.pix = q_data_in->cur_fmt;
+	} else {
+		struct mem2mem_q_data *q_data_out =
+			get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+
+		test_in.pix = f->fmt.pix;
+		test_out.pix = q_data_out->cur_fmt;
+	}
+
+	ipu_image_convert_adjust(&test_in, &test_out, ctx->rot_mode);
+
+	f->fmt.pix = (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
+		test_out.pix : test_in.pix;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
+		f->fmt.pix.colorspace = q_data->cur_fmt.colorspace;
+		f->fmt.pix.ycbcr_enc = q_data->cur_fmt.ycbcr_enc;
+		f->fmt.pix.xfer_func = q_data->cur_fmt.xfer_func;
+		f->fmt.pix.quantization = q_data->cur_fmt.quantization;
+	} else if (f->fmt.pix.colorspace == V4L2_COLORSPACE_DEFAULT) {
+		f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
+		f->fmt.pix.ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
+		f->fmt.pix.xfer_func = V4L2_XFER_FUNC_DEFAULT;
+		f->fmt.pix.quantization = V4L2_QUANTIZATION_DEFAULT;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_q_data *q_data;
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct vb2_queue *vq;
+	int ret;
+
+	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+	if (vb2_is_busy(vq)) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s queue busy\n",
+			 __func__);
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+
+	ret = mem2mem_try_fmt(file, priv, f);
+	if (ret < 0)
+		return ret;
+
+	q_data->cur_fmt.width = f->fmt.pix.width;
+	q_data->cur_fmt.height = f->fmt.pix.height;
+	q_data->cur_fmt.pixelformat = f->fmt.pix.pixelformat;
+	q_data->cur_fmt.field = f->fmt.pix.field;
+	q_data->cur_fmt.bytesperline = f->fmt.pix.bytesperline;
+	q_data->cur_fmt.sizeimage = f->fmt.pix.sizeimage;
+
+	/* Reset cropping/composing rectangle */
+	q_data->rect.left = 0;
+	q_data->rect.top = 0;
+	q_data->rect.width = q_data->cur_fmt.width;
+	q_data->rect.height = q_data->cur_fmt.height;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		/* Set colorimetry on the output queue */
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+		/* Propagate colorimetry to the capture queue */
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+	}
+
+	/*
+	 * TODO: Setting colorimetry on the capture queue is currently not
+	 * supported by the V4L2 API
+	 */
+
+	return 0;
+}
+
+static int mem2mem_g_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+	case V4L2_SEL_TGT_CROP_DEFAULT:
+	case V4L2_SEL_TGT_CROP_BOUNDS:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+	case V4L2_SEL_TGT_COMPOSE_DEFAULT:
+	case V4L2_SEL_TGT_COMPOSE_BOUNDS:
+	case V4L2_SEL_TGT_COMPOSE_PADDED:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->target == V4L2_SEL_TGT_CROP ||
+	    s->target == V4L2_SEL_TGT_COMPOSE) {
+		s->r = q_data->rect;
+	} else {
+		s->r.left = 0;
+		s->r.top = 0;
+		s->r.width = q_data->cur_fmt.width;
+		s->r.height = q_data->cur_fmt.height;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
+	    s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, s->type);
+
+	/* The input's frame width to the IC must be a multiple of 8 pixels
+	 * When performing resizing the frame width must be multiple of burst
+	 * size - 8 or 16 pixels as defined by CB#_BURST_16 parameter.
+	 */
+	if (s->flags & V4L2_SEL_FLAG_GE)
+		s->r.width = round_up(s->r.width, 8);
+	if (s->flags & V4L2_SEL_FLAG_LE)
+		s->r.width = round_down(s->r.width, 8);
+	s->r.width = clamp_t(unsigned int, s->r.width, 8,
+			     round_down(q_data->cur_fmt.width, 8));
+	s->r.height = clamp_t(unsigned int, s->r.height, 1,
+			      q_data->cur_fmt.height);
+	s->r.left = clamp_t(unsigned int, s->r.left, 0,
+			    q_data->cur_fmt.width - s->r.width);
+	s->r.top = clamp_t(unsigned int, s->r.top, 0,
+			   q_data->cur_fmt.height - s->r.height);
+
+	/* V4L2_SEL_FLAG_KEEP_CONFIG is only valid for subdevices */
+	q_data->rect = s->r;
+
+	return 0;
+}
+
+static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
+	.vidioc_querycap	= vidioc_querycap,
+
+	.vidioc_enum_fmt_vid_cap = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_cap	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_cap	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_cap	= mem2mem_s_fmt,
+
+	.vidioc_enum_fmt_vid_out = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_out	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_out	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_out	= mem2mem_s_fmt,
+
+	.vidioc_g_selection	= mem2mem_g_selection,
+	.vidioc_s_selection	= mem2mem_s_selection,
+
+	.vidioc_reqbufs		= v4l2_m2m_ioctl_reqbufs,
+	.vidioc_querybuf	= v4l2_m2m_ioctl_querybuf,
+
+	.vidioc_qbuf		= v4l2_m2m_ioctl_qbuf,
+	.vidioc_expbuf		= v4l2_m2m_ioctl_expbuf,
+	.vidioc_dqbuf		= v4l2_m2m_ioctl_dqbuf,
+	.vidioc_create_bufs	= v4l2_m2m_ioctl_create_bufs,
+
+	.vidioc_streamon	= v4l2_m2m_ioctl_streamon,
+	.vidioc_streamoff	= v4l2_m2m_ioctl_streamoff,
+
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+
+static int mem2mem_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
+			       unsigned int *nplanes, unsigned int sizes[],
+			       struct device *alloc_devs[])
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vq);
+	struct mem2mem_q_data *q_data;
+	unsigned int count = *nbuffers;
+	struct v4l2_pix_format *pix;
+
+	q_data = get_q_data(ctx, vq->type);
+	pix = &q_data->cur_fmt;
+
+	*nplanes = 1;
+	*nbuffers = count;
+	sizes[0] = pix->sizeimage;
+
+	dev_dbg(ctx->priv->dev, "get %d buffer(s) of size %d each.\n",
+		count, pix->sizeimage);
+
+	return 0;
+}
+
+static int mem2mem_buf_prepare(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct mem2mem_q_data *q_data;
+	struct v4l2_pix_format *pix;
+	unsigned int plane_size, payload;
+
+	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	pix = &q_data->cur_fmt;
+	plane_size = pix->sizeimage;
+
+	if (vb2_plane_size(vb, 0) < plane_size) {
+		dev_dbg(ctx->priv->dev,
+			"%s data will not fit into plane (%lu < %lu)\n",
+			__func__, vb2_plane_size(vb, 0), (long)plane_size);
+		return -EINVAL;
+	}
+
+	payload = pix->bytesperline * pix->height;
+	if (pix->pixelformat == V4L2_PIX_FMT_YUV420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_YVU420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_NV12)
+		payload = payload * 3 / 2;
+	else if (pix->pixelformat == V4L2_PIX_FMT_YUV422P ||
+		 pix->pixelformat == V4L2_PIX_FMT_NV16)
+		payload *= 2;
+
+	vb2_set_plane_payload(vb, 0, payload);
+
+	return 0;
+}
+
+static void mem2mem_buf_queue(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
+}
+
+static void ipu_image_from_q_data(struct ipu_image *im,
+				  struct mem2mem_q_data *q_data)
+{
+	im->pix.width = q_data->cur_fmt.width;
+	im->pix.height = q_data->cur_fmt.height;
+	im->pix.bytesperline = q_data->cur_fmt.bytesperline;
+	im->pix.pixelformat = q_data->cur_fmt.pixelformat;
+	im->rect = q_data->rect;
+}
+
+static int mem2mem_start_streaming(struct vb2_queue *q, unsigned int count)
+{
+	const enum ipu_ic_task ic_task = IC_TASK_POST_PROCESSOR;
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct mem2mem_priv *priv = ctx->priv;
+	struct ipu_soc *ipu = priv->md->ipu[0];
+	struct mem2mem_q_data *q_data;
+	struct vb2_queue *other_q;
+	struct ipu_image in, out;
+
+	other_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+				  (q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
+				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
+				  V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	if (!vb2_is_streaming(other_q))
+		return 0;
+
+	if (ctx->icc) {
+		v4l2_warn(ctx->priv->vdev.vfd->v4l2_dev, "removing old ICC\n");
+		ipu_image_convert_unprepare(ctx->icc);
+	}
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	ipu_image_from_q_data(&in, q_data);
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	ipu_image_from_q_data(&out, q_data);
+
+	ctx->icc = ipu_image_convert_prepare(ipu, ic_task, &in, &out,
+					     ctx->rot_mode,
+					     mem2mem_ic_complete, ctx);
+	if (IS_ERR(ctx->icc)) {
+		struct vb2_v4l2_buffer *buf;
+		int ret = PTR_ERR(ctx->icc);
+
+		ctx->icc = NULL;
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s: error %d\n",
+			 __func__, ret);
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void mem2mem_stop_streaming(struct vb2_queue *q)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct vb2_v4l2_buffer *buf;
+
+	if (ctx->icc) {
+		ipu_image_convert_unprepare(ctx->icc);
+		ctx->icc = NULL;
+	}
+
+	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	} else {
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	}
+}
+
+static const struct vb2_ops mem2mem_qops = {
+	.queue_setup	= mem2mem_queue_setup,
+	.buf_prepare	= mem2mem_buf_prepare,
+	.buf_queue	= mem2mem_buf_queue,
+	.wait_prepare	= vb2_ops_wait_prepare,
+	.wait_finish	= vb2_ops_wait_finish,
+	.start_streaming = mem2mem_start_streaming,
+	.stop_streaming = mem2mem_stop_streaming,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct mem2mem_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
+	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &mem2mem_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	src_vq->lock = &ctx->priv->mutex;
+	src_vq->dev = ctx->priv->dev;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &mem2mem_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	dst_vq->lock = &ctx->priv->mutex;
+	dst_vq->dev = ctx->priv->dev;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static int mem2mem_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct mem2mem_ctx *ctx = container_of(ctrl->handler,
+					       struct mem2mem_ctx, ctrl_hdlr);
+	enum ipu_rotate_mode rot_mode;
+	int rotate;
+	bool hflip, vflip;
+	int ret = 0;
+
+	rotate = ctx->rotate;
+	hflip = ctx->hflip;
+	vflip = ctx->vflip;
+
+	switch (ctrl->id) {
+	case V4L2_CID_HFLIP:
+		hflip = ctrl->val;
+		break;
+	case V4L2_CID_VFLIP:
+		vflip = ctrl->val;
+		break;
+	case V4L2_CID_ROTATE:
+		rotate = ctrl->val;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = ipu_degrees_to_rot_mode(&rot_mode, rotate, hflip, vflip);
+	if (ret)
+		return ret;
+
+	if (rot_mode != ctx->rot_mode) {
+		struct vb2_queue *cap_q;
+
+		cap_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+					V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		if (vb2_is_streaming(cap_q))
+			return -EBUSY;
+
+		ctx->rot_mode = rot_mode;
+		ctx->rotate = rotate;
+		ctx->hflip = hflip;
+		ctx->vflip = vflip;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops mem2mem_ctrl_ops = {
+	.s_ctrl = mem2mem_s_ctrl,
+};
+
+static int mem2mem_init_controls(struct mem2mem_ctx *ctx)
+{
+	struct v4l2_ctrl_handler *hdlr = &ctx->ctrl_hdlr;
+	int ret;
+
+	v4l2_ctrl_handler_init(hdlr, 3);
+
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_HFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_VFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_ROTATE,
+			  0, 270, 90, 0);
+
+	if (hdlr->error) {
+		ret = hdlr->error;
+		goto out_free;
+	}
+
+	v4l2_ctrl_handler_setup(hdlr);
+	return 0;
+
+out_free:
+	v4l2_ctrl_handler_free(hdlr);
+	return ret;
+}
+
+#define DEFAULT_WIDTH	720
+#define DEFAULT_HEIGHT	576
+static const struct mem2mem_q_data mem2mem_q_data_default = {
+	.cur_fmt = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+		.pixelformat = V4L2_PIX_FMT_YUV420,
+		.field = V4L2_FIELD_NONE,
+		.bytesperline = DEFAULT_WIDTH,
+		.sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
+		.colorspace = V4L2_COLORSPACE_SRGB,
+	},
+	.rect = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+	},
+};
+
+/*
+ * File operations
+ */
+static int mem2mem_open(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = NULL;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->rot_mode = IPU_ROTATE_NONE;
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+	v4l2_fh_add(&ctx->fh);
+	ctx->priv = priv;
+
+	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
+					    &queue_init);
+	if (IS_ERR(ctx->fh.m2m_ctx)) {
+		ret = PTR_ERR(ctx->fh.m2m_ctx);
+		goto err_ctx;
+	}
+
+	ret = mem2mem_init_controls(ctx);
+	if (ret)
+		goto err_ctrls;
+
+	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;
+
+	ctx->q_data[V4L2_M2M_SRC] = mem2mem_q_data_default;
+	ctx->q_data[V4L2_M2M_DST] = mem2mem_q_data_default;
+
+	atomic_inc(&priv->num_inst);
+
+	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n", ctx,
+		ctx->fh.m2m_ctx);
+
+	return 0;
+
+err_ctrls:
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+err_ctx:
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+	return ret;
+}
+
+static int mem2mem_release(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = fh_to_ctx(file->private_data);
+
+	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
+
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+
+	atomic_dec(&priv->num_inst);
+
+	return 0;
+}
+
+static const struct v4l2_file_operations mem2mem_fops = {
+	.owner		= THIS_MODULE,
+	.open		= mem2mem_open,
+	.release	= mem2mem_release,
+	.poll		= v4l2_m2m_fop_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= v4l2_m2m_fop_mmap,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_abort	= job_abort,
+};
+
+static const struct video_device mem2mem_videodev_template = {
+	.name		= "ipu0_ic_pp mem2mem",
+	.fops		= &mem2mem_fops,
+	.ioctl_ops	= &mem2mem_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
+	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
+};
+
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = vdev->vfd;
+	int ret;
+
+	vfd->v4l2_dev = &priv->md->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
+	if (ret) {
+		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
+		return ret;
+	}
+
+	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
+		  video_device_node_name(vfd));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_register);
+
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = priv->vdev.vfd;
+
+	mutex_lock(&priv->mutex);
+
+	if (video_is_registered(vfd)) {
+		video_unregister_device(vfd);
+		media_entity_cleanup(&vfd->entity);
+	}
+
+	mutex_unlock(&priv->mutex);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_unregister);
+
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *md)
+{
+	struct mem2mem_priv *priv;
+	struct video_device *vfd;
+	int ret;
+
+	priv = devm_kzalloc(md->md.dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ERR_PTR(-ENOMEM);
+
+	priv->md = md;
+	priv->dev = md->md.dev;
+
+	mutex_init(&priv->mutex);
+	atomic_set(&priv->num_inst, 0);
+
+	vfd = video_device_alloc();
+	if (!vfd)
+		return ERR_PTR(-ENOMEM);
+
+	*vfd = mem2mem_videodev_template;
+	snprintf(vfd->name, sizeof(vfd->name), "ipu_ic_pp mem2mem");
+	vfd->lock = &priv->mutex;
+	priv->vdev.vfd = vfd;
+
+	INIT_LIST_HEAD(&priv->vdev.list);
+
+	video_set_drvdata(vfd, priv);
+
+	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(priv->m2m_dev)) {
+		ret = PTR_ERR(priv->m2m_dev);
+		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
+			 ret);
+		return ERR_PTR(ret);
+	}
+
+	return &priv->vdev;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_init);
+
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+
+	v4l2_m2m_release(priv->m2m_dev);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_remove);
+
+MODULE_DESCRIPTION("i.MX IPUv3 mem2mem scaler/CSC driver");
+MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
index bc7feb81937c..8d8f5ca6646d 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -149,6 +149,9 @@ struct imx_media_dev {
 
 	/* for async subdev registration */
 	struct v4l2_async_notifier notifier;
+
+	/* IC scaler/CSC mem2mem video device */
+	struct imx_media_video_dev *m2m_vdev;
 };
 
 enum codespace_sel {
@@ -262,6 +265,13 @@ void imx_media_capture_device_set_format(struct imx_media_video_dev *vdev,
 					 struct v4l2_pix_format *pix);
 void imx_media_capture_device_error(struct imx_media_video_dev *vdev);
 
+/* imx-media-mem2mem.c */
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *dev);
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev);
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev);
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev);
+
 /* subdev group ids */
 #define IMX_MEDIA_GRP_ID_CSI2      BIT(8)
 #define IMX_MEDIA_GRP_ID_CSI_BIT   9
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 02/22] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 01/22] media: imx: add mem2mem device Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 03/22] gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image() Philipp Zabel
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

Add a WARN_ON_ONCE() if either the Y/packed buffer, or the U/V offsets,
are not aligned on 8-byte boundaries. This will catch alignment
bugs in DRM, V4L2.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-cpmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/ipu-v3/ipu-cpmem.c b/drivers/gpu/ipu-v3/ipu-cpmem.c
index a9d2501500a1..7e65954f13c2 100644
--- a/drivers/gpu/ipu-v3/ipu-cpmem.c
+++ b/drivers/gpu/ipu-v3/ipu-cpmem.c
@@ -259,6 +259,8 @@ EXPORT_SYMBOL_GPL(ipu_cpmem_set_high_priority);
 
 void ipu_cpmem_set_buffer(struct ipuv3_channel *ch, int bufnum, dma_addr_t buf)
 {
+	WARN_ON_ONCE(buf & 0x7);
+
 	if (bufnum)
 		ipu_ch_param_write_field(ch, IPU_FIELD_EBA1, buf >> 3);
 	else
@@ -268,6 +270,8 @@ EXPORT_SYMBOL_GPL(ipu_cpmem_set_buffer);
 
 void ipu_cpmem_set_uv_offset(struct ipuv3_channel *ch, u32 u_off, u32 v_off)
 {
+	WARN_ON_ONCE((u_off & 0x7) || (v_off & 0x7));
+
 	ipu_ch_param_write_field(ch, IPU_FIELD_UBO, u_off / 8);
 	ipu_ch_param_write_field(ch, IPU_FIELD_VBO, v_off / 8);
 }
@@ -435,6 +439,8 @@ void ipu_cpmem_set_yuv_planar_full(struct ipuv3_channel *ch,
 				   unsigned int uv_stride,
 				   unsigned int u_offset, unsigned int v_offset)
 {
+	WARN_ON_ONCE((u_offset & 0x7) || (v_offset & 0x7));
+
 	ipu_ch_param_write_field(ch, IPU_FIELD_SLUV, uv_stride - 1);
 	ipu_ch_param_write_field(ch, IPU_FIELD_UBO, u_offset / 8);
 	ipu_ch_param_write_field(ch, IPU_FIELD_VBO, v_offset / 8);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 03/22] gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image()
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 01/22] media: imx: add mem2mem device Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 02/22] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 04/22] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

Allow the caller of ipu_cpmem_set_image() to override the latters
calculation of the chroma plane offsets, by adding override U/V
plane offsets to 'struct ipu_image'.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
New since v3.
---
 drivers/gpu/ipu-v3/ipu-cpmem.c         | 46 +++++++++++++++-----------
 drivers/gpu/ipu-v3/ipu-image-convert.c | 10 +++---
 include/video/imx-ipu-v3.h             |  3 ++
 3 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-cpmem.c b/drivers/gpu/ipu-v3/ipu-cpmem.c
index 7e65954f13c2..163fadb8a33a 100644
--- a/drivers/gpu/ipu-v3/ipu-cpmem.c
+++ b/drivers/gpu/ipu-v3/ipu-cpmem.c
@@ -745,48 +745,56 @@ int ipu_cpmem_set_image(struct ipuv3_channel *ch, struct ipu_image *image)
 	switch (pix->pixelformat) {
 	case V4L2_PIX_FMT_YUV420:
 		offset = Y_OFFSET(pix, image->rect.left, image->rect.top);
-		u_offset = U_OFFSET(pix, image->rect.left,
-				    image->rect.top) - offset;
-		v_offset = V_OFFSET(pix, image->rect.left,
-				    image->rect.top) - offset;
+		u_offset = image->u_offset ?
+			image->u_offset : U_OFFSET(pix, image->rect.left,
+						   image->rect.top) - offset;
+		v_offset = image->v_offset ?
+			image->v_offset : V_OFFSET(pix, image->rect.left,
+						   image->rect.top) - offset;
 
 		ipu_cpmem_set_yuv_planar_full(ch, pix->bytesperline / 2,
 					      u_offset, v_offset);
 		break;
 	case V4L2_PIX_FMT_YVU420:
 		offset = Y_OFFSET(pix, image->rect.left, image->rect.top);
-		u_offset = U_OFFSET(pix, image->rect.left,
-				    image->rect.top) - offset;
-		v_offset = V_OFFSET(pix, image->rect.left,
-				    image->rect.top) - offset;
+		u_offset = image->u_offset ?
+			image->u_offset : V_OFFSET(pix, image->rect.left,
+						   image->rect.top) - offset;
+		v_offset = image->v_offset ?
+			image->v_offset : U_OFFSET(pix, image->rect.left,
+						   image->rect.top) - offset;
 
 		ipu_cpmem_set_yuv_planar_full(ch, pix->bytesperline / 2,
-					      v_offset, u_offset);
+					      u_offset, v_offset);
 		break;
 	case V4L2_PIX_FMT_YUV422P:
 		offset = Y_OFFSET(pix, image->rect.left, image->rect.top);
-		u_offset = U2_OFFSET(pix, image->rect.left,
-				     image->rect.top) - offset;
-		v_offset = V2_OFFSET(pix, image->rect.left,
-				     image->rect.top) - offset;
+		u_offset = image->u_offset ?
+			image->u_offset : U2_OFFSET(pix, image->rect.left,
+						    image->rect.top) - offset;
+		v_offset = image->v_offset ?
+			image->v_offset : V2_OFFSET(pix, image->rect.left,
+						    image->rect.top) - offset;
 
 		ipu_cpmem_set_yuv_planar_full(ch, pix->bytesperline / 2,
 					      u_offset, v_offset);
 		break;
 	case V4L2_PIX_FMT_NV12:
 		offset = Y_OFFSET(pix, image->rect.left, image->rect.top);
-		u_offset = UV_OFFSET(pix, image->rect.left,
-				     image->rect.top) - offset;
-		v_offset = 0;
+		u_offset = image->u_offset ?
+			image->u_offset : UV_OFFSET(pix, image->rect.left,
+						    image->rect.top) - offset;
+		v_offset = image->v_offset ? image->v_offset : 0;
 
 		ipu_cpmem_set_yuv_planar_full(ch, pix->bytesperline,
 					      u_offset, v_offset);
 		break;
 	case V4L2_PIX_FMT_NV16:
 		offset = Y_OFFSET(pix, image->rect.left, image->rect.top);
-		u_offset = UV2_OFFSET(pix, image->rect.left,
-				      image->rect.top) - offset;
-		v_offset = 0;
+		u_offset = image->u_offset ?
+			image->u_offset : UV2_OFFSET(pix, image->rect.left,
+						     image->rect.top) - offset;
+		v_offset = image->v_offset ? image->v_offset : 0;
 
 		ipu_cpmem_set_yuv_planar_full(ch, pix->bytesperline,
 					      u_offset, v_offset);
diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index f4081962784c..41fb62b88c54 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -655,12 +655,12 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	tile_image.pix.pixelformat =  image->fmt->fourcc;
 	tile_image.phys0 = addr0;
 	tile_image.phys1 = addr1;
-	ipu_cpmem_set_image(channel, &tile_image);
+	if (image->fmt->planar && !rot_swap_width_height) {
+		tile_image.u_offset = image->tile[tile_idx[0]].u_off;
+		tile_image.v_offset = image->tile[tile_idx[0]].v_off;
+	}
 
-	if (image->fmt->planar && !rot_swap_width_height)
-		ipu_cpmem_set_uv_offset(channel,
-					image->tile[tile_idx[0]].u_off,
-					image->tile[tile_idx[0]].v_off);
+	ipu_cpmem_set_image(channel, &tile_image);
 
 	if (rot_mode)
 		ipu_cpmem_set_rotation(channel, rot_mode);
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index abbad94e14a1..8bb163cd9314 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -246,6 +246,9 @@ struct ipu_image {
 	struct v4l2_rect rect;
 	dma_addr_t phys0;
 	dma_addr_t phys1;
+	/* chroma plane offset overrides */
+	u32 u_offset;
+	u32 v_offset;
 };
 
 void ipu_cpmem_zero(struct ipuv3_channel *ch);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 04/22] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (2 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 03/22] gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image() Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 05/22] gpu: ipu-v3: image-convert: Prevent race between run and unprepare Philipp Zabel
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

For tiled scaling, we want to compute the scaling coefficients
externally in such a way that the interpolation overshoots tile
boundaries and samples up to the first pixel of the next tile.
Prepare to override the resizing coefficients from the image
conversion code.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-ic.c | 52 +++++++++++++++++++++++--------------
 include/video/imx-ipu-v3.h  |  6 +++++
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 67cc820253a9..594c3cbc8291 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -442,36 +442,40 @@ int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 }
 EXPORT_SYMBOL_GPL(ipu_ic_task_graphics_init);
 
-int ipu_ic_task_init(struct ipu_ic *ic,
-		     int in_width, int in_height,
-		     int out_width, int out_height,
-		     enum ipu_color_space in_cs,
-		     enum ipu_color_space out_cs)
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc)
 {
 	struct ipu_ic_priv *priv = ic->priv;
-	u32 reg, downsize_coeff, resize_coeff;
+	u32 downsize_coeff, resize_coeff;
 	unsigned long flags;
 	int ret = 0;
 
-	/* Setup vertical resizing */
-	ret = calc_resize_coeffs(ic, in_height, out_height,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+	if (!rsc) {
+		/* Setup vertical resizing */
 
-	reg = (downsize_coeff << 30) | (resize_coeff << 16);
+		ret = calc_resize_coeffs(ic, in_height, out_height,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
+
+		rsc = (downsize_coeff << 30) | (resize_coeff << 16);
 
-	/* Setup horizontal resizing */
-	ret = calc_resize_coeffs(ic, in_width, out_width,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+		/* Setup horizontal resizing */
+		ret = calc_resize_coeffs(ic, in_width, out_width,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
 
-	reg |= (downsize_coeff << 14) | resize_coeff;
+		rsc |= (downsize_coeff << 14) | resize_coeff;
+	}
 
 	spin_lock_irqsave(&priv->lock, flags);
 
-	ipu_ic_write(ic, reg, ic->reg->rsc);
+	ipu_ic_write(ic, rsc, ic->reg->rsc);
 
 	/* Setup color space conversion */
 	ic->in_cs = in_cs;
@@ -487,6 +491,16 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return ret;
 }
+
+int ipu_ic_task_init(struct ipu_ic *ic,
+		     int in_width, int in_height,
+		     int out_width, int out_height,
+		     enum ipu_color_space in_cs,
+		     enum ipu_color_space out_cs)
+{
+	return ipu_ic_task_init_rsc(ic, in_width, in_height, out_width,
+				    out_height, in_cs, out_cs, 0);
+}
 EXPORT_SYMBOL_GPL(ipu_ic_task_init);
 
 int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index 8bb163cd9314..e582e8e7527a 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -390,6 +390,12 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 		     int out_width, int out_height,
 		     enum ipu_color_space in_cs,
 		     enum ipu_color_space out_cs);
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc);
 int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 			      enum ipu_color_space in_g_cs,
 			      bool galpha_en, u32 galpha,
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 05/22] gpu: ipu-v3: image-convert: Prevent race between run and unprepare
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (3 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 04/22] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 06/22] gpu: ipu-v3: image-convert: Only wait for abort completion if active run Philipp Zabel
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

Prevent possible race by parallel threads between ipu_image_convert_run()
and ipu_image_convert_unprepare(). This involves setting ctx->aborting
to true unconditionally so that no new job runs can be queued during
unprepare, and holding the ctx->aborting flag until the context is freed.

Note that the "normal" ipu_image_convert_abort() case (e.g. not during
context unprepare) should clear the ctx->aborting flag after aborting
any active run and clearing the context's pending queue. This is because
it should be possible to continue to use the conversion context and queue
more runs after an abort.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
New since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 41fb62b88c54..6c15bf8efaa2 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1524,7 +1524,7 @@ int ipu_image_convert_queue(struct ipu_image_convert_run *run)
 EXPORT_SYMBOL_GPL(ipu_image_convert_queue);
 
 /* Abort any active or pending conversions for this context */
-void ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
+static void __ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	struct ipu_image_convert_priv *priv = chan->priv;
@@ -1551,7 +1551,7 @@ void ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 
 	need_abort = (run_count || active_run);
 
-	ctx->aborting = need_abort;
+	ctx->aborting = true;
 
 	spin_unlock_irqrestore(&chan->irqlock, flags);
 
@@ -1572,7 +1572,11 @@ void ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 		dev_warn(priv->ipu->dev, "%s: timeout\n", __func__);
 		force_abort(ctx);
 	}
+}
 
+void ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
+{
+	__ipu_image_convert_abort(ctx);
 	ctx->aborting = false;
 }
 EXPORT_SYMBOL_GPL(ipu_image_convert_abort);
@@ -1586,7 +1590,7 @@ void ipu_image_convert_unprepare(struct ipu_image_convert_ctx *ctx)
 	bool put_res;
 
 	/* make sure no runs are hanging around */
-	ipu_image_convert_abort(ctx);
+	__ipu_image_convert_abort(ctx);
 
 	dev_dbg(priv->ipu->dev, "%s: task %u: removing ctx %p\n", __func__,
 		chan->ic_task, ctx);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 06/22] gpu: ipu-v3: image-convert: Only wait for abort completion if active run
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (4 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 05/22] gpu: ipu-v3: image-convert: Prevent race between run and unprepare Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 07/22] gpu: ipu-v3: image-convert: Allow reentrancy into abort Philipp Zabel
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

Only wait for the ctx->aborted completion if there is an active run
in progress, otherwise the wait will just timeout after 10 seconds.
If there is no active run in progress, the done queue just needs to
be emptied.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
New since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 6c15bf8efaa2..e3e032252604 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1562,9 +1562,14 @@ static void __ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 		return;
 	}
 
+	if (!active_run) {
+		empty_done_q(chan);
+		return;
+	}
+
 	dev_dbg(priv->ipu->dev,
-		"%s: task %u: wait for completion: %d runs, active run %p\n",
-		__func__, chan->ic_task, run_count, active_run);
+		"%s: task %u: wait for completion: %d runs\n",
+		__func__, chan->ic_task, run_count);
 
 	ret = wait_for_completion_timeout(&ctx->aborted,
 					  msecs_to_jiffies(10000));
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 07/22] gpu: ipu-v3: image-convert: Allow reentrancy into abort
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (5 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 06/22] gpu: ipu-v3: image-convert: Only wait for abort completion if active run Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 08/22] gpu: ipu-v3: image-convert: Remove need_abort flag Philipp Zabel
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

Allow reentrancy into ipu_image_convert_abort(), by moving re-init
of ctx->aborted completion under the spin lock, and only if there is
an active run, and complete all waiters do_bh(). Note:
ipu_image_convert_unprepare() is still _not_ reentrant, and can't
be made reentrant.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
New since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index e3e032252604..abd8afb22b48 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -896,7 +896,7 @@ static irqreturn_t do_bh(int irq, void *dev_id)
 			dev_dbg(priv->ipu->dev,
 				"%s: task %u: signaling abort for ctx %p\n",
 				__func__, chan->ic_task, ctx);
-			complete(&ctx->aborted);
+			complete_all(&ctx->aborted);
 		}
 	}
 
@@ -1533,8 +1533,6 @@ static void __ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 	int run_count, ret;
 	bool need_abort;
 
-	reinit_completion(&ctx->aborted);
-
 	spin_lock_irqsave(&chan->irqlock, flags);
 
 	/* move all remaining pending runs in this context to done_q */
@@ -1549,6 +1547,9 @@ static void __ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 	active_run = (chan->current_run && chan->current_run->ctx == ctx) ?
 		chan->current_run : NULL;
 
+	if (active_run)
+		reinit_completion(&ctx->aborted);
+
 	need_abort = (run_count || active_run);
 
 	ctx->aborting = true;
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 08/22] gpu: ipu-v3: image-convert: Remove need_abort flag
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (6 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 07/22] gpu: ipu-v3: image-convert: Allow reentrancy into abort Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 09/22] gpu: ipu-v3: image-convert: Catch unaligned tile offsets Philipp Zabel
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

The need_abort flag is not really needed anymore in
__ipu_image_convert_abort(), remove it.
No functional changes.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
New since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index abd8afb22b48..b8a400182a00 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1531,7 +1531,6 @@ static void __ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 	struct ipu_image_convert_run *run, *active_run, *tmp;
 	unsigned long flags;
 	int run_count, ret;
-	bool need_abort;
 
 	spin_lock_irqsave(&chan->irqlock, flags);
 
@@ -1550,13 +1549,11 @@ static void __ipu_image_convert_abort(struct ipu_image_convert_ctx *ctx)
 	if (active_run)
 		reinit_completion(&ctx->aborted);
 
-	need_abort = (run_count || active_run);
-
 	ctx->aborting = true;
 
 	spin_unlock_irqrestore(&chan->irqlock, flags);
 
-	if (!need_abort) {
+	if (!run_count && !active_run) {
 		dev_dbg(priv->ipu->dev,
 			"%s: task %u: no abort needed for ctx %p\n",
 			__func__, chan->ic_task, ctx);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 09/22] gpu: ipu-v3: image-convert: Catch unaligned tile offsets
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (7 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 08/22] gpu: ipu-v3: image-convert: Remove need_abort flag Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 10/22] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

From: Steve Longerbeam <slongerbeam@gmail.com>

Catch calculated tile offsets that are not 8-byte aligned as required by the
IDMAC engine and return error in calc_tile_offsets().

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
New since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 61 ++++++++++++++++----------
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index b8a400182a00..5fccba176e39 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -459,8 +459,8 @@ static void calc_out_tile_map(struct ipu_image_convert_ctx *ctx)
 	}
 }
 
-static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
-				     struct ipu_image_convert_image *image)
+static int calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
+				    struct ipu_image_convert_image *image)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	struct ipu_image_convert_priv *priv = chan->priv;
@@ -509,24 +509,30 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 			image->tile[tile].u_off = u_off;
 			image->tile[tile++].v_off = v_off;
 
-			dev_dbg(priv->ipu->dev,
-				"task %u: ctx %p: %s@[%d,%d]: y_off %08x, u_off %08x, v_off %08x\n",
-				chan->ic_task, ctx,
-				image->type == IMAGE_CONVERT_IN ?
-				"Input" : "Output", row, col,
-				y_off, u_off, v_off);
+			if ((y_off & 0x7) || (u_off & 0x7) || (v_off & 0x7)) {
+				dev_err(priv->ipu->dev,
+					"task %u: ctx %p: %s@[%d,%d]: "
+					"y_off %08x, u_off %08x, v_off %08x\n",
+					chan->ic_task, ctx,
+					image->type == IMAGE_CONVERT_IN ?
+					"Input" : "Output", row, col,
+					y_off, u_off, v_off);
+				return -EINVAL;
+			}
 		}
 	}
+
+	return 0;
 }
 
-static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
-				     struct ipu_image_convert_image *image)
+static int calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
+				    struct ipu_image_convert_image *image)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 w, h, bpp, stride;
+	u32 w, h, bpp, stride, offset;
 	u32 row_off, col_off;
 
 	/* setup some convenience vars */
@@ -541,27 +547,35 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 		for (col = 0; col < image->num_cols; col++) {
 			col_off = (col * w * bpp) >> 3;
 
-			image->tile[tile].offset = row_off + col_off;
+			offset = row_off + col_off;
+
+			image->tile[tile].offset = offset;
 			image->tile[tile].u_off = 0;
 			image->tile[tile++].v_off = 0;
 
-			dev_dbg(priv->ipu->dev,
-				"task %u: ctx %p: %s@[%d,%d]: phys %08x\n",
-				chan->ic_task, ctx,
-				image->type == IMAGE_CONVERT_IN ?
-				"Input" : "Output", row, col,
-				row_off + col_off);
+			if (offset & 0x7) {
+				dev_err(priv->ipu->dev,
+					"task %u: ctx %p: %s@[%d,%d]: "
+					"phys %08x\n",
+					chan->ic_task, ctx,
+					image->type == IMAGE_CONVERT_IN ?
+					"Input" : "Output", row, col,
+					row_off + col_off);
+				return -EINVAL;
+			}
 		}
 	}
+
+	return 0;
 }
 
-static void calc_tile_offsets(struct ipu_image_convert_ctx *ctx,
+static int calc_tile_offsets(struct ipu_image_convert_ctx *ctx,
 			      struct ipu_image_convert_image *image)
 {
 	if (image->fmt->planar)
-		calc_tile_offsets_planar(ctx, image);
-	else
-		calc_tile_offsets_packed(ctx, image);
+		return calc_tile_offsets_planar(ctx, image);
+
+	return calc_tile_offsets_packed(ctx, image);
 }
 
 /*
@@ -1199,9 +1213,8 @@ static int fill_image(struct ipu_image_convert_ctx *ctx,
 		ic_image->stride  = ic_image->base.pix.bytesperline;
 
 	calc_tile_dimensions(ctx, ic_image);
-	calc_tile_offsets(ctx, ic_image);
 
-	return 0;
+	return calc_tile_offsets(ctx, ic_image);
 }
 
 /* borrowed from drivers/media/v4l2-core/v4l2-common.c */
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 10/22] gpu: ipu-v3: image-convert: prepare for per-tile configuration
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (8 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 09/22] gpu: ipu-v3: image-convert: Catch unaligned tile offsets Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 11/22] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Let convert_start start from a given tile index, allocate intermediate
tile with maximum tile size.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 60 +++++++++++++++-----------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 5fccba176e39..c2f82d681c48 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -625,7 +625,8 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 			       struct ipuv3_channel *channel,
 			       struct ipu_image_convert_image *image,
 			       enum ipu_rotate_mode rot_mode,
-			       bool rot_swap_width_height)
+			       bool rot_swap_width_height,
+			       unsigned int tile)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	unsigned int burst_size;
@@ -635,23 +636,23 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	unsigned int tile_idx[2];
 
 	if (image->type == IMAGE_CONVERT_OUT) {
-		tile_idx[0] = ctx->out_tile_map[0];
+		tile_idx[0] = ctx->out_tile_map[tile];
 		tile_idx[1] = ctx->out_tile_map[1];
 	} else {
-		tile_idx[0] = 0;
+		tile_idx[0] = tile;
 		tile_idx[1] = 1;
 	}
 
 	if (rot_swap_width_height) {
-		width = image->tile[0].height;
-		height = image->tile[0].width;
-		stride = image->tile[0].rot_stride;
+		width = image->tile[tile_idx[0]].height;
+		height = image->tile[tile_idx[0]].width;
+		stride = image->tile[tile_idx[0]].rot_stride;
 		addr0 = ctx->rot_intermediate[0].phys;
 		if (ctx->double_buffering)
 			addr1 = ctx->rot_intermediate[1].phys;
 	} else {
-		width = image->tile[0].width;
-		height = image->tile[0].height;
+		width = image->tile[tile_idx[0]].width;
+		height = image->tile[tile_idx[0]].height;
 		stride = image->stride;
 		addr0 = image->base.phys0 +
 			image->tile[tile_idx[0]].offset;
@@ -701,7 +702,7 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
 }
 
-static int convert_start(struct ipu_image_convert_run *run)
+static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 {
 	struct ipu_image_convert_ctx *ctx = run->ctx;
 	struct ipu_image_convert_chan *chan = ctx->chan;
@@ -709,28 +710,29 @@ static int convert_start(struct ipu_image_convert_run *run)
 	struct ipu_image_convert_image *s_image = &ctx->in;
 	struct ipu_image_convert_image *d_image = &ctx->out;
 	enum ipu_color_space src_cs, dest_cs;
+	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
 	int ret;
 
-	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p\n",
-		__func__, chan->ic_task, ctx, run);
+	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
+		__func__, chan->ic_task, ctx, run, tile, dst_tile);
 
 	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
 	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* swap width/height for resizer */
-		dest_width = d_image->tile[0].height;
-		dest_height = d_image->tile[0].width;
+		dest_width = d_image->tile[dst_tile].height;
+		dest_height = d_image->tile[dst_tile].width;
 	} else {
-		dest_width = d_image->tile[0].width;
-		dest_height = d_image->tile[0].height;
+		dest_width = d_image->tile[dst_tile].width;
+		dest_height = d_image->tile[dst_tile].height;
 	}
 
 	/* setup the IC resizer and CSC */
 	ret = ipu_ic_task_init(chan->ic,
-			       s_image->tile[0].width,
-			       s_image->tile[0].height,
+			       s_image->tile[tile].width,
+			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
 			       src_cs, dest_cs);
@@ -741,27 +743,27 @@ static int convert_start(struct ipu_image_convert_run *run)
 
 	/* init the source MEM-->IC PP IDMAC channel */
 	init_idmac_channel(ctx, chan->in_chan, s_image,
-			   IPU_ROTATE_NONE, false);
+			   IPU_ROTATE_NONE, false, tile);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* init the IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   IPU_ROTATE_NONE, true);
+				   IPU_ROTATE_NONE, true, tile);
 
 		/* init the MEM-->IC PP ROT IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_in_chan, d_image,
-				   ctx->rot_mode, true);
+				   ctx->rot_mode, true, tile);
 
 		/* init the destination IC PP ROT-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_out_chan, d_image,
-				   IPU_ROTATE_NONE, false);
+				   IPU_ROTATE_NONE, false, tile);
 
 		/* now link IC PP-->MEM to MEM-->IC PP ROT */
 		ipu_idmac_link(chan->out_chan, chan->rotation_in_chan);
 	} else {
 		/* init the destination IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   ctx->rot_mode, false);
+				   ctx->rot_mode, false, tile);
 	}
 
 	/* enable the IC */
@@ -819,7 +821,7 @@ static int do_run(struct ipu_image_convert_run *run)
 	list_del(&run->list);
 	chan->current_run = run;
 
-	return convert_start(run);
+	return convert_start(run, 0);
 }
 
 /* hold irqlock when calling */
@@ -1449,14 +1451,22 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 				 !d_image->fmt->planar);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		unsigned long intermediate_size = d_image->tile[0].size;
+		unsigned int i;
+
+		for (i = 1; i < ctx->num_tiles; i++) {
+			if (d_image->tile[i].size > intermediate_size)
+				intermediate_size = d_image->tile[i].size;
+		}
+
 		ret = alloc_dma_buf(priv, &ctx->rot_intermediate[0],
-				    d_image->tile[0].size);
+				    intermediate_size);
 		if (ret)
 			goto out_free;
 		if (ctx->double_buffering) {
 			ret = alloc_dma_buf(priv,
 					    &ctx->rot_intermediate[1],
-					    d_image->tile[0].size);
+					    intermediate_size);
 			if (ret)
 				goto out_free_dmabuf0;
 		}
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 11/22] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (9 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 10/22] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 12/22] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Slightly modifying resize coefficients per-tile allows to completely
hide the seams between tiles and to sample the correct input pixels at
the bottom and right edges of the image.

Tiling requires a bilinear interpolator reset at each tile start, which
causes the image to be slightly shifted if the starting pixel should not
have been sampled from an integer pixel position in the source image
according to the full image resizing ratio. To work around this
hardware limitation, calculate per-tile resizing coefficients that make
sure that the correct input pixels are sampled at the tile end.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 236 ++++++++++++++++++++++++-
 1 file changed, 234 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index c2f82d681c48..31e7186bcc00 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -135,6 +135,12 @@ struct ipu_image_convert_ctx {
 	struct ipu_image_convert_image in;
 	struct ipu_image_convert_image out;
 	enum ipu_rotate_mode rot_mode;
+	u32 downsize_coeff_h;
+	u32 downsize_coeff_v;
+	u32 image_resize_coeff_h;
+	u32 image_resize_coeff_v;
+	u32 resize_coeffs_h[MAX_STRIPES_W];
+	u32 resize_coeffs_v[MAX_STRIPES_H];
 
 	/* intermediate buffer for rotation */
 	struct ipu_image_convert_dma_buf rot_intermediate[2];
@@ -361,6 +367,69 @@ static inline int num_stripes(int dim)
 		return 4;
 }
 
+/*
+ * Calculate downsizing coefficients, which are the same for all tiles,
+ * and bilinear resizing coefficients, which are used to find the best
+ * seam positions.
+ */
+static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
+					  struct ipu_image *in,
+					  struct ipu_image *out)
+{
+	u32 downsized_width = in->rect.width;
+	u32 downsized_height = in->rect.height;
+	u32 downsize_coeff_v = 0;
+	u32 downsize_coeff_h = 0;
+	u32 resized_width = out->rect.width;
+	u32 resized_height = out->rect.height;
+	u32 resize_coeff_h;
+	u32 resize_coeff_v;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		resized_width = out->rect.height;
+		resized_height = out->rect.width;
+	}
+
+	/* Do not let invalid input lead to an endless loop below */
+	if (WARN_ON(resized_width == 0 || resized_height == 0))
+		return -EINVAL;
+
+	while (downsized_width >= resized_width * 2) {
+		downsized_width >>= 1;
+		downsize_coeff_h++;
+	}
+
+	while (downsized_height >= resized_height * 2) {
+		downsized_height >>= 1;
+		downsize_coeff_v++;
+	}
+
+	/*
+	 * Calculate the bilinear resizing coefficients that could be used if
+	 * we were converting with a single tile. The bottom right output pixel
+	 * should sample as close as possible to the bottom right input pixel
+	 * out of the decimator, but not overshoot it:
+	 */
+	resize_coeff_h = 8192 * (downsized_width - 1) / (resized_width - 1);
+	resize_coeff_v = 8192 * (downsized_height - 1) / (resized_height - 1);
+
+	dev_dbg(ctx->chan->priv->ipu->dev,
+		"%s: hscale: >>%u, *8192/%u vscale: >>%u, *8192/%u, %ux%u tiles\n",
+		__func__, downsize_coeff_h, resize_coeff_h, downsize_coeff_v,
+		resize_coeff_v, ctx->in.num_cols, ctx->in.num_rows);
+
+	if (downsize_coeff_h > 2 || downsize_coeff_v  > 2 ||
+	    resize_coeff_h > 0x3fff || resize_coeff_v > 0x3fff)
+		return -EINVAL;
+
+	ctx->downsize_coeff_h = downsize_coeff_h;
+	ctx->downsize_coeff_v = downsize_coeff_v;
+	ctx->image_resize_coeff_h = resize_coeff_h;
+	ctx->image_resize_coeff_v = resize_coeff_v;
+
+	return 0;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -578,6 +647,149 @@ static int calc_tile_offsets(struct ipu_image_convert_ctx *ctx,
 	return calc_tile_offsets_packed(ctx, image);
 }
 
+/*
+ * Calculate the resizing ratio for the IC main processing section given input
+ * size, fixed downsizing coefficient, and output size.
+ * Either round to closest for the next tile's first pixel to minimize seams
+ * and distortion (for all but right column / bottom row), or round down to
+ * avoid sampling beyond the edges of the input image for this tile's last
+ * pixel.
+ * Returns the resizing coefficient, resizing ratio is 8192.0 / resize_coeff.
+ */
+static u32 calc_resize_coeff(u32 input_size, u32 downsize_coeff,
+			     u32 output_size, bool allow_overshoot)
+{
+	u32 downsized = input_size >> downsize_coeff;
+
+	if (allow_overshoot)
+		return DIV_ROUND_CLOSEST(8192 * downsized, output_size);
+	else
+		return 8192 * (downsized - 1) / (output_size - 1);
+}
+
+/*
+ * Slightly modify resize coefficients per tile to hide the bilinear
+ * interpolator reset at tile borders, shifting the right / bottom edge
+ * by up to a half input pixel. This removes noticeable seams between
+ * tiles at higher upscaling factors.
+ */
+static void calc_tile_resize_coefficients(struct ipu_image_convert_ctx *ctx)
+{
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
+	struct ipu_image_tile *in_tile, *out_tile;
+	unsigned int col, row, tile_idx;
+	unsigned int last_output;
+
+	for (col = 0; col < ctx->in.num_cols; col++) {
+		bool closest = (col < ctx->in.num_cols - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		u32 resized_width;
+		u32 resize_coeff_h;
+
+		tile_idx = col;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_width = out_tile->height;
+		else
+			resized_width = out_tile->width;
+
+		resize_coeff_h = calc_resize_coeff(in_tile->width,
+						   ctx->downsize_coeff_h,
+						   resized_width, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: column %u hscale: *8192/%u\n",
+			__func__, col, resize_coeff_h);
+
+
+		for (row = 0; row < ctx->in.num_rows; row++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the horizontal scaling factor known, round up
+			 * resized width (output width or height) to burst size.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->height = round_up(resized_width, 8);
+			else
+				out_tile->width = round_up(resized_width, 8);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized width and scaling coefficients.
+			 * Round up to burst size.
+			 */
+			last_output = round_up(resized_width, 8) - 1;
+			if (closest)
+				last_output++;
+			in_tile->width = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_h,
+					      8192) + 1)
+				<< ctx->downsize_coeff_h, 8);
+		}
+
+		ctx->resize_coeffs_h[col] = resize_coeff_h;
+	}
+
+	for (row = 0; row < ctx->in.num_rows; row++) {
+		bool closest = (row < ctx->in.num_rows - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_VFLIP);
+		u32 resized_height;
+		u32 resize_coeff_v;
+
+		tile_idx = row * ctx->in.num_cols;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_height = out_tile->width;
+		else
+			resized_height = out_tile->height;
+
+		resize_coeff_v = calc_resize_coeff(in_tile->height,
+						   ctx->downsize_coeff_v,
+						   resized_height, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: row %u vscale: *8192/%u\n",
+			__func__, row, resize_coeff_v);
+
+		for (col = 0; col < ctx->in.num_cols; col++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the vertical scaling factor known, round up
+			 * resized height (output width or height) to IDMAC
+			 * limitations.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->width = round_up(resized_height, 2);
+			else
+				out_tile->height = round_up(resized_height, 2);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized height and scaling coefficients.
+			 * Align to IDMAC restrictions.
+			 */
+			last_output = round_up(resized_height, 2) - 1;
+			if (closest)
+				last_output++;
+			in_tile->height = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_v,
+					      8192) + 1)
+				<< ctx->downsize_coeff_v, 2);
+		}
+
+		ctx->resize_coeffs_v[row] = resize_coeff_v;
+	}
+}
+
 /*
  * return the number of runs in given queue (pending_q or done_q)
  * for this context. hold irqlock when calling.
@@ -712,6 +924,8 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 	enum ipu_color_space src_cs, dest_cs;
 	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
+	unsigned int col, row;
+	u32 rsc;
 	int ret;
 
 	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
@@ -729,13 +943,26 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 		dest_height = d_image->tile[dst_tile].height;
 	}
 
+	row = tile / s_image->num_cols;
+	col = tile % s_image->num_cols;
+
+	rsc =  (ctx->downsize_coeff_v << 30) |
+	       (ctx->resize_coeffs_v[row] << 16) |
+	       (ctx->downsize_coeff_h << 14) |
+	       (ctx->resize_coeffs_h[col]);
+
+	dev_dbg(priv->ipu->dev, "%s: %ux%u -> %ux%u (rsc = 0x%x)\n",
+		__func__, s_image->tile[tile].width,
+		s_image->tile[tile].height, dest_width, dest_height, rsc);
+
 	/* setup the IC resizer and CSC */
-	ret = ipu_ic_task_init(chan->ic,
+	ret = ipu_ic_task_init_rsc(chan->ic,
 			       s_image->tile[tile].width,
 			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
-			       src_cs, dest_cs);
+			       src_cs, dest_cs,
+			       rsc);
 	if (ret) {
 		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
 		return ret;
@@ -1420,6 +1647,10 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1428,6 +1659,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 		goto out_free;
 
 	calc_out_tile_map(ctx);
+	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
 	dump_format(ctx, d_image);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 12/22] gpu: ipu-v3: image-convert: reconfigure IC per tile
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (10 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 11/22] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 13/22] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

For differently sized tiles or if the resizing coefficients change,
we have to stop, reconfigure, and restart the IC between tiles.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 65 +++++++++++++++++---------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 31e7186bcc00..cb47981741b4 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1151,6 +1151,24 @@ static irqreturn_t do_bh(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static bool ic_settings_changed(struct ipu_image_convert_ctx *ctx)
+{
+	unsigned int cur_tile = ctx->next_tile - 1;
+	unsigned int next_tile = ctx->next_tile;
+
+	if (ctx->resize_coeffs_h[cur_tile % ctx->in.num_cols] !=
+	    ctx->resize_coeffs_h[next_tile % ctx->in.num_cols] ||
+	    ctx->resize_coeffs_v[cur_tile / ctx->in.num_cols] !=
+	    ctx->resize_coeffs_v[next_tile / ctx->in.num_cols] ||
+	    ctx->in.tile[cur_tile].width != ctx->in.tile[next_tile].width ||
+	    ctx->in.tile[cur_tile].height != ctx->in.tile[next_tile].height ||
+	    ctx->out.tile[cur_tile].width != ctx->out.tile[next_tile].width ||
+	    ctx->out.tile[cur_tile].height != ctx->out.tile[next_tile].height)
+		return true;
+
+	return false;
+}
+
 /* hold irqlock when calling */
 static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 {
@@ -1194,27 +1212,32 @@ static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 	 * not done, place the next tile buffers.
 	 */
 	if (!ctx->double_buffering) {
-
-		src_tile = &s_image->tile[ctx->next_tile];
-		dst_idx = ctx->out_tile_map[ctx->next_tile];
-		dst_tile = &d_image->tile[dst_idx];
-
-		ipu_cpmem_set_buffer(chan->in_chan, 0,
-				     s_image->base.phys0 + src_tile->offset);
-		ipu_cpmem_set_buffer(outch, 0,
-				     d_image->base.phys0 + dst_tile->offset);
-		if (s_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(chan->in_chan,
-						src_tile->u_off,
-						src_tile->v_off);
-		if (d_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(outch,
-						dst_tile->u_off,
-						dst_tile->v_off);
-
-		ipu_idmac_select_buffer(chan->in_chan, 0);
-		ipu_idmac_select_buffer(outch, 0);
-
+		if (ic_settings_changed(ctx)) {
+			convert_stop(run);
+			convert_start(run, ctx->next_tile);
+		} else {
+			src_tile = &s_image->tile[ctx->next_tile];
+			dst_idx = ctx->out_tile_map[ctx->next_tile];
+			dst_tile = &d_image->tile[dst_idx];
+
+			ipu_cpmem_set_buffer(chan->in_chan, 0,
+					     s_image->base.phys0 +
+					     src_tile->offset);
+			ipu_cpmem_set_buffer(outch, 0,
+					     d_image->base.phys0 +
+					     dst_tile->offset);
+			if (s_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(chan->in_chan,
+							src_tile->u_off,
+							src_tile->v_off);
+			if (d_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(outch,
+							dst_tile->u_off,
+							dst_tile->v_off);
+
+			ipu_idmac_select_buffer(chan->in_chan, 0);
+			ipu_idmac_select_buffer(outch, 0);
+		}
 	} else if (ctx->next_tile < ctx->num_tiles - 1) {
 
 		src_tile = &s_image->tile[ctx->next_tile + 1];
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 13/22] gpu: ipu-v3: image-convert: store tile top/left position
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (11 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 12/22] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 14/22] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Store tile top/left position in pixels in the tile structure.
This will allow overlapping tiles with different sizes later.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No functional changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index cb47981741b4..d14ee7b303a1 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -84,6 +84,8 @@ struct ipu_image_convert_dma_chan {
 struct ipu_image_tile {
 	u32 width;
 	u32 height;
+	u32 left;
+	u32 top;
 	/* size and strides are in bytes */
 	u32 size;
 	u32 stride;
@@ -433,13 +435,17 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
-	int i;
+	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
 		struct ipu_image_tile *tile = &image->tile[i];
+		const unsigned int row = i / image->num_cols;
+		const unsigned int col = i % image->num_cols;
 
 		tile->height = image->base.pix.height / image->num_rows;
 		tile->width = image->base.pix.width / image->num_cols;
+		tile->left = col * tile->width;
+		tile->top = row * tile->height;
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -535,7 +541,7 @@ static int calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 H, w, h, y_stride, uv_stride;
+	u32 H, top, y_stride, uv_stride;
 	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
 	u32 y_row_off, y_col_off, y_off;
 	u32 y_size, uv_size;
@@ -552,13 +558,12 @@ static int calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		y_row_off = row * h * y_stride;
-		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
+		top = image->tile[tile].top;
+		y_row_off = top * y_stride;
+		uv_row_off = (top * uv_stride) / fmt->uv_height_dec;
 
 		for (col = 0; col < image->num_cols; col++) {
-			y_col_off = col * w;
+			y_col_off = image->tile[tile].left;
 			uv_col_off = y_col_off / fmt->uv_width_dec;
 			if (fmt->uv_packed)
 				uv_col_off *= 2;
@@ -601,7 +606,7 @@ static int calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 w, h, bpp, stride, offset;
+	u32 bpp, stride, offset;
 	u32 row_off, col_off;
 
 	/* setup some convenience vars */
@@ -609,12 +614,10 @@ static int calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	bpp = fmt->bpp;
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		row_off = row * h * stride;
+		row_off = image->tile[tile].top * stride;
 
 		for (col = 0; col < image->num_cols; col++) {
-			col_off = (col * w * bpp) >> 3;
+			col_off = (image->tile[tile].left * bpp) >> 3;
 
 			offset = row_off + col_off;
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 14/22] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (12 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 13/22] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 15/22] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

This will allow to calculate seam positions after initializing the
ipu_image base structure but before calculating tile dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index d14ee7b303a1..542c091cfef1 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1467,9 +1467,7 @@ static int fill_image(struct ipu_image_convert_ctx *ctx,
 	else
 		ic_image->stride  = ic_image->base.pix.bytesperline;
 
-	calc_tile_dimensions(ctx, ic_image);
-
-	return calc_tile_offsets(ctx, ic_image);
+	return 0;
 }
 
 /* borrowed from drivers/media/v4l2-core/v4l2-common.c */
@@ -1673,14 +1671,24 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
+	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
+	if (ret)
+		goto out_free;
+	ret = fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
+	if (ret)
+		goto out_free;
+
 	ret = calc_image_resize_coefficients(ctx, in, out);
 	if (ret)
 		goto out_free;
 
-	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
+	calc_tile_dimensions(ctx, s_image);
+	ret = calc_tile_offsets(ctx, s_image);
 	if (ret)
 		goto out_free;
-	ret = fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
+
+	calc_tile_dimensions(ctx, d_image);
+	ret = calc_tile_offsets(ctx, d_image);
 	if (ret)
 		goto out_free;
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 15/22] gpu: ipu-v3: image-convert: move tile alignment helpers
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (13 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 14/22] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 16/22] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Move tile_width_align and tile_height_align up so they
can be used by the tile edge position calculation code.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 54 +++++++++++++-------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 542c091cfef1..a407ca3b367b 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -432,6 +432,33 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+/*
+ * We have to adjust the tile width such that the tile physaddrs and
+ * U and V plane offsets are multiples of 8 bytes as required by
+ * the IPU DMA Controller. For the planar formats, this corresponds
+ * to a pixel alignment of 16 (but use a more formal equation since
+ * the variables are available). For all the packed formats, 8 is
+ * good enough.
+ */
+static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+}
+
+/*
+ * For tile height alignment, we have to ensure that the output tile
+ * heights are multiples of 8 lines if the IRT is required by the
+ * given rotation mode (the IRT performs rotations on 8x8 blocks
+ * at a time). If the IRT is not used, or for input image tiles,
+ * 2 lines are good enough.
+ */
+static inline u32 tile_height_align(enum ipu_image_convert_type type,
+				    enum ipu_rotate_mode rot_mode)
+{
+	return (type == IMAGE_CONVERT_OUT &&
+		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -1487,33 +1514,6 @@ static unsigned int clamp_align(unsigned int x, unsigned int min,
 	return x;
 }
 
-/*
- * We have to adjust the tile width such that the tile physaddrs and
- * U and V plane offsets are multiples of 8 bytes as required by
- * the IPU DMA Controller. For the planar formats, this corresponds
- * to a pixel alignment of 16 (but use a more formal equation since
- * the variables are available). For all the packed formats, 8 is
- * good enough.
- */
-static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
-{
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
-}
-
-/*
- * For tile height alignment, we have to ensure that the output tile
- * heights are multiples of 8 lines if the IRT is required by the
- * given rotation mode (the IRT performs rotations on 8x8 blocks
- * at a time). If the IRT is not used, or for input image tiles,
- * 2 lines are good enough.
- */
-static inline u32 tile_height_align(enum ipu_image_convert_type type,
-				    enum ipu_rotate_mode rot_mode)
-{
-	return (type == IMAGE_CONVERT_OUT &&
-		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
-}
-
 /* Adjusts input/output images to IPU restrictions */
 void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 			      enum ipu_rotate_mode rot_mode)
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 16/22] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (14 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 15/22] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 17/22] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Select seam positions that minimize distortions during seam hiding while
satifying input and output IDMAC, rotator, and image format constraints.

This code looks for aligned output seam positions that minimize the
difference between the fractional corresponding ideal input positions
and the input positions rounded to alignment requirements.

Since now tiles can be sized differently, alignment restrictions of the
complete image can be relaxed in the next step.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
Changes since v3:
 - Fix tile_left_align for 24-bit RGB formats and reduce alignment
   restrictions for U/V packed planar YUV formats.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 343 ++++++++++++++++++++++++-
 1 file changed, 337 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index a407ca3b367b..a674241dd0b8 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -432,6 +432,126 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+#define round_closest(x, y) round_down((x) + (y)/2, (y))
+
+/*
+ * Find the best aligned seam position in the inverval [out_start, out_end].
+ * Rotation and image offsets are out of scope.
+ *
+ * @out_start: start of inverval, must be within 1024 pixels / lines
+ *             of out_end
+ * @out_end: end of interval, smaller than or equal to out_edge
+ * @in_edge: input right / bottom edge
+ * @out_edge: output right / bottom edge
+ * @in_align: input alignment, either horizontal 8-byte line start address
+ *            alignment, or pixel alignment due to image format
+ * @out_align: output alignment, either horizontal 8-byte line start address
+ *             alignment, or pixel alignment due to image format or rotator
+ *             block size
+ * @in_burst: horizontal input burst size in case of horizontal flip
+ * @out_burst: horizontal output burst size or rotator block size
+ * @downsize_coeff: downsizing section coefficient
+ * @resize_coeff: main processing section resizing coefficient
+ * @_in_seam: aligned input seam position return value
+ * @_out_seam: aligned output seam position return value
+ */
+static void find_best_seam(struct ipu_image_convert_ctx *ctx,
+			   unsigned int out_start,
+			   unsigned int out_end,
+			   unsigned int in_edge,
+			   unsigned int out_edge,
+			   unsigned int in_align,
+			   unsigned int out_align,
+			   unsigned int in_burst,
+			   unsigned int out_burst,
+			   unsigned int downsize_coeff,
+			   unsigned int resize_coeff,
+			   u32 *_in_seam,
+			   u32 *_out_seam)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int out_pos;
+	/* Input / output seam position candidates */
+	unsigned int out_seam = 0;
+	unsigned int in_seam = 0;
+	unsigned int min_diff = UINT_MAX;
+
+	/*
+	 * Output tiles must start at a multiple of 8 bytes horizontally and
+	 * possibly at an even line horizontally depending on the pixel format.
+	 * Only consider output aligned positions for the seam.
+	 */
+	out_start = round_up(out_start, out_align);
+	for (out_pos = out_start; out_pos < out_end; out_pos += out_align) {
+		unsigned int in_pos;
+		unsigned int in_pos_aligned;
+		unsigned int abs_diff;
+
+		/*
+		 * Tiles in the right row / bottom column may not be allowed to
+		 * overshoot horizontally / vertically. out_burst may be the
+		 * actual DMA burst size, or the rotator block size.
+		 */
+		if ((out_burst > 1) && (out_edge - out_pos) % out_burst)
+			continue;
+
+		/*
+		 * Input sample position, corresponding to out_pos, 19.13 fixed
+		 * point.
+		 */
+		in_pos = (out_pos * resize_coeff) << downsize_coeff;
+		/*
+		 * The closest input sample position that we could actually
+		 * start the input tile at, 19.13 fixed point.
+		 */
+		in_pos_aligned = round_closest(in_pos, 8192U * in_align);
+
+		if ((in_burst > 1) &&
+		    (in_edge - in_pos_aligned / 8192U) % in_burst)
+			continue;
+
+		if (in_pos < in_pos_aligned)
+			abs_diff = in_pos_aligned - in_pos;
+		else
+			abs_diff = in_pos - in_pos_aligned;
+
+		if (abs_diff < min_diff) {
+			in_seam = in_pos_aligned;
+			out_seam = out_pos;
+			min_diff = abs_diff;
+		}
+	}
+
+	*_out_seam = out_seam;
+	/* Convert 19.13 fixed point to integer seam position */
+	*_in_seam = DIV_ROUND_CLOSEST(in_seam, 8192U);
+
+	dev_dbg(dev, "%s: out_seam %u(%u) in [%u, %u], in_seam %u(%u) diff %u.%03u\n",
+		__func__, out_seam, out_align, out_start, out_end,
+		*_in_seam, in_align, min_diff / 8192,
+		DIV_ROUND_CLOSEST(min_diff % 8192 * 1000, 8192));
+}
+
+/*
+ * Tile left edges are required to be aligned to multiples of 8 bytes
+ * by the IDMAC.
+ */
+static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
+{
+	if (fmt->planar)
+		return fmt->uv_packed ? 8 : 8 * fmt->uv_width_dec;
+	else
+		return fmt->bpp == 32 ? 2 : fmt->bpp == 16 ? 4 : 8;
+}
+
+/*
+ * Tile top edge alignment is only limited by chroma subsampling.
+ */
+static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->uv_height_dec > 1 ? 2 : 1;
+}
+
 /*
  * We have to adjust the tile width such that the tile physaddrs and
  * U and V plane offsets are multiples of 8 bytes as required by
@@ -459,20 +579,228 @@ static inline u32 tile_height_align(enum ipu_image_convert_type type,
 		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
 }
 
+/*
+ * Fill in left position and width and for all tiles in an input column, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a row, and output tile top position and height are set.
+ */
+static void fill_tile_column(struct ipu_image_convert_ctx *ctx,
+			     unsigned int col,
+			     struct ipu_image_convert_image *in,
+			     unsigned int in_left, unsigned int in_width,
+			     struct ipu_image_convert_image *out,
+			     unsigned int out_left, unsigned int out_width)
+{
+	unsigned int row, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (row = 0; row < in->num_rows; row++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->left = in_left;
+		in_tile->width = in_width;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->top = out_left;
+			out_tile->height = out_width;
+		} else {
+			out_tile->left = out_left;
+			out_tile->width = out_width;
+		}
+	}
+}
+
+/*
+ * Fill in top position and height and for all tiles in an input row, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a column, and output tile left position and width are set.
+ */
+static void fill_tile_row(struct ipu_image_convert_ctx *ctx, unsigned int row,
+			  struct ipu_image_convert_image *in,
+			  unsigned int in_top, unsigned int in_height,
+			  struct ipu_image_convert_image *out,
+			  unsigned int out_top, unsigned int out_height)
+{
+	unsigned int col, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (col = 0; col < in->num_cols; col++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->top = in_top;
+		in_tile->height = in_height;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->left = out_top;
+			out_tile->width = out_height;
+		} else {
+			out_tile->top = out_top;
+			out_tile->height = out_height;
+		}
+	}
+}
+
+/*
+ * Find the best horizontal and vertical seam positions to split into tiles.
+ * Minimize the fractional part of the input sampling position for the
+ * top / left pixels of each tile.
+ */
+static void find_seams(struct ipu_image_convert_ctx *ctx,
+		       struct ipu_image_convert_image *in,
+		       struct ipu_image_convert_image *out)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int resized_width = out->base.rect.width;
+	unsigned int resized_height = out->base.rect.height;
+	unsigned int col;
+	unsigned int row;
+	unsigned int in_left_align = tile_left_align(in->fmt);
+	unsigned int in_top_align = tile_top_align(in->fmt);
+	unsigned int out_left_align = tile_left_align(out->fmt);
+	unsigned int out_top_align = tile_top_align(out->fmt);
+	unsigned int out_width_align = tile_width_align(out->fmt);
+	unsigned int out_height_align = tile_height_align(out->type,
+							  ctx->rot_mode);
+	unsigned int in_right = in->base.rect.width;
+	unsigned int in_bottom = in->base.rect.height;
+	unsigned int out_right = out->base.rect.width;
+	unsigned int out_bottom = out->base.rect.height;
+	unsigned int flipped_out_left;
+	unsigned int flipped_out_top;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* Switch width/height and align top left to IRT block size */
+		resized_width = out->base.rect.height;
+		resized_height = out->base.rect.width;
+		out_left_align = out_height_align;
+		out_top_align = out_width_align;
+		out_width_align = out_left_align;
+		out_height_align = out_top_align;
+		out_right = out->base.rect.height;
+		out_bottom = out->base.rect.width;
+	}
+
+	for (col = in->num_cols - 1; col > 0; col--) {
+		bool allow_in_overshoot = ipu_rot_mode_is_irt(ctx->rot_mode) ||
+					  !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		bool allow_out_overshoot = (col < in->num_cols - 1) &&
+					   !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_left;
+		unsigned int out_left;
+
+		/*
+		 * Align input width to burst length if the scaling step flips
+		 * horizontally.
+		 */
+
+		/* Start within 1024 pixels of the right edge */
+		out_start = max_t(int, 0, out_right - 1024);
+		/* End before having to add more columns to the left */
+		out_end = min_t(unsigned int, out_right, col * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_right, out_right,
+			       in_left_align, out_left_align,
+			       allow_in_overshoot ? 1 : 8 /* burst length */,
+			       allow_out_overshoot ? 1 : out_width_align,
+			       ctx->downsize_coeff_h, ctx->image_resize_coeff_h,
+			       &in_left, &out_left);
+
+		if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+			flipped_out_left = resized_width - out_right;
+		else
+			flipped_out_left = out_left;
+
+		fill_tile_column(ctx, col, in, in_left, in_right - in_left,
+				 out, flipped_out_left, out_right - out_left);
+
+		dev_dbg(dev, "%s: col %u: %u, %u -> %u, %u\n", __func__, col,
+			in_left, in_right - in_left,
+			flipped_out_left, out_right - out_left);
+
+		in_right = in_left;
+		out_right = out_left;
+	}
+
+	flipped_out_left = (ctx->rot_mode & IPU_ROT_BIT_HFLIP) ?
+			   resized_width - out_right : 0;
+
+	fill_tile_column(ctx, 0, in, 0, in_right,
+			 out, flipped_out_left, out_right);
+
+	dev_dbg(dev, "%s: col 0: 0, %u -> %u, %u\n", __func__,
+		in_right, flipped_out_left, out_right);
+
+	for (row = in->num_rows - 1; row > 0; row--) {
+		bool allow_overshoot = row < in->num_rows - 1;
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_top;
+		unsigned int out_top;
+
+		/* Start within 1024 lines of the bottom edge */
+		out_start = max_t(int, 0, out_bottom - 1024);
+		/* End before having to add more rows above */
+		out_end = min_t(unsigned int, out_bottom, row * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_bottom, out_bottom,
+			       in_top_align, out_top_align,
+			       1, allow_overshoot ? 1 : out_height_align,
+			       ctx->downsize_coeff_v, ctx->image_resize_coeff_v,
+			       &in_top, &out_top);
+
+		if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+		    ipu_rot_mode_is_irt(ctx->rot_mode))
+			flipped_out_top = resized_height - out_bottom;
+		else
+			flipped_out_top = out_top;
+
+		fill_tile_row(ctx, row, in, in_top, in_bottom - in_top,
+			      out, flipped_out_top, out_bottom - out_top);
+
+		dev_dbg(dev, "%s: row %u: %u, %u -> %u, %u\n", __func__, row,
+			in_top, in_bottom - in_top,
+			flipped_out_top, out_bottom - out_top);
+
+		in_bottom = in_top;
+		out_bottom = out_top;
+	}
+
+	if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+	    ipu_rot_mode_is_irt(ctx->rot_mode))
+		flipped_out_top = resized_height - out_bottom;
+	else
+		flipped_out_top = 0;
+
+	fill_tile_row(ctx, 0, in, 0, in_bottom,
+		      out, flipped_out_top, out_bottom);
+
+	dev_dbg(dev, "%s: row 0: 0, %u -> %u, %u\n", __func__,
+		in_bottom, flipped_out_top, out_bottom);
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
-		struct ipu_image_tile *tile = &image->tile[i];
+		struct ipu_image_tile *tile;
 		const unsigned int row = i / image->num_cols;
 		const unsigned int col = i % image->num_cols;
 
-		tile->height = image->base.pix.height / image->num_rows;
-		tile->width = image->base.pix.width / image->num_cols;
-		tile->left = col * tile->width;
-		tile->top = row * tile->height;
+		if (image->type == IMAGE_CONVERT_OUT)
+			tile = &image->tile[ctx->out_tile_map[i]];
+		else
+			tile = &image->tile[i];
+
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -1682,6 +2010,10 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	calc_out_tile_map(ctx);
+
+	find_seams(ctx, s_image, d_image);
+
 	calc_tile_dimensions(ctx, s_image);
 	ret = calc_tile_offsets(ctx, s_image);
 	if (ret)
@@ -1692,7 +2024,6 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
-	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 17/22] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (15 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 16/22] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 18/22] gpu: ipu-v3: image-convert: relax alignment restrictions Philipp Zabel
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Since tile dimensions now vary between tiles, add debug output for each
tile's position and dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index a674241dd0b8..0451d699f515 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -308,12 +308,11 @@ static void dump_format(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 
 	dev_dbg(priv->ipu->dev,
-		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
+		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles), %c%c%c%c\n",
 		chan->ic_task, ctx,
 		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
 		ic_image->base.pix.width, ic_image->base.pix.height,
 		ic_image->num_cols, ic_image->num_rows,
-		ic_image->tile[0].width, ic_image->tile[0].height,
 		ic_image->fmt->fourcc & 0xff,
 		(ic_image->fmt->fourcc >> 8) & 0xff,
 		(ic_image->fmt->fourcc >> 16) & 0xff,
@@ -789,6 +788,8 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
@@ -813,6 +814,13 @@ static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 			tile->rot_stride =
 				(image->fmt->bpp * tile->height) >> 3;
 		}
+
+		dev_dbg(priv->ipu->dev,
+			"task %u: ctx %p: %s@[%u,%u]: %ux%u@%u,%u\n",
+			chan->ic_task, ctx,
+			image->type == IMAGE_CONVERT_IN ? "Input" : "Output",
+			row, col,
+			tile->width, tile->height, tile->left, tile->top);
 	}
 }
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 18/22] gpu: ipu-v3: image-convert: relax alignment restrictions
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (16 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 17/22] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 19/22] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

For the planar but U/V-packed formats NV12 and NV16, 8 pixel width
alignment is good enough to fulfill the 8 byte stride requirement.
If we allow the input 8-pixel DMA bursts to overshoot the end of the
line, the only input alignment restrictions are dictated by the pixel
format and 8-byte aligned line start address.
Since different tile sizes are allowed, the output tile with / height
alignment doesn't need to be multiplied by number of columns / rows.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
[slongerbeam@gmail.com: Bring in the fixes to format width and
 height alignment restrictions from imx-media-mem2mem.c.]
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 81 +++++++++++++-------------
 1 file changed, 41 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 0451d699f515..0829723a7599 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -551,31 +551,46 @@ static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
 	return fmt->uv_height_dec > 1 ? 2 : 1;
 }
 
-/*
- * We have to adjust the tile width such that the tile physaddrs and
- * U and V plane offsets are multiples of 8 bytes as required by
- * the IPU DMA Controller. For the planar formats, this corresponds
- * to a pixel alignment of 16 (but use a more formal equation since
- * the variables are available). For all the packed formats, 8 is
- * good enough.
- */
-static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
+static inline u32 tile_width_align(enum ipu_image_convert_type type,
+				   const struct ipu_image_pixfmt *fmt,
+				   enum ipu_rotate_mode rot_mode)
 {
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+	if (type == IMAGE_CONVERT_IN) {
+		/*
+		 * The IC burst reads 8 pixels at a time. Reading beyond the
+		 * end of the line is usually acceptable. Those pixels are
+		 * ignored, unless the IC has to write the scaled line in
+		 * reverse.
+		 */
+		return (!ipu_rot_mode_is_irt(rot_mode) &&
+			(rot_mode & IPU_ROT_BIT_HFLIP)) ? 8 : 2;
+	}
+
+	/*
+	 * Align to 16x16 pixel blocks for planar 4:2:0 chroma subsampled
+	 * formats to guarantee 8-byte aligned line start addresses in the
+	 * chroma planes when IRT is used. Align to 8x8 pixel IRT block size
+	 * for all other formats.
+	 */
+	return (ipu_rot_mode_is_irt(rot_mode) &&
+		fmt->planar && !fmt->uv_packed) ?
+		8 * fmt->uv_width_dec : 8;
 }
 
-/*
- * For tile height alignment, we have to ensure that the output tile
- * heights are multiples of 8 lines if the IRT is required by the
- * given rotation mode (the IRT performs rotations on 8x8 blocks
- * at a time). If the IRT is not used, or for input image tiles,
- * 2 lines are good enough.
- */
 static inline u32 tile_height_align(enum ipu_image_convert_type type,
+				    const struct ipu_image_pixfmt *fmt,
 				    enum ipu_rotate_mode rot_mode)
 {
-	return (type == IMAGE_CONVERT_OUT &&
-		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+	if (type == IMAGE_CONVERT_IN || !ipu_rot_mode_is_irt(rot_mode))
+		return 2;
+
+	/*
+	 * Align to 16x16 pixel blocks for planar 4:2:0 chroma subsampled
+	 * formats to guarantee 8-byte aligned line start addresses in the
+	 * chroma planes when IRT is used. Align to 8x8 pixel IRT block size
+	 * for all other formats.
+	 */
+	return (fmt->planar && !fmt->uv_packed) ? 8 * fmt->uv_width_dec : 8;
 }
 
 /*
@@ -661,8 +676,9 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 	unsigned int in_top_align = tile_top_align(in->fmt);
 	unsigned int out_left_align = tile_left_align(out->fmt);
 	unsigned int out_top_align = tile_top_align(out->fmt);
-	unsigned int out_width_align = tile_width_align(out->fmt);
-	unsigned int out_height_align = tile_height_align(out->type,
+	unsigned int out_width_align = tile_width_align(out->type, out->fmt,
+							ctx->rot_mode);
+	unsigned int out_height_align = tile_height_align(out->type, out->fmt,
 							  ctx->rot_mode);
 	unsigned int in_right = in->base.rect.width;
 	unsigned int in_bottom = in->base.rect.height;
@@ -1855,8 +1871,6 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 			      enum ipu_rotate_mode rot_mode)
 {
 	const struct ipu_image_pixfmt *infmt, *outfmt;
-	unsigned int num_in_rows, num_in_cols;
-	unsigned int num_out_rows, num_out_cols;
 	u32 w_align, h_align;
 
 	infmt = get_format(in->pix.pixelformat);
@@ -1888,28 +1902,15 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 					in->pix.height / 4);
 	}
 
-	/* get tiling rows/cols from output format */
-	num_out_rows = num_stripes(out->pix.height);
-	num_out_cols = num_stripes(out->pix.width);
-	if (ipu_rot_mode_is_irt(rot_mode)) {
-		num_in_rows = num_out_cols;
-		num_in_cols = num_out_rows;
-	} else {
-		num_in_rows = num_out_rows;
-		num_in_cols = num_out_cols;
-	}
-
 	/* align input width/height */
-	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
-			num_in_rows);
+	w_align = ilog2(tile_width_align(IMAGE_CONVERT_IN, infmt, rot_mode));
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, infmt, rot_mode));
 	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
 	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
 
 	/* align output width/height */
-	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
-			num_out_rows);
+	w_align = ilog2(tile_width_align(IMAGE_CONVERT_OUT, outfmt, rot_mode));
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, outfmt, rot_mode));
 	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 19/22] gpu: ipu-v3: image-convert: fix bytesperline adjustment
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (17 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 18/22] gpu: ipu-v3: image-convert: relax alignment restrictions Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 20/22] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

For planar formats, bytesperline does not depend on BPP. It must always
be larger than width and aligned to tile width alignment restrictions.

The input bytesperline to ipu_image_convert_adjust() may be
uninitialized, so don't rely on input bytesperline as the
minimum value for clamp_align(). Use 2 << w_align as the minimum
instead.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
[slongerbeam@gmail.com: clamp input bytesperline]
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 0829723a7599..b735065fe288 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1915,10 +1915,18 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
 	/* set input/output strides and image sizes */
-	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
-	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
-	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
-	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
+	in->pix.bytesperline = infmt->planar ?
+		clamp_align(in->pix.width, 2 << w_align, MAX_W, w_align) :
+		clamp_align((in->pix.width * infmt->bpp) >> 3,
+			    2 << w_align, MAX_W, w_align);
+	in->pix.sizeimage = infmt->planar ?
+		(in->pix.height * in->pix.bytesperline * infmt->bpp) >> 3 :
+		in->pix.height * in->pix.bytesperline;
+	out->pix.bytesperline = outfmt->planar ? out->pix.width :
+		(out->pix.width * outfmt->bpp) >> 3;
+	out->pix.sizeimage = outfmt->planar ?
+		(out->pix.height * out->pix.bytesperline * outfmt->bpp) >> 3 :
+		out->pix.height * out->pix.bytesperline;
 }
 EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 20/22] gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (18 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 19/22] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 21/22] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Visualize the scaling and rotation pipeline with some ASCII art
diagrams. Remove the FIXME comment about missing seam prevention.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 39 +++++++++++++++++++-------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index b735065fe288..91fe8f1672b4 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -37,17 +37,36 @@
  * when double_buffering boolean is set).
  *
  * Note that the input frame must be split up into the same number
- * of tiles as the output frame.
+ * of tiles as the output frame:
  *
- * FIXME: at this point there is no attempt to deal with visible seams
- * at the tile boundaries when upscaling. The seams are caused by a reset
- * of the bilinear upscale interpolation when starting a new tile. The
- * seams are barely visible for small upscale factors, but become
- * increasingly visible as the upscale factor gets larger, since more
- * interpolated pixels get thrown out at the tile boundaries. A possilble
- * fix might be to overlap tiles of different sizes, but this must be done
- * while also maintaining the IDMAC dma buffer address alignment and 8x8 IRT
- * alignment restrictions of each tile.
+ *                       +---------+-----+
+ *   +-----+---+         |  A      | B   |
+ *   | A   | B |         |         |     |
+ *   +-----+---+   -->   +---------+-----+
+ *   | C   | D |         |  C      | D   |
+ *   +-----+---+         |         |     |
+ *                       +---------+-----+
+ *
+ * Clockwise 90° rotations are handled by first rescaling into a
+ * reusable temporary tile buffer and then rotating with the 8x8
+ * block rotator, writing to the correct destination:
+ *
+ *                                         +-----+-----+
+ *                                         |     |     |
+ *   +-----+---+         +---------+       | C   | A   |
+ *   | A   | B |         | A,B, |  |       |     |     |
+ *   +-----+---+   -->   | C,D  |  |  -->  |     |     |
+ *   | C   | D |         +---------+       +-----+-----+
+ *   +-----+---+                           | D   | B   |
+ *                                         |     |     |
+ *                                         +-----+-----+
+ *
+ * If the 8x8 block rotator is used, horizontal or vertical flipping
+ * is done during the rotation step, otherwise flipping is done
+ * during the scaling step.
+ * With rotation or flipping, tile order changes between input and
+ * output image. Tiles are numbered row major from top left to bottom
+ * right for both input and output image.
  */
 
 #define MAX_STRIPES_W    4
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 21/22] gpu: ipu-v3: image-convert: disable double buffering if necessary
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (19 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 20/22] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 12:15 ` [PATCH v4 22/22] gpu: ipu-v3: image-convert: allow three rows or columns Philipp Zabel
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

Double-buffering only works if tile sizes are the same and the resizing
coefficient does not change between tiles, even for non-planar formats.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 91fe8f1672b4..3e73494d5930 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1990,6 +1990,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	struct ipu_image_convert_chan *chan;
 	struct ipu_image_convert_ctx *ctx;
 	unsigned long flags;
+	unsigned int i;
 	bool get_res;
 	int ret;
 
@@ -2077,15 +2078,37 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	 * for every tile, and therefore would have to be updated for
 	 * each buffer which is not possible. So double-buffering is
 	 * impossible when either the source or destination images are
-	 * a planar format (YUV420, YUV422P, etc.).
+	 * a planar format (YUV420, YUV422P, etc.). Further, differently
+	 * sized tiles or different resizing coefficients per tile
+	 * prevent double-buffering as well.
 	 */
 	ctx->double_buffering = (ctx->num_tiles > 1 &&
 				 !s_image->fmt->planar &&
 				 !d_image->fmt->planar);
+	for (i = 1; i < ctx->num_tiles; i++) {
+		if (ctx->in.tile[i].width != ctx->in.tile[0].width ||
+		    ctx->in.tile[i].height != ctx->in.tile[0].height ||
+		    ctx->out.tile[i].width != ctx->out.tile[0].width ||
+		    ctx->out.tile[i].height != ctx->out.tile[0].height) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_cols; i++) {
+		if (ctx->resize_coeffs_h[i] != ctx->resize_coeffs_h[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_rows; i++) {
+		if (ctx->resize_coeffs_v[i] != ctx->resize_coeffs_v[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		unsigned long intermediate_size = d_image->tile[0].size;
-		unsigned int i;
 
 		for (i = 1; i < ctx->num_tiles; i++) {
 			if (d_image->tile[i].size > intermediate_size)
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 22/22] gpu: ipu-v3: image-convert: allow three rows or columns
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (20 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 21/22] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
@ 2018-10-19 12:15 ` Philipp Zabel
  2018-10-19 18:51 ` [PATCH v4 00/22] i.MX media mem2mem scaler Steve Longerbeam
  2018-11-14 13:47 ` Adam Ford
  23 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-19 12:15 UTC (permalink / raw)
  To: linux-media, Steve Longerbeam; +Cc: Nicolas Dufresne, Tim Harvey, kernel

If width or height are in the [2049, 3072] range, allow to
use just three tiles in this dimension, instead of four.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
No changes since v3.
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 3e73494d5930..13103ab86050 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -379,12 +379,7 @@ static int alloc_dma_buf(struct ipu_image_convert_priv *priv,
 
 static inline int num_stripes(int dim)
 {
-	if (dim <= 1024)
-		return 1;
-	else if (dim <= 2048)
-		return 2;
-	else
-		return 4;
+	return (dim - 1) / 1024 + 1;
 }
 
 /*
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 00/22] i.MX media mem2mem scaler
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (21 preceding siblings ...)
  2018-10-19 12:15 ` [PATCH v4 22/22] gpu: ipu-v3: image-convert: allow three rows or columns Philipp Zabel
@ 2018-10-19 18:51 ` Steve Longerbeam
  2018-10-30 10:00   ` Philipp Zabel
  2018-11-14 13:47 ` Adam Ford
  23 siblings, 1 reply; 26+ messages in thread
From: Steve Longerbeam @ 2018-10-19 18:51 UTC (permalink / raw)
  To: Philipp Zabel, linux-media, Steve Longerbeam
  Cc: Nicolas Dufresne, Tim Harvey, kernel

Awesome, thanks Philipp.

For the whole series:

Acked-by: Steve Longerbeam <slongerbeam@gmail.com>
Tested-by: Steve Longerbeam <slongerbeam@gmail.com>
on i.MX6q SabreSD.


On 10/19/18 5:15 AM, Philipp Zabel wrote:
> Hi,
>
> this is the fourth version of the i.MX mem2mem scaler series.
>
> An alignment issue with 24-bit RGB formats has been corrected in the
> seam position selection patch and a few new fixes by Steve have been
> added. If there are no more issues, I'll pick up the ipu-v3 patches
> via imx-drm/next. The first patch could be merged via the media tree
> independently.
>
> Changes since v3:
>   - Fix tile_left_align for 24-bit RGB formats and reduce alignment
>     restrictions for U/V packed planar YUV formats
>   - Catch unaligned tile offsets in image-convert
>   - Add chroma plane offset overrides to ipu_cpmem_set_image() to
>     prevent a false positive warning in some cases
>   - Fix a race between run and unprepare and make abort reentrant.
>
>
> Changes since v2:
>   - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
>     adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
>     wrapper around the IPU image converter, and independent of the
>     internal image converter implementation.
>   - Remove the source and destination buffers on error in device_run().
>     Otherwise the conversion is re-attempted apparently over and over
>     again (with WARN() backtraces).
>   - Allow subscribing to control changes.
>   - Fix seam position selection for more corner cases:
>      - Switch width/height properly and align tile top left positions to 8x8
>        IRT block size when rotating.
>      - Align input width to input burst length in case the scaling step
>        flips horizontally.
>      - Fix bottom edge calculation.
>
> Changes since v1:
>   - Fix inverted allow_overshoot logic
>   - Correctly switch horizontal / vertical tile alignment when
>     determining seam positions with the 90° rotator active.
>   - Fix SPDX-License-Identifier and remove superfluous license
>     text.
>   - Fix uninitialized walign in try_fmt
>
> Previous cover letter:
>
> we have image conversion code for scaling and colorspace conversion in
> the IPUv3 base driver for a while. Since the IC hardware can only write
> up to 1024x1024 pixel buffers, it scales to larger output buffers by
> splitting the input and output frame into similarly sized tiles.
>
> This causes the issue that the bilinear interpolation resets at the tile
> boundary: instead of smoothly interpolating across the seam, there is a
> jump in the input sample position that is very apparent for high
> upscaling factors. This can be avoided by slightly changing the scaling
> coefficients to let the left/top tiles overshoot their input sampling
> into the first pixel / line of their right / bottom neighbors. The error
> can be further reduced by letting tiles be differently sized and by
> selecting seam positions that minimize the input sampling position error
> at tile boundaries.
> This is complicated by different DMA start address, burst size, and
> rotator block size alignment requirements, depending on the input and
> output pixel formats, and the fact that flipping happens in different
> places depending on the rotation.
>
> This series implements optimal seam position selection and seam hiding
> with per-tile resizing coefficients and adds a scaling mem2mem device
> to the imx-media driver.
>
> regards
> Philipp
>
> Philipp Zabel (15):
>    media: imx: add mem2mem device
>    gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
>    gpu: ipu-v3: image-convert: prepare for per-tile configuration
>    gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
>    gpu: ipu-v3: image-convert: reconfigure IC per tile
>    gpu: ipu-v3: image-convert: store tile top/left position
>    gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
>      outside fill_image
>    gpu: ipu-v3: image-convert: move tile alignment helpers
>    gpu: ipu-v3: image-convert: select optimal seam positions
>    gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
>    gpu: ipu-v3: image-convert: relax alignment restrictions
>    gpu: ipu-v3: image-convert: fix bytesperline adjustment
>    gpu: ipu-v3: image-convert: add some ASCII art to the exposition
>    gpu: ipu-v3: image-convert: disable double buffering if necessary
>    gpu: ipu-v3: image-convert: allow three rows or columns
>
> Steve Longerbeam (7):
>    gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers
>    gpu: ipu-v3: Add chroma plane offset overrides to
>      ipu_cpmem_set_image()
>    gpu: ipu-v3: image-convert: Prevent race between run and unprepare
>    gpu: ipu-v3: image-convert: Only wait for abort completion if active
>      run
>    gpu: ipu-v3: image-convert: Allow reentrancy into abort
>    gpu: ipu-v3: image-convert: Remove need_abort flag
>    gpu: ipu-v3: image-convert: Catch unaligned tile offsets
>
>   drivers/gpu/ipu-v3/ipu-cpmem.c                |   52 +-
>   drivers/gpu/ipu-v3/ipu-ic.c                   |   52 +-
>   drivers/gpu/ipu-v3/ipu-image-convert.c        | 1019 ++++++++++++++---
>   drivers/staging/media/imx/Kconfig             |    1 +
>   drivers/staging/media/imx/Makefile            |    1 +
>   drivers/staging/media/imx/imx-media-dev.c     |   11 +
>   drivers/staging/media/imx/imx-media-mem2mem.c |  873 ++++++++++++++
>   drivers/staging/media/imx/imx-media.h         |   10 +
>   include/video/imx-ipu-v3.h                    |    9 +
>   9 files changed, 1821 insertions(+), 207 deletions(-)
>   create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 00/22] i.MX media mem2mem scaler
  2018-10-19 18:51 ` [PATCH v4 00/22] i.MX media mem2mem scaler Steve Longerbeam
@ 2018-10-30 10:00   ` Philipp Zabel
  0 siblings, 0 replies; 26+ messages in thread
From: Philipp Zabel @ 2018-10-30 10:00 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media, Steve Longerbeam
  Cc: Nicolas Dufresne, Tim Harvey, kernel

Hi Steve,

On Fri, 2018-10-19 at 11:51 -0700, Steve Longerbeam wrote:
> Awesome, thanks Philipp.
> 
> For the whole series:
> 
> Acked-by: Steve Longerbeam <slongerbeam@gmail.com>
> Tested-by: Steve Longerbeam <slongerbeam@gmail.com>
> on i.MX6q SabreSD.

Thank you. I have added the tags where it made sense to me and applied
the ipu-v3 patches (all except 1/22) to imx-drm/next.

regards
Philipp

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 00/22] i.MX media mem2mem scaler
  2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
                   ` (22 preceding siblings ...)
  2018-10-19 18:51 ` [PATCH v4 00/22] i.MX media mem2mem scaler Steve Longerbeam
@ 2018-11-14 13:47 ` Adam Ford
  23 siblings, 0 replies; 26+ messages in thread
From: Adam Ford @ 2018-11-14 13:47 UTC (permalink / raw)
  To: p.zabel; +Cc: linux-media, slongerbeam, nicolas, Tim Harvey, Sascha Hauer

On Fri, Oct 19, 2018 at 7:16 AM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> Hi,
>
> this is the fourth version of the i.MX mem2mem scaler series.
>
> An alignment issue with 24-bit RGB formats has been corrected in the
> seam position selection patch and a few new fixes by Steve have been
> added. If there are no more issues, I'll pick up the ipu-v3 patches
> via imx-drm/next. The first patch could be merged via the media tree
> independently.

I'd like to test this, but I am not sure how.  Do you have
instructions?  (ideally using gstreamer?)

adam

>
> Changes since v3:
>  - Fix tile_left_align for 24-bit RGB formats and reduce alignment
>    restrictions for U/V packed planar YUV formats
>  - Catch unaligned tile offsets in image-convert
>  - Add chroma plane offset overrides to ipu_cpmem_set_image() to
>    prevent a false positive warning in some cases
>  - Fix a race between run and unprepare and make abort reentrant.
>
>
> Changes since v2:
>  - Rely on ipu_image_convert_adjust() in mem2mem_try_fmt() for format
>    adjustments. This makes the mem2mem driver mostly a V4L2 mem2mem API
>    wrapper around the IPU image converter, and independent of the
>    internal image converter implementation.
>  - Remove the source and destination buffers on error in device_run().
>    Otherwise the conversion is re-attempted apparently over and over
>    again (with WARN() backtraces).
>  - Allow subscribing to control changes.
>  - Fix seam position selection for more corner cases:
>     - Switch width/height properly and align tile top left positions to 8x8
>       IRT block size when rotating.
>     - Align input width to input burst length in case the scaling step
>       flips horizontally.
>     - Fix bottom edge calculation.
>
> Changes since v1:
>  - Fix inverted allow_overshoot logic
>  - Correctly switch horizontal / vertical tile alignment when
>    determining seam positions with the 90° rotator active.
>  - Fix SPDX-License-Identifier and remove superfluous license
>    text.
>  - Fix uninitialized walign in try_fmt
>
> Previous cover letter:
>
> we have image conversion code for scaling and colorspace conversion in
> the IPUv3 base driver for a while. Since the IC hardware can only write
> up to 1024x1024 pixel buffers, it scales to larger output buffers by
> splitting the input and output frame into similarly sized tiles.
>
> This causes the issue that the bilinear interpolation resets at the tile
> boundary: instead of smoothly interpolating across the seam, there is a
> jump in the input sample position that is very apparent for high
> upscaling factors. This can be avoided by slightly changing the scaling
> coefficients to let the left/top tiles overshoot their input sampling
> into the first pixel / line of their right / bottom neighbors. The error
> can be further reduced by letting tiles be differently sized and by
> selecting seam positions that minimize the input sampling position error
> at tile boundaries.
> This is complicated by different DMA start address, burst size, and
> rotator block size alignment requirements, depending on the input and
> output pixel formats, and the fact that flipping happens in different
> places depending on the rotation.
>
> This series implements optimal seam position selection and seam hiding
> with per-tile resizing coefficients and adds a scaling mem2mem device
> to the imx-media driver.
>
> regards
> Philipp
>
> Philipp Zabel (15):
>   media: imx: add mem2mem device
>   gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
>   gpu: ipu-v3: image-convert: prepare for per-tile configuration
>   gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
>   gpu: ipu-v3: image-convert: reconfigure IC per tile
>   gpu: ipu-v3: image-convert: store tile top/left position
>   gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
>     outside fill_image
>   gpu: ipu-v3: image-convert: move tile alignment helpers
>   gpu: ipu-v3: image-convert: select optimal seam positions
>   gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
>   gpu: ipu-v3: image-convert: relax alignment restrictions
>   gpu: ipu-v3: image-convert: fix bytesperline adjustment
>   gpu: ipu-v3: image-convert: add some ASCII art to the exposition
>   gpu: ipu-v3: image-convert: disable double buffering if necessary
>   gpu: ipu-v3: image-convert: allow three rows or columns
>
> Steve Longerbeam (7):
>   gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers
>   gpu: ipu-v3: Add chroma plane offset overrides to
>     ipu_cpmem_set_image()
>   gpu: ipu-v3: image-convert: Prevent race between run and unprepare
>   gpu: ipu-v3: image-convert: Only wait for abort completion if active
>     run
>   gpu: ipu-v3: image-convert: Allow reentrancy into abort
>   gpu: ipu-v3: image-convert: Remove need_abort flag
>   gpu: ipu-v3: image-convert: Catch unaligned tile offsets
>
>  drivers/gpu/ipu-v3/ipu-cpmem.c                |   52 +-
>  drivers/gpu/ipu-v3/ipu-ic.c                   |   52 +-
>  drivers/gpu/ipu-v3/ipu-image-convert.c        | 1019 ++++++++++++++---
>  drivers/staging/media/imx/Kconfig             |    1 +
>  drivers/staging/media/imx/Makefile            |    1 +
>  drivers/staging/media/imx/imx-media-dev.c     |   11 +
>  drivers/staging/media/imx/imx-media-mem2mem.c |  873 ++++++++++++++
>  drivers/staging/media/imx/imx-media.h         |   10 +
>  include/video/imx-ipu-v3.h                    |    9 +
>  9 files changed, 1821 insertions(+), 207 deletions(-)
>  create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>
> --
> 2.19.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-11-14 23:51 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-19 12:15 [PATCH v4 00/22] i.MX media mem2mem scaler Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 01/22] media: imx: add mem2mem device Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 02/22] gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 03/22] gpu: ipu-v3: Add chroma plane offset overrides to ipu_cpmem_set_image() Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 04/22] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 05/22] gpu: ipu-v3: image-convert: Prevent race between run and unprepare Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 06/22] gpu: ipu-v3: image-convert: Only wait for abort completion if active run Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 07/22] gpu: ipu-v3: image-convert: Allow reentrancy into abort Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 08/22] gpu: ipu-v3: image-convert: Remove need_abort flag Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 09/22] gpu: ipu-v3: image-convert: Catch unaligned tile offsets Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 10/22] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 11/22] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 12/22] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 13/22] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 14/22] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 15/22] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 16/22] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 17/22] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 18/22] gpu: ipu-v3: image-convert: relax alignment restrictions Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 19/22] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 20/22] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 21/22] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
2018-10-19 12:15 ` [PATCH v4 22/22] gpu: ipu-v3: image-convert: allow three rows or columns Philipp Zabel
2018-10-19 18:51 ` [PATCH v4 00/22] i.MX media mem2mem scaler Steve Longerbeam
2018-10-30 10:00   ` Philipp Zabel
2018-11-14 13:47 ` Adam Ford

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.