linux-media.vger.kernel.org archive mirror
* [PATCH v2 00/16]  i.MX media mem2mem scaler
@ 2018-07-19 15:30 Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
From: Philipp Zabel @ 2018-07-19 15:30 UTC
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Hi,

This is the second version of the i.MX mem2mem scaler series.
Patches 8 and 16 have been modified.

Changes since v1:
 - Fix inverted allow_overshoot logic
 - Correctly switch horizontal / vertical tile alignment when
   determining seam positions with the 90° rotator active.
 - Fix SPDX-License-Identifier and remove superfluous license
   text.
 - Fix uninitialized walign in try_fmt

Previous cover letter:

We have had image conversion code for scaling and colorspace conversion
in the IPUv3 base driver for a while. Since the IC hardware can only
write buffers of up to 1024x1024 pixels, the code handles larger output
buffers by splitting the input and output frames into similarly sized
tiles.

As a consequence, the bilinear interpolation resets at each tile
boundary: instead of smoothly interpolating across the seam, there is a
jump in the input sample position that is very apparent at high
upscaling factors. This can be avoided by slightly changing the scaling
coefficients to let the left/top tiles overshoot their input sampling
into the first pixel / line of their right / bottom neighbors. The error
can be further reduced by allowing differently sized tiles and by
selecting seam positions that minimize the input sampling position error
at tile boundaries.
This is complicated by the alignment requirements for the DMA start
address, burst size, and rotator block size, which depend on the input
and output pixel formats, and by the fact that flipping happens in
different places depending on the rotation.
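
The seam jump described above can be illustrated with a small C sketch.
It assumes a simplified model in which the IC maps output pixel x of a
tile to input position x * coeff / 8192, relative to the tile's first
input pixel (resizing ratio 8192 / coeff). The helper names and the
128 -> 1024 pixel example are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Input sampling position of output pixel x in a tile whose first input
 * pixel is tile_in_left, in 1/8192 input pixel units (assumed model).
 */
static uint32_t sample_pos(uint32_t tile_in_left, uint32_t x, uint32_t coeff)
{
	return tile_in_left * 8192 + x * coeff;
}

/* Round down: the last output pixel lands on or before the last input pixel. */
static uint32_t coeff_no_overshoot(uint32_t in, uint32_t out)
{
	assert(in > 0 && out > 1);
	return 8192 * (in - 1) / (out - 1);
}

/* Round to closest: one past the last output pixel lands on the next tile. */
static uint32_t coeff_overshoot(uint32_t in, uint32_t out)
{
	assert(out > 0);
	return (8192 * in + out / 2) / out;
}

/*
 * Step in input position (1/8192 px units) from the last output pixel of a
 * tile starting at input 0 to the first output pixel of the next tile,
 * which restarts at input column `in`. Assumes the step is non-negative.
 */
static uint32_t seam_step(uint32_t in, uint32_t out, uint32_t coeff)
{
	uint32_t last = sample_pos(0, out - 1, coeff);
	uint32_t next = sample_pos(in, 0, coeff);

	return next - last;
}
```

With 128 input pixels upscaled to 1024 output pixels per tile, the
round-down coefficient is 1016: the last pixel of one tile samples input
position of about 126.88, while the next tile restarts at 128, a jump of
9208/8192 (about 1.12) input pixels instead of the regular step of
1016/8192. The overshooting coefficient 1024 makes the step across the
seam equal to the regular step of 1024/8192.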

This series implements optimal seam position selection and seam hiding
with per-tile resizing coefficients and adds a scaling mem2mem device
to the imx-media driver.

regards
Philipp

Philipp Zabel (16):
  gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  gpu: ipu-v3: image-convert: prepare for per-tile configuration
  gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  gpu: ipu-v3: image-convert: reconfigure IC per tile
  gpu: ipu-v3: image-convert: store tile top/left position
  gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
    outside fill_image
  gpu: ipu-v3: image-convert: move tile alignment helpers
  gpu: ipu-v3: image-convert: select optimal seam positions
  gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and
    NV16
  gpu: ipu-v3: image-convert: relax input alignment restrictions
  gpu: ipu-v3: image-convert: relax output alignment restrictions
  gpu: ipu-v3: image-convert: fix bytesperline adjustment
  gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  gpu: ipu-v3: image-convert: disable double buffering if necessary
  media: imx: add mem2mem device

 drivers/gpu/ipu-v3/ipu-ic.c                   |  52 +-
 drivers/gpu/ipu-v3/ipu-image-convert.c        | 870 +++++++++++++---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 946 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 include/video/imx-ipu-v3.h                    |   6 +
 8 files changed, 1758 insertions(+), 139 deletions(-)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

-- 
2.18.0


* [PATCH v2 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
From: Philipp Zabel @ 2018-07-19 15:30 UTC
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

For tiled scaling, we want to compute the scaling coefficients
externally in such a way that the interpolation overshoots tile
boundaries and samples up to the first pixel of the next tile.
Add ipu_ic_task_init_rsc(), which allows the image conversion code to
override the resizing coefficients.
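
A minimal sketch of how the 32-bit rsc value could be composed. The
field positions follow the shifts used in ipu_ic_task_init_rsc(); the
range limits (downsize coefficient at most 2, resize coefficient at most
0x3fff) are taken from the checks added later in this series. pack_rsc()
is a hypothetical helper, not part of the driver. Passing rsc = 0 keeps
the previous behaviour of computing the coefficients from the sizes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bits 31:30 vertical downsize, 29:16 vertical
 * resize, 15:14 horizontal downsize, 13:0 horizontal resize coefficient.
 */
static uint32_t pack_rsc(uint32_t downsize_v, uint32_t resize_v,
			 uint32_t downsize_h, uint32_t resize_h)
{
	/* downsize coefficients are 0..2, resize coefficients fit in 14 bits */
	assert(downsize_v <= 2 && downsize_h <= 2);
	assert(resize_v <= 0x3fff && resize_h <= 0x3fff);

	return (downsize_v << 30) | (resize_v << 16) |
	       (downsize_h << 14) | resize_h;
}
```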

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-ic.c | 52 +++++++++++++++++++++++--------------
 include/video/imx-ipu-v3.h  |  6 +++++
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 67cc820253a9..594c3cbc8291 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -442,36 +442,40 @@ int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 }
 EXPORT_SYMBOL_GPL(ipu_ic_task_graphics_init);
 
-int ipu_ic_task_init(struct ipu_ic *ic,
-		     int in_width, int in_height,
-		     int out_width, int out_height,
-		     enum ipu_color_space in_cs,
-		     enum ipu_color_space out_cs)
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc)
 {
 	struct ipu_ic_priv *priv = ic->priv;
-	u32 reg, downsize_coeff, resize_coeff;
+	u32 downsize_coeff, resize_coeff;
 	unsigned long flags;
 	int ret = 0;
 
-	/* Setup vertical resizing */
-	ret = calc_resize_coeffs(ic, in_height, out_height,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+	if (!rsc) {
+		/* Setup vertical resizing */
 
-	reg = (downsize_coeff << 30) | (resize_coeff << 16);
+		ret = calc_resize_coeffs(ic, in_height, out_height,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
+
+		rsc = (downsize_coeff << 30) | (resize_coeff << 16);
 
-	/* Setup horizontal resizing */
-	ret = calc_resize_coeffs(ic, in_width, out_width,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+		/* Setup horizontal resizing */
+		ret = calc_resize_coeffs(ic, in_width, out_width,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
 
-	reg |= (downsize_coeff << 14) | resize_coeff;
+		rsc |= (downsize_coeff << 14) | resize_coeff;
+	}
 
 	spin_lock_irqsave(&priv->lock, flags);
 
-	ipu_ic_write(ic, reg, ic->reg->rsc);
+	ipu_ic_write(ic, rsc, ic->reg->rsc);
 
 	/* Setup color space conversion */
 	ic->in_cs = in_cs;
@@ -487,6 +491,16 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return ret;
 }
+
+int ipu_ic_task_init(struct ipu_ic *ic,
+		     int in_width, int in_height,
+		     int out_width, int out_height,
+		     enum ipu_color_space in_cs,
+		     enum ipu_color_space out_cs)
+{
+	return ipu_ic_task_init_rsc(ic, in_width, in_height, out_width,
+				    out_height, in_cs, out_cs, 0);
+}
 EXPORT_SYMBOL_GPL(ipu_ic_task_init);
 
 int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index abbad94e14a1..94f0eec821c8 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -387,6 +387,12 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 		     int out_width, int out_height,
 		     enum ipu_color_space in_cs,
 		     enum ipu_color_space out_cs);
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc);
 int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 			      enum ipu_color_space in_g_cs,
 			      bool galpha_en, u32 galpha,
-- 
2.18.0


* [PATCH v2 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
From: Philipp Zabel @ 2018-07-19 15:30 UTC
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Let convert_start() start from a given tile index, and allocate the
rotation intermediate buffer with the maximum tile size.
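
The intermediate buffer sizing can be sketched as follows.
max_tile_size() is a hypothetical standalone helper that mirrors the
loop added to ipu_image_convert_prepare(), with the tile sizes passed as
a plain array instead of the driver's tile structures.

```c
#include <assert.h>

/* Return the largest tile size in bytes; num_tiles must be at least 1. */
static unsigned long max_tile_size(const unsigned long *tile_size,
				   unsigned int num_tiles)
{
	unsigned long max;
	unsigned int i;

	assert(num_tiles > 0);
	max = tile_size[0];
	for (i = 1; i < num_tiles; i++)
		if (tile_size[i] > max)
			max = tile_size[i];
	return max;
}
```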

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 60 +++++++++++++++-----------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 524a717ab28e..7eef51decc97 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -605,7 +605,8 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 			       struct ipuv3_channel *channel,
 			       struct ipu_image_convert_image *image,
 			       enum ipu_rotate_mode rot_mode,
-			       bool rot_swap_width_height)
+			       bool rot_swap_width_height,
+			       unsigned int tile)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	unsigned int burst_size;
@@ -615,23 +616,23 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	unsigned int tile_idx[2];
 
 	if (image->type == IMAGE_CONVERT_OUT) {
-		tile_idx[0] = ctx->out_tile_map[0];
+		tile_idx[0] = ctx->out_tile_map[tile];
 		tile_idx[1] = ctx->out_tile_map[1];
 	} else {
-		tile_idx[0] = 0;
+		tile_idx[0] = tile;
 		tile_idx[1] = 1;
 	}
 
 	if (rot_swap_width_height) {
-		width = image->tile[0].height;
-		height = image->tile[0].width;
-		stride = image->tile[0].rot_stride;
+		width = image->tile[tile_idx[0]].height;
+		height = image->tile[tile_idx[0]].width;
+		stride = image->tile[tile_idx[0]].rot_stride;
 		addr0 = ctx->rot_intermediate[0].phys;
 		if (ctx->double_buffering)
 			addr1 = ctx->rot_intermediate[1].phys;
 	} else {
-		width = image->tile[0].width;
-		height = image->tile[0].height;
+		width = image->tile[tile_idx[0]].width;
+		height = image->tile[tile_idx[0]].height;
 		stride = image->stride;
 		addr0 = image->base.phys0 +
 			image->tile[tile_idx[0]].offset;
@@ -681,7 +682,7 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
 }
 
-static int convert_start(struct ipu_image_convert_run *run)
+static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 {
 	struct ipu_image_convert_ctx *ctx = run->ctx;
 	struct ipu_image_convert_chan *chan = ctx->chan;
@@ -689,28 +690,29 @@ static int convert_start(struct ipu_image_convert_run *run)
 	struct ipu_image_convert_image *s_image = &ctx->in;
 	struct ipu_image_convert_image *d_image = &ctx->out;
 	enum ipu_color_space src_cs, dest_cs;
+	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
 	int ret;
 
-	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p\n",
-		__func__, chan->ic_task, ctx, run);
+	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
+		__func__, chan->ic_task, ctx, run, tile, dst_tile);
 
 	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
 	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* swap width/height for resizer */
-		dest_width = d_image->tile[0].height;
-		dest_height = d_image->tile[0].width;
+		dest_width = d_image->tile[dst_tile].height;
+		dest_height = d_image->tile[dst_tile].width;
 	} else {
-		dest_width = d_image->tile[0].width;
-		dest_height = d_image->tile[0].height;
+		dest_width = d_image->tile[dst_tile].width;
+		dest_height = d_image->tile[dst_tile].height;
 	}
 
 	/* setup the IC resizer and CSC */
 	ret = ipu_ic_task_init(chan->ic,
-			       s_image->tile[0].width,
-			       s_image->tile[0].height,
+			       s_image->tile[tile].width,
+			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
 			       src_cs, dest_cs);
@@ -721,27 +723,27 @@ static int convert_start(struct ipu_image_convert_run *run)
 
 	/* init the source MEM-->IC PP IDMAC channel */
 	init_idmac_channel(ctx, chan->in_chan, s_image,
-			   IPU_ROTATE_NONE, false);
+			   IPU_ROTATE_NONE, false, tile);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* init the IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   IPU_ROTATE_NONE, true);
+				   IPU_ROTATE_NONE, true, tile);
 
 		/* init the MEM-->IC PP ROT IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_in_chan, d_image,
-				   ctx->rot_mode, true);
+				   ctx->rot_mode, true, tile);
 
 		/* init the destination IC PP ROT-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_out_chan, d_image,
-				   IPU_ROTATE_NONE, false);
+				   IPU_ROTATE_NONE, false, tile);
 
 		/* now link IC PP-->MEM to MEM-->IC PP ROT */
 		ipu_idmac_link(chan->out_chan, chan->rotation_in_chan);
 	} else {
 		/* init the destination IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   ctx->rot_mode, false);
+				   ctx->rot_mode, false, tile);
 	}
 
 	/* enable the IC */
@@ -799,7 +801,7 @@ static int do_run(struct ipu_image_convert_run *run)
 	list_del(&run->list);
 	chan->current_run = run;
 
-	return convert_start(run);
+	return convert_start(run, 0);
 }
 
 /* hold irqlock when calling */
@@ -1430,14 +1432,22 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 				 !d_image->fmt->planar);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		unsigned long intermediate_size = d_image->tile[0].size;
+		unsigned int i;
+
+		for (i = 1; i < ctx->num_tiles; i++) {
+			if (d_image->tile[i].size > intermediate_size)
+				intermediate_size = d_image->tile[i].size;
+		}
+
 		ret = alloc_dma_buf(priv, &ctx->rot_intermediate[0],
-				    d_image->tile[0].size);
+				    intermediate_size);
 		if (ret)
 			goto out_free;
 		if (ctx->double_buffering) {
 			ret = alloc_dma_buf(priv,
 					    &ctx->rot_intermediate[1],
-					    d_image->tile[0].size);
+					    intermediate_size);
 			if (ret)
 				goto out_free_dmabuf0;
 		}
-- 
2.18.0


* [PATCH v2 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
From: Philipp Zabel @ 2018-07-19 15:30 UTC
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Slightly modifying the resize coefficients per tile makes it possible to
completely hide the seams between tiles and to sample the correct input
pixels at the bottom and right edges of the image.

Tiling requires a bilinear interpolator reset at each tile start, which
shifts the image slightly whenever, according to the full image resizing
ratio, the tile's first output pixel should have been sampled from a
non-integer position in the source image. To work around this hardware
limitation, calculate per-tile resizing coefficients that ensure the
correct input pixels are sampled at the end of each tile.
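
The per-tile arithmetic can be sketched in plain C. The two functions
below are hypothetical userspace copies of the calculations in
calc_resize_coeff() and the horizontal pass of
calc_tile_resize_coefficients(), assuming a burst size of 8 pixels; the
kernel's DIV_ROUND_UP() and round_up() are replaced by local macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP_U32(n, d)	(((n) + (d) - 1) / (d))
#define ROUND_UP_U32(x, a)	(DIV_ROUND_UP_U32(x, a) * (a))

/* Resizing ratio is 8192.0 / returned coefficient. */
static uint32_t tile_resize_coeff(uint32_t input_size, uint32_t downsize_coeff,
				  uint32_t output_size, bool allow_overshoot)
{
	uint32_t downsized = input_size >> downsize_coeff;

	/* round to closest: the next tile's first pixel lands on its origin */
	if (allow_overshoot)
		return (8192 * downsized + output_size / 2) / output_size;

	/* round down: the last output pixel stays inside the input */
	assert(downsized > 0 && output_size > 1);
	return 8192 * (downsized - 1) / (output_size - 1);
}

/* Input tile width covering the last accessed input pixel, burst aligned. */
static uint32_t tile_input_width(uint32_t resized_width, uint32_t coeff,
				 uint32_t downsize_coeff, bool closest)
{
	uint32_t last_output = ROUND_UP_U32(resized_width, 8) - 1;

	if (closest)
		last_output++;
	return ROUND_UP_U32((DIV_ROUND_UP_U32(last_output * coeff, 8192) + 1)
			    << downsize_coeff, 8);
}
```

For a 128 -> 1024 pixel tile that is not in the last column (and not
horizontally flipped), the round-to-closest coefficient is 1024 and the
input tile grows to 136 pixels (129 needed, rounded up to the burst
size), so it overlaps the first pixel of its right neighbor. The
round-down coefficient for the last column is 1016, which needs exactly
128 input pixels.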

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 236 ++++++++++++++++++++++++-
 1 file changed, 234 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 7eef51decc97..12da0772bff0 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -135,6 +135,12 @@ struct ipu_image_convert_ctx {
 	struct ipu_image_convert_image in;
 	struct ipu_image_convert_image out;
 	enum ipu_rotate_mode rot_mode;
+	u32 downsize_coeff_h;
+	u32 downsize_coeff_v;
+	u32 image_resize_coeff_h;
+	u32 image_resize_coeff_v;
+	u32 resize_coeffs_h[MAX_STRIPES_W];
+	u32 resize_coeffs_v[MAX_STRIPES_H];
 
 	/* intermediate buffer for rotation */
 	struct ipu_image_convert_dma_buf rot_intermediate[2];
@@ -355,6 +361,69 @@ static inline int num_stripes(int dim)
 		return 4;
 }
 
+/*
+ * Calculate downsizing coefficients, which are the same for all tiles,
+ * and bilinear resizing coefficients, which are used to find the best
+ * seam positions.
+ */
+static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
+					  struct ipu_image *in,
+					  struct ipu_image *out)
+{
+	u32 downsized_width = in->rect.width;
+	u32 downsized_height = in->rect.height;
+	u32 downsize_coeff_v = 0;
+	u32 downsize_coeff_h = 0;
+	u32 resized_width = out->rect.width;
+	u32 resized_height = out->rect.height;
+	u32 resize_coeff_h;
+	u32 resize_coeff_v;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		resized_width = out->rect.height;
+		resized_height = out->rect.width;
+	}
+
+	/* Do not let invalid input lead to an endless loop below */
+	if (WARN_ON(resized_width == 0 || resized_height == 0))
+		return -EINVAL;
+
+	while (downsized_width >= resized_width * 2) {
+		downsized_width >>= 1;
+		downsize_coeff_h++;
+	}
+
+	while (downsized_height >= resized_height * 2) {
+		downsized_height >>= 1;
+		downsize_coeff_v++;
+	}
+
+	/*
+	 * Calculate the bilinear resizing coefficients that could be used if
+	 * we were converting with a single tile. The bottom right output pixel
+	 * should sample as close as possible to the bottom right input pixel
+	 * out of the decimator, but not overshoot it:
+	 */
+	resize_coeff_h = 8192 * (downsized_width - 1) / (resized_width - 1);
+	resize_coeff_v = 8192 * (downsized_height - 1) / (resized_height - 1);
+
+	dev_dbg(ctx->chan->priv->ipu->dev,
+		"%s: hscale: >>%u, *8192/%u vscale: >>%u, *8192/%u, %ux%u tiles\n",
+		__func__, downsize_coeff_h, resize_coeff_h, downsize_coeff_v,
+		resize_coeff_v, ctx->in.num_cols, ctx->in.num_rows);
+
+	if (downsize_coeff_h > 2 || downsize_coeff_v  > 2 ||
+	    resize_coeff_h > 0x3fff || resize_coeff_v > 0x3fff)
+		return -EINVAL;
+
+	ctx->downsize_coeff_h = downsize_coeff_h;
+	ctx->downsize_coeff_v = downsize_coeff_v;
+	ctx->image_resize_coeff_h = resize_coeff_h;
+	ctx->image_resize_coeff_v = resize_coeff_v;
+
+	return 0;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -558,6 +627,149 @@ static void calc_tile_offsets(struct ipu_image_convert_ctx *ctx,
 		calc_tile_offsets_packed(ctx, image);
 }
 
+/*
+ * Calculate the resizing ratio for the IC main processing section given input
+ * size, fixed downsizing coefficient, and output size.
+ * Either round to closest for the next tile's first pixel to minimize seams
+ * and distortion (for all but right column / bottom row), or round down to
+ * avoid sampling beyond the edges of the input image for this tile's last
+ * pixel.
+ * Returns the resizing coefficient, resizing ratio is 8192.0 / resize_coeff.
+ */
+static u32 calc_resize_coeff(u32 input_size, u32 downsize_coeff,
+			     u32 output_size, bool allow_overshoot)
+{
+	u32 downsized = input_size >> downsize_coeff;
+
+	if (allow_overshoot)
+		return DIV_ROUND_CLOSEST(8192 * downsized, output_size);
+	else
+		return 8192 * (downsized - 1) / (output_size - 1);
+}
+
+/*
+ * Slightly modify resize coefficients per tile to hide the bilinear
+ * interpolator reset at tile borders, shifting the right / bottom edge
+ * by up to a half input pixel. This removes noticeable seams between
+ * tiles at higher upscaling factors.
+ */
+static void calc_tile_resize_coefficients(struct ipu_image_convert_ctx *ctx)
+{
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
+	struct ipu_image_tile *in_tile, *out_tile;
+	unsigned int col, row, tile_idx;
+	unsigned int last_output;
+
+	for (col = 0; col < ctx->in.num_cols; col++) {
+		bool closest = (col < ctx->in.num_cols - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		u32 resized_width;
+		u32 resize_coeff_h;
+
+		tile_idx = col;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_width = out_tile->height;
+		else
+			resized_width = out_tile->width;
+
+		resize_coeff_h = calc_resize_coeff(in_tile->width,
+						   ctx->downsize_coeff_h,
+						   resized_width, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: column %u hscale: *8192/%u\n",
+			__func__, col, resize_coeff_h);
+
+
+		for (row = 0; row < ctx->in.num_rows; row++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the horizontal scaling factor known, round up
+			 * resized width (output width or height) to burst size.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->height = round_up(resized_width, 8);
+			else
+				out_tile->width = round_up(resized_width, 8);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized width and scaling coefficients.
+			 * Round up to burst size.
+			 */
+			last_output = round_up(resized_width, 8) - 1;
+			if (closest)
+				last_output++;
+			in_tile->width = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_h,
+					      8192) + 1)
+				<< ctx->downsize_coeff_h, 8);
+		}
+
+		ctx->resize_coeffs_h[col] = resize_coeff_h;
+	}
+
+	for (row = 0; row < ctx->in.num_rows; row++) {
+		bool closest = (row < ctx->in.num_rows - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_VFLIP);
+		u32 resized_height;
+		u32 resize_coeff_v;
+
+		tile_idx = row * ctx->in.num_cols;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_height = out_tile->width;
+		else
+			resized_height = out_tile->height;
+
+		resize_coeff_v = calc_resize_coeff(in_tile->height,
+						   ctx->downsize_coeff_v,
+						   resized_height, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: row %u vscale: *8192/%u\n",
+			__func__, row, resize_coeff_v);
+
+		for (col = 0; col < ctx->in.num_cols; col++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the vertical scaling factor known, round up
+			 * resized height (output width or height) to IDMAC
+			 * limitations.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->width = round_up(resized_height, 2);
+			else
+				out_tile->height = round_up(resized_height, 2);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized height and scaling coefficients.
+			 * Align to IDMAC restrictions.
+			 */
+			last_output = round_up(resized_height, 2) - 1;
+			if (closest)
+				last_output++;
+			in_tile->height = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_v,
+					      8192) + 1)
+				<< ctx->downsize_coeff_v, 2);
+		}
+
+		ctx->resize_coeffs_v[row] = resize_coeff_v;
+	}
+}
+
 /*
  * return the number of runs in given queue (pending_q or done_q)
  * for this context. hold irqlock when calling.
@@ -692,6 +904,8 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 	enum ipu_color_space src_cs, dest_cs;
 	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
+	unsigned int col, row;
+	u32 rsc;
 	int ret;
 
 	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
@@ -709,13 +923,26 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 		dest_height = d_image->tile[dst_tile].height;
 	}
 
+	row = tile / s_image->num_cols;
+	col = tile % s_image->num_cols;
+
+	rsc =  (ctx->downsize_coeff_v << 30) |
+	       (ctx->resize_coeffs_v[row] << 16) |
+	       (ctx->downsize_coeff_h << 14) |
+	       (ctx->resize_coeffs_h[col]);
+
+	dev_dbg(priv->ipu->dev, "%s: %ux%u -> %ux%u (rsc = 0x%x)\n",
+		__func__, s_image->tile[tile].width,
+		s_image->tile[tile].height, dest_width, dest_height, rsc);
+
 	/* setup the IC resizer and CSC */
-	ret = ipu_ic_task_init(chan->ic,
+	ret = ipu_ic_task_init_rsc(chan->ic,
 			       s_image->tile[tile].width,
 			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
-			       src_cs, dest_cs);
+			       src_cs, dest_cs,
+			       rsc);
 	if (ret) {
 		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
 		return ret;
@@ -1401,6 +1628,10 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1409,6 +1640,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 		goto out_free;
 
 	calc_out_tile_map(ctx);
+	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
 	dump_format(ctx, d_image);
-- 
2.18.0


* [PATCH v2 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 05/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
From: Philipp Zabel @ 2018-07-19 15:30 UTC
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

If consecutive tiles differ in size or in resizing coefficients, the IC
has to be stopped, reconfigured, and restarted between them.
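
The restart decision can be sketched as below. struct tile_cfg and
needs_restart() are hypothetical simplifications of the checks in
ic_settings_changed(): tiles are numbered row-major in input order, and
one struct holds both the input and output dimensions of a tile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-tile state, not the driver's structures. */
struct tile_cfg {
	uint32_t in_width, in_height;
	uint32_t out_width, out_height;
};

/* True if the IC must be reconfigured before starting tile `next`. */
static bool needs_restart(const struct tile_cfg *tiles,
			  const uint32_t *coeffs_h, const uint32_t *coeffs_v,
			  unsigned int num_cols, unsigned int next)
{
	unsigned int cur;

	assert(next > 0 && num_cols > 0);
	cur = next - 1;
	/* per-column horizontal and per-row vertical coefficients, then sizes */
	return coeffs_h[cur % num_cols] != coeffs_h[next % num_cols] ||
	       coeffs_v[cur / num_cols] != coeffs_v[next / num_cols] ||
	       tiles[cur].in_width != tiles[next].in_width ||
	       tiles[cur].in_height != tiles[next].in_height ||
	       tiles[cur].out_width != tiles[next].out_width ||
	       tiles[cur].out_height != tiles[next].out_height;
}
```

When nothing changes, the cheaper path of only updating the IDMAC buffer
addresses is kept.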

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 65 +++++++++++++++++---------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 12da0772bff0..3907fb7dae13 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1131,6 +1131,24 @@ static irqreturn_t do_bh(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static bool ic_settings_changed(struct ipu_image_convert_ctx *ctx)
+{
+	unsigned int cur_tile = ctx->next_tile - 1;
+	unsigned int next_tile = ctx->next_tile;
+
+	if (ctx->resize_coeffs_h[cur_tile % ctx->in.num_cols] !=
+	    ctx->resize_coeffs_h[next_tile % ctx->in.num_cols] ||
+	    ctx->resize_coeffs_v[cur_tile / ctx->in.num_cols] !=
+	    ctx->resize_coeffs_v[next_tile / ctx->in.num_cols] ||
+	    ctx->in.tile[cur_tile].width != ctx->in.tile[next_tile].width ||
+	    ctx->in.tile[cur_tile].height != ctx->in.tile[next_tile].height ||
+	    ctx->out.tile[cur_tile].width != ctx->out.tile[next_tile].width ||
+	    ctx->out.tile[cur_tile].height != ctx->out.tile[next_tile].height)
+		return true;
+
+	return false;
+}
+
 /* hold irqlock when calling */
 static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 {
@@ -1174,27 +1192,32 @@ static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 	 * not done, place the next tile buffers.
 	 */
 	if (!ctx->double_buffering) {
-
-		src_tile = &s_image->tile[ctx->next_tile];
-		dst_idx = ctx->out_tile_map[ctx->next_tile];
-		dst_tile = &d_image->tile[dst_idx];
-
-		ipu_cpmem_set_buffer(chan->in_chan, 0,
-				     s_image->base.phys0 + src_tile->offset);
-		ipu_cpmem_set_buffer(outch, 0,
-				     d_image->base.phys0 + dst_tile->offset);
-		if (s_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(chan->in_chan,
-						src_tile->u_off,
-						src_tile->v_off);
-		if (d_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(outch,
-						dst_tile->u_off,
-						dst_tile->v_off);
-
-		ipu_idmac_select_buffer(chan->in_chan, 0);
-		ipu_idmac_select_buffer(outch, 0);
-
+		if (ic_settings_changed(ctx)) {
+			convert_stop(run);
+			convert_start(run, ctx->next_tile);
+		} else {
+			src_tile = &s_image->tile[ctx->next_tile];
+			dst_idx = ctx->out_tile_map[ctx->next_tile];
+			dst_tile = &d_image->tile[dst_idx];
+
+			ipu_cpmem_set_buffer(chan->in_chan, 0,
+					     s_image->base.phys0 +
+					     src_tile->offset);
+			ipu_cpmem_set_buffer(outch, 0,
+					     d_image->base.phys0 +
+					     dst_tile->offset);
+			if (s_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(chan->in_chan,
+							src_tile->u_off,
+							src_tile->v_off);
+			if (d_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(outch,
+							dst_tile->u_off,
+							dst_tile->v_off);
+
+			ipu_idmac_select_buffer(chan->in_chan, 0);
+			ipu_idmac_select_buffer(outch, 0);
+		}
 	} else if (ctx->next_tile < ctx->num_tiles - 1) {
 
 		src_tile = &s_image->tile[ctx->next_tile + 1];
-- 
2.18.0


* [PATCH v2 05/16] gpu: ipu-v3: image-convert: store tile top/left position
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
From: Philipp Zabel @ 2018-07-19 15:30 UTC
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Store the tile's top/left position, in pixels, in the tile structure.
This will later allow overlapping tiles of different sizes.
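
With the position stored per tile, the byte offset of a tile in a packed
frame no longer depends on the tile size. packed_tile_offset() is a
hypothetical helper reproducing the calculation in
calc_tile_offsets_packed() after this patch; stride is in bytes per
line and bpp in bits per pixel.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t packed_tile_offset(uint32_t top, uint32_t left,
				   uint32_t stride, uint32_t bpp)
{
	/* the tile must start on a byte boundary */
	assert(((left * bpp) & 7) == 0);
	return top * stride + ((left * bpp) >> 3);
}
```

For example, in a 1920 pixel wide 16 bpp frame (3840 byte stride), a
tile at top = 540, left = 960 starts at byte offset 2075520.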

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 3907fb7dae13..c3358e83bcc1 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -84,6 +84,8 @@ struct ipu_image_convert_dma_chan {
 struct ipu_image_tile {
 	u32 width;
 	u32 height;
+	u32 left;
+	u32 top;
 	/* size and strides are in bytes */
 	u32 size;
 	u32 stride;
@@ -427,13 +429,17 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
-	int i;
+	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
 		struct ipu_image_tile *tile = &image->tile[i];
+		const unsigned int row = i / image->num_cols;
+		const unsigned int col = i % image->num_cols;
 
 		tile->height = image->base.pix.height / image->num_rows;
 		tile->width = image->base.pix.width / image->num_cols;
+		tile->left = col * tile->width;
+		tile->top = row * tile->height;
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -529,7 +535,7 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 H, w, h, y_stride, uv_stride;
+	u32 H, top, y_stride, uv_stride;
 	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
 	u32 y_row_off, y_col_off, y_off;
 	u32 y_size, uv_size;
@@ -546,13 +552,12 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		y_row_off = row * h * y_stride;
-		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
+		top = image->tile[tile].top;
+		y_row_off = top * y_stride;
+		uv_row_off = (top * uv_stride) / fmt->uv_height_dec;
 
 		for (col = 0; col < image->num_cols; col++) {
-			y_col_off = col * w;
+			y_col_off = image->tile[tile].left;
 			uv_col_off = y_col_off / fmt->uv_width_dec;
 			if (fmt->uv_packed)
 				uv_col_off *= 2;
@@ -589,7 +594,7 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 w, h, bpp, stride;
+	u32 bpp, stride;
 	u32 row_off, col_off;
 
 	/* setup some convenience vars */
@@ -597,12 +602,10 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	bpp = fmt->bpp;
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		row_off = row * h * stride;
+		row_off = image->tile[tile].top * stride;
 
 		for (col = 0; col < image->num_cols; col++) {
-			col_off = (col * w * bpp) >> 3;
+			col_off = (image->tile[tile].left * bpp) >> 3;
 
 			image->tile[tile].offset = row_off + col_off;
 			image->tile[tile].u_off = 0;
-- 
2.18.0

* [PATCH v2 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (4 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 05/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

This will allow calculating seam positions after initializing the
ipu_image base structure but before calculating tile dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index c3358e83bcc1..06d65c63262d 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1447,9 +1447,6 @@ static int fill_image(struct ipu_image_convert_ctx *ctx,
 	else
 		ic_image->stride  = ic_image->base.pix.bytesperline;
 
-	calc_tile_dimensions(ctx, ic_image);
-	calc_tile_offsets(ctx, ic_image);
-
 	return 0;
 }
 
@@ -1654,10 +1651,6 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
-	ret = calc_image_resize_coefficients(ctx, in, out);
-	if (ret)
-		goto out_free;
-
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1665,6 +1658,16 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
+	calc_tile_dimensions(ctx, s_image);
+	calc_tile_offsets(ctx, s_image);
+
+	calc_tile_dimensions(ctx, d_image);
+	calc_tile_offsets(ctx, d_image);
+
 	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
-- 
2.18.0

* [PATCH v2 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (5 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 08/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Move tile_width_align and tile_height_align up so they
can be used by the tile edge position calculation code.
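
For reference, here is a standalone sketch of the two moved helpers. The driver types are replaced by a hypothetical reduced struct pixfmt, and a plain bool stands in for ipu_rot_mode_is_irt(). The rules themselves follow the comments in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct ipu_image_pixfmt. */
struct pixfmt {
	bool planar;
	unsigned int uv_width_dec;	/* horizontal chroma decimation */
};

enum convert_type { CONVERT_IN, CONVERT_OUT };

/* Rule of tile_width_align(): 8-byte aligned tile and U/V plane offsets. */
static unsigned int width_align(const struct pixfmt *fmt)
{
	if (fmt->planar) {
		assert(fmt->uv_width_dec > 0);
		return 8 * fmt->uv_width_dec;
	}
	return 8;
}

/* Rule of tile_height_align(); uses_irt stands in for ipu_rot_mode_is_irt(). */
static unsigned int height_align(enum convert_type type, bool uses_irt)
{
	return (type == CONVERT_OUT && uses_irt) ? 8 : 2;
}
```

For example, a planar 4:2:0 format with uv_width_dec = 2 needs 16-pixel tile widths, while output tiles passing through the IRT need heights that are multiples of 8 lines.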

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 54 +++++++++++++-------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 06d65c63262d..da6f18475b6b 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -426,6 +426,33 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+/*
+ * We have to adjust the tile width such that the tile physaddrs and
+ * U and V plane offsets are multiples of 8 bytes as required by
+ * the IPU DMA Controller. For the planar formats, this corresponds
+ * to a pixel alignment of 16 (but use a more formal equation since
+ * the variables are available). For all the packed formats, 8 is
+ * good enough.
+ */
+static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+}
+
+/*
+ * For tile height alignment, we have to ensure that the output tile
+ * heights are multiples of 8 lines if the IRT is required by the
+ * given rotation mode (the IRT performs rotations on 8x8 blocks
+ * at a time). If the IRT is not used, or for input image tiles,
+ * 2 lines are good enough.
+ */
+static inline u32 tile_height_align(enum ipu_image_convert_type type,
+				    enum ipu_rotate_mode rot_mode)
+{
+	return (type == IMAGE_CONVERT_OUT &&
+		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -1467,33 +1494,6 @@ static unsigned int clamp_align(unsigned int x, unsigned int min,
 	return x;
 }
 
-/*
- * We have to adjust the tile width such that the tile physaddrs and
- * U and V plane offsets are multiples of 8 bytes as required by
- * the IPU DMA Controller. For the planar formats, this corresponds
- * to a pixel alignment of 16 (but use a more formal equation since
- * the variables are available). For all the packed formats, 8 is
- * good enough.
- */
-static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
-{
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
-}
-
-/*
- * For tile height alignment, we have to ensure that the output tile
- * heights are multiples of 8 lines if the IRT is required by the
- * given rotation mode (the IRT performs rotations on 8x8 blocks
- * at a time). If the IRT is not used, or for input image tiles,
- * 2 lines are good enough.
- */
-static inline u32 tile_height_align(enum ipu_image_convert_type type,
-				    enum ipu_rotate_mode rot_mode)
-{
-	return (type == IMAGE_CONVERT_OUT &&
-		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
-}
-
 /* Adjusts input/output images to IPU restrictions */
 void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 			      enum ipu_rotate_mode rot_mode)
-- 
2.18.0

* [PATCH v2 08/16] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (6 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Select seam positions that minimize distortions during seam hiding while
satisfying input and output IDMAC, rotator, and image format constraints.

This code looks for aligned output seam positions that minimize the
difference between the corresponding ideal (fractional) input positions
and the input positions rounded to the alignment requirements.
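
The following is a simplified userspace sketch of the search done by find_best_seam() below. It assumes, as the diff suggests, that resize_coeff is the input step per output pixel in 19.13 fixed point (8192 == 1.0) and that downsize_coeff is a power-of-two shift. The burst/overshoot check and the debug output are left out. best_seam() and round_closest_u() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Round x to the closest multiple of y (y > 0), ties rounding up. */
static unsigned int round_closest_u(unsigned int x, unsigned int y)
{
	return (x + y / 2) / y * y;
}

/*
 * Scan output positions in [out_start, out_end) that are multiples of
 * out_align, map each to an input position in 19.13 fixed point and keep
 * the one closest to an in_align-aligned input pixel. The first candidate
 * wins on ties. Results are in integer pixels.
 */
static void best_seam(unsigned int out_start, unsigned int out_end,
		      unsigned int in_align, unsigned int out_align,
		      unsigned int downsize_coeff, unsigned int resize_coeff,
		      unsigned int *in_seam, unsigned int *out_seam)
{
	unsigned int min_diff = UINT_MAX;
	unsigned int best_in = 0, best_out = 0;
	unsigned int out_pos;

	assert(in_align > 0 && out_align > 0);
	/* Round the start up to the output alignment. */
	out_start = (out_start + out_align - 1) / out_align * out_align;
	for (out_pos = out_start; out_pos < out_end; out_pos += out_align) {
		/* Ideal input position, 19.13 fixed point. */
		unsigned int in_pos = (out_pos * resize_coeff) << downsize_coeff;
		/* Closest input position a tile could start at. */
		unsigned int aligned = round_closest_u(in_pos, 8192U * in_align);
		unsigned int diff = in_pos > aligned ? in_pos - aligned
						     : aligned - in_pos;

		if (diff < min_diff) {
			min_diff = diff;
			best_in = aligned;
			best_out = out_pos;
		}
	}
	*out_seam = best_out;
	/* 19.13 fixed point back to integer pixels, rounding to closest. */
	*in_seam = (best_in + 4096) / 8192;
}
```

With a 0.75 input step (resize_coeff 6144) and 8-pixel alignment on both sides, output position 8 maps to input 6 (2 pixels from 8), and output 16 maps to input 12 (4 pixels from 16). The sketch therefore picks out_seam 8 and in_seam 8.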

Since tiles can now be sized differently, the alignment restrictions on
the complete image can be relaxed in the following patches.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
Changes since v1 [1]:
 - Fix inverted allow_overshoot logic
 - Correctly switch horizontal / vertical tile alignment when
   determining seam positions with the 90° rotator active.

[1] https://patchwork.linuxtv.org/patch/50521/
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 320 ++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index da6f18475b6b..6615cea694ed 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -426,6 +426,115 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+#define round_closest(x, y) round_down((x) + (y)/2, (y))
+
+/*
+ * Find the best aligned seam position in the interval [out_start, out_end).
+ * Rotation and image offsets are out of scope.
+ *
+ * @out_start: start of interval, must be within 1024 pixels / lines
+ *             of out_end
+ * @out_end: output right / bottom edge, end of interval
+ * @in_align: input alignment, either horizontal 8-byte line start address
+ *            alignment, or pixel alignment due to image format
+ * @out_align: output alignment, either horizontal 8-byte line start address
+ *             alignment, or pixel alignment due to image format or rotator
+ *             block size
+ * @out_burst: horizontal output burst size or rotator block size
+ * @downsize_coeff: downsizing section coefficient
+ * @resize_coeff: main processing section resizing coefficient
+ * @allow_overshoot: ignore out_burst if true
+ * @_in_seam: aligned input seam position return value
+ * @_out_seam: aligned output seam position return value
+ */
+static void find_best_seam(struct ipu_image_convert_ctx *ctx,
+			   unsigned int out_start,
+			   unsigned int out_end,
+			   unsigned int in_align,
+			   unsigned int out_align,
+			   unsigned int out_burst,
+			   unsigned int downsize_coeff,
+			   unsigned int resize_coeff,
+			   bool allow_overshoot,
+			   u32 *_in_seam,
+			   u32 *_out_seam)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int out_pos;
+	/* Input / output seam position candidates */
+	unsigned int out_seam = 0;
+	unsigned int in_seam = 0;
+	unsigned int min_diff = UINT_MAX;
+
+	/*
+	 * Output tiles must start at a multiple of 8 bytes horizontally and
+	 * possibly at an even line vertically depending on the pixel format.
+	 * Only consider output aligned positions for the seam.
+	 */
+	out_start = round_up(out_start, out_align);
+	for (out_pos = out_start; out_pos < out_end; out_pos += out_align) {
+		unsigned int in_pos;
+		unsigned int in_pos_aligned;
+		unsigned int abs_diff;
+
+		/*
+		 * Tiles in the right column / bottom row may not be allowed to
+		 * overshoot horizontally / vertically. out_burst may be the
+		 * actual DMA burst size, or the rotator block size.
+		 */
+		if (!allow_overshoot && (out_end - out_pos) % out_burst)
+			continue;
+
+		/*
+		 * Input sample position, corresponding to out_pos, 19.13 fixed
+		 * point.
+		 */
+		in_pos = (out_pos * resize_coeff) << downsize_coeff;
+		/*
+		 * The closest input sample position that we could actually
+		 * start the input tile at, 19.13 fixed point.
+		 */
+		in_pos_aligned = round_closest(in_pos, 8192U * in_align);
+
+		if (in_pos < in_pos_aligned)
+			abs_diff = in_pos_aligned - in_pos;
+		else
+			abs_diff = in_pos - in_pos_aligned;
+
+		if (abs_diff < min_diff) {
+			in_seam = in_pos_aligned;
+			out_seam = out_pos;
+			min_diff = abs_diff;
+		}
+	}
+
+	*_out_seam = out_seam;
+	/* Convert 19.13 fixed point to integer seam position */
+	*_in_seam = DIV_ROUND_CLOSEST(in_seam, 8192U);
+
+	dev_dbg(dev, "%s: out_seam %u in [%u, %u], in_seam %u diff %u.%03u\n",
+		__func__, out_seam, out_start, out_end, *_in_seam,
+		min_diff / 8192,
+		DIV_ROUND_CLOSEST(min_diff % 8192 * 1000, 8192));
+}
+
+/*
+ * Tile left edges are required to be aligned to multiples of 8 bytes
+ * by the IDMAC.
+ */
+static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 64 / fmt->bpp;
+}
+
+/*
+ * Tile top edge alignment is only limited by chroma subsampling.
+ */
+static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->uv_height_dec > 1 ? 2 : 1;
+}
+
 /*
  * We have to adjust the tile width such that the tile physaddrs and
  * U and V plane offsets are multiples of 8 bytes as required by
@@ -453,20 +562,216 @@ static inline u32 tile_height_align(enum ipu_image_convert_type type,
 		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
 }
 
+/*
+ * Fill in left position and width for all tiles in an input column, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a row, and output tile top position and height are set.
+ */
+static void fill_tile_column(struct ipu_image_convert_ctx *ctx,
+			     unsigned int col,
+			     struct ipu_image_convert_image *in,
+			     unsigned int in_left, unsigned int in_width,
+			     struct ipu_image_convert_image *out,
+			     unsigned int out_left, unsigned int out_width)
+{
+	unsigned int row, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (row = 0; row < in->num_rows; row++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->left = in_left;
+		in_tile->width = in_width;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->top = out_left;
+			out_tile->height = out_width;
+		} else {
+			out_tile->left = out_left;
+			out_tile->width = out_width;
+		}
+	}
+}
+
+/*
+ * Fill in top position and height for all tiles in an input row, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a column, and output tile left position and width are set.
+ */
+static void fill_tile_row(struct ipu_image_convert_ctx *ctx, unsigned int row,
+			  struct ipu_image_convert_image *in,
+			  unsigned int in_top, unsigned int in_height,
+			  struct ipu_image_convert_image *out,
+			  unsigned int out_top, unsigned int out_height)
+{
+	unsigned int col, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (col = 0; col < in->num_cols; col++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->top = in_top;
+		in_tile->height = in_height;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->left = out_top;
+			out_tile->width = out_height;
+		} else {
+			out_tile->top = out_top;
+			out_tile->height = out_height;
+		}
+	}
+}
+
+/*
+ * Find the best horizontal and vertical seam positions to split into tiles.
+ * Minimize the fractional part of the input sampling position for the
+ * top / left pixels of each tile.
+ */
+static void find_seams(struct ipu_image_convert_ctx *ctx,
+		       struct ipu_image_convert_image *in,
+		       struct ipu_image_convert_image *out)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int resized_width = out->base.rect.width;
+	unsigned int resized_height = out->base.rect.height;
+	unsigned int col;
+	unsigned int row;
+	unsigned int in_left_align = tile_left_align(in->fmt);
+	unsigned int in_top_align = tile_top_align(in->fmt);
+	unsigned int out_left_align = tile_left_align(out->fmt);
+	unsigned int out_top_align = tile_top_align(out->fmt);
+	unsigned int out_width_align = tile_width_align(out->fmt);
+	unsigned int out_height_align = tile_height_align(out->type,
+							  ctx->rot_mode);
+	unsigned int in_right = in->base.rect.width;
+	unsigned int in_bottom = in->base.rect.height;
+	unsigned int out_right = out->base.rect.width;
+	unsigned int out_bottom = out->base.rect.height;
+	unsigned int flipped_out_left;
+	unsigned int flipped_out_top;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		resized_width = out->base.rect.height;
+		resized_height = out->base.rect.width;
+		out_left_align = tile_top_align(out->fmt);
+		out_top_align = tile_left_align(out->fmt);
+		out_width_align = tile_height_align(out->type,
+						    ctx->rot_mode);
+		out_height_align = tile_width_align(out->fmt);
+		out_right = out->base.rect.height;
+		out_bottom = out->base.rect.width;
+	}
+
+	for (col = in->num_cols - 1; col > 0; col--) {
+		bool allow_overshoot = (col < in->num_cols - 1) &&
+				       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_left;
+		unsigned int out_left;
+
+		/* Start within 1024 pixels of the right edge */
+		out_start = max_t(int, 0, out_right - 1024);
+		/* End before having to add more columns to the left */
+		out_end = min_t(unsigned int, out_right, col * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_left_align, out_left_align, out_width_align,
+			       ctx->downsize_coeff_h, ctx->image_resize_coeff_h,
+			       allow_overshoot, &in_left, &out_left);
+
+		if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+			flipped_out_left = resized_width - out_right;
+		else
+			flipped_out_left = out_left;
+
+		fill_tile_column(ctx, col, in, in_left, in_right - in_left,
+				 out, flipped_out_left, out_right - out_left);
+
+		dev_dbg(dev, "%s: col %u: %u, %u -> %u, %u\n", __func__, col,
+			in_left, in_right - in_left,
+			flipped_out_left, out_right - out_left);
+
+		in_right = in_left;
+		out_right = out_left;
+	}
+
+	flipped_out_left = (ctx->rot_mode & IPU_ROT_BIT_HFLIP) ?
+			   resized_width - out_right : 0;
+
+	fill_tile_column(ctx, 0, in, 0, in_right,
+			 out, flipped_out_left, out_right);
+
+	dev_dbg(dev, "%s: col 0: 0, %u -> %u, %u\n", __func__,
+		in_right, flipped_out_left, out_right);
+
+	for (row = in->num_rows - 1; row > 0; row--) {
+		bool allow_overshoot = row < in->num_rows - 1;
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_top;
+		unsigned int out_top;
+
+		/* Start within 1024 lines of the bottom edge */
+		out_start = max_t(int, 0, out_bottom - 1024);
+		/* End before having to add more rows above */
+		out_end = min_t(unsigned int, out_bottom, row * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_top_align, out_top_align, out_height_align,
+			       ctx->downsize_coeff_v, ctx->image_resize_coeff_v,
+			       allow_overshoot, &in_top, &out_top);
+
+		if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+		    ipu_rot_mode_is_irt(ctx->rot_mode))
+			flipped_out_top = resized_height - out_bottom;
+		else
+			flipped_out_top = out_top;
+
+		fill_tile_row(ctx, row, in, in_top, in_bottom - in_top,
+			      out, flipped_out_top, out_bottom - out_top);
+
+		dev_dbg(dev, "%s: row %u: %u, %u -> %u, %u\n", __func__, row,
+			in_top, in_bottom - in_top,
+			flipped_out_top, out_bottom - out_top);
+
+		in_bottom = in_top;
+		out_bottom = out_top;
+	}
+
+	if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+	    ipu_rot_mode_is_irt(ctx->rot_mode))
+		flipped_out_top = resized_height - out_bottom;
+	else
+		flipped_out_top = 0;
+
+	fill_tile_row(ctx, 0, in, 0, in_bottom,
+		      out, flipped_out_top, out_bottom);
+
+	dev_dbg(dev, "%s: row 0: 0, %u -> %u, %u\n", __func__,
+		in_bottom, flipped_out_top, out_bottom);
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
-		struct ipu_image_tile *tile = &image->tile[i];
+		struct ipu_image_tile *tile;
 		const unsigned int row = i / image->num_cols;
 		const unsigned int col = i % image->num_cols;
 
-		tile->height = image->base.pix.height / image->num_rows;
-		tile->width = image->base.pix.width / image->num_cols;
-		tile->left = col * tile->width;
-		tile->top = row * tile->height;
+		if (image->type == IMAGE_CONVERT_OUT)
+			tile = &image->tile[ctx->out_tile_map[i]];
+		else
+			tile = &image->tile[i];
+
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -1662,13 +1967,16 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	calc_out_tile_map(ctx);
+
+	find_seams(ctx, s_image, d_image);
+
 	calc_tile_dimensions(ctx, s_image);
 	calc_tile_offsets(ctx, s_image);
 
 	calc_tile_dimensions(ctx, d_image);
 	calc_tile_offsets(ctx, d_image);
 
-	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
-- 
2.18.0

* [PATCH v2 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (7 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 08/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16 Philipp Zabel
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Since tile dimensions now vary between tiles, add debug output for each
tile's position and dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 6615cea694ed..69cc307f932d 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -302,12 +302,11 @@ static void dump_format(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 
 	dev_dbg(priv->ipu->dev,
-		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
+		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles), %c%c%c%c\n",
 		chan->ic_task, ctx,
 		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
 		ic_image->base.pix.width, ic_image->base.pix.height,
 		ic_image->num_cols, ic_image->num_rows,
-		ic_image->tile[0].width, ic_image->tile[0].height,
 		ic_image->fmt->fourcc & 0xff,
 		(ic_image->fmt->fourcc >> 8) & 0xff,
 		(ic_image->fmt->fourcc >> 16) & 0xff,
@@ -760,6 +759,8 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
@@ -784,6 +785,13 @@ static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 			tile->rot_stride =
 				(image->fmt->bpp * tile->height) >> 3;
 		}
+
+		dev_dbg(priv->ipu->dev,
+			"task %u: ctx %p: %s@[%u,%u]: %ux%u@%u,%u\n",
+			chan->ic_task, ctx,
+			image->type == IMAGE_CONVERT_IN ? "Input" : "Output",
+			row, col,
+			tile->width, tile->height, tile->left, tile->top);
 	}
 }
 
-- 
2.18.0

* [PATCH v2 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (8 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions Philipp Zabel
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

For the planar but U/V-packed formats NV12 and NV16, 8-pixel width
alignment is good enough to fulfill the 8-byte stride requirement.
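
The following small sketch shows why this holds, following the uv_col_off computation in calc_tile_offsets_planar(). For NV12, the chroma column offset is halved and then doubled again for the interleaved CbCr pair, so it equals the luma offset in bytes. For I420, it is only halved. struct pixfmt and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct ipu_image_pixfmt. */
struct pixfmt {
	bool planar;
	bool uv_packed;			/* NV12/NV16: interleaved CbCr plane */
	unsigned int uv_width_dec;	/* horizontal chroma decimation */
};

/* tile_width_align() as changed by this patch. */
static unsigned int width_align(const struct pixfmt *fmt)
{
	return (fmt->planar && !fmt->uv_packed) ? 8 * fmt->uv_width_dec : 8;
}

/*
 * Chroma byte offset of a tile's left edge, following uv_col_off in
 * calc_tile_offsets_planar(): decimate, then double for CbCr pairs.
 */
static unsigned int uv_col_off(const struct pixfmt *fmt, unsigned int left)
{
	unsigned int off;

	assert(fmt->planar && fmt->uv_width_dec > 0);
	off = left / fmt->uv_width_dec;
	if (fmt->uv_packed)
		off *= 2;
	return off;
}
```

An 8-pixel left edge therefore yields an 8-byte chroma offset for NV12, but only 4 bytes for I420, which still needs 16-pixel alignment.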

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 69cc307f932d..1a8fc29e278f 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -544,7 +544,7 @@ static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
  */
 static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
 {
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+	return (fmt->planar && !fmt->uv_packed) ? 8 * fmt->uv_width_dec : 8;
 }
 
 /*
-- 
2.18.0

* [PATCH v2 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (9 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16 Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 12/16] gpu: ipu-v3: image-convert: relax output " Philipp Zabel
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

If we allow the 8-pixel DMA bursts to overshoot the end of the line, the
only input alignment restrictions are those dictated by the pixel format
and the 8-byte line start address alignment.
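
As an illustration of the line start constraint for packed formats, the smallest left edge alignment (in pixels) that keeps n * bpp bits a multiple of 64 is 64 / gcd(64, bpp). This is a generic computation, not the driver's helper. packed_left_align() and gcd_u() are hypothetical names.

```c
#include <assert.h>

/* Euclid's algorithm. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Smallest number of pixels n such that n * bpp bits is a whole number of
 * 8-byte (64-bit) words, i.e. the left edge pixel alignment that keeps a
 * packed-format line start address 8-byte aligned.
 */
static unsigned int packed_left_align(unsigned int bpp)
{
	assert(bpp > 0);
	return 64 / gcd_u(64, bpp);
}
```

For example, 16 bpp formats need 4-pixel alignment and 32 bpp formats need 2-pixel alignment, while 24 bpp formats need 8 pixels (24 bytes).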

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 1a8fc29e278f..bae8d6042333 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1856,13 +1856,6 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 		num_in_cols = num_out_cols;
 	}
 
-	/* align input width/height */
-	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
-			num_in_rows);
-	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
-	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
-
 	/* align output width/height */
 	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
 	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
-- 
2.18.0

* [PATCH v2 12/16] gpu: ipu-v3: image-convert: relax output alignment restrictions
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (10 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

If we allow different tile sizes, the output tile width / height
alignment doesn't need to be multiplied by the number of columns / rows.
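
To illustrate the effect, here is a hypothetical sketch of clamp_align(): it clamps, then rounds to the nearest multiple of 2^align. It is written from scratch and may differ in detail from the driver's helper. The MIN_W/MAX_W limits are replaced by example bounds of 16 and 4096.

```c
#include <assert.h>

/*
 * Hypothetical sketch of clamp_align(): clamp x to [min, max] and round it
 * to the nearest multiple of 2^align. Not copied from the driver.
 */
static unsigned int clamp_align_sketch(unsigned int x, unsigned int min,
				       unsigned int max, unsigned int align)
{
	unsigned int mask;

	assert(align < 32);
	mask = ~((1U << align) - 1);
	/* Clamp to the aligned range [round_up(min), round_down(max)]. */
	if (x < ((min + ~mask) & mask))
		x = (min + ~mask) & mask;
	if (x > (max & mask))
		x = max & mask;
	/* Round to the nearest aligned value. */
	if (align)
		x = (x + (1U << (align - 1))) & mask;
	return x;
}

/* ilog2() for exact powers of two, as used for the align argument. */
static unsigned int ilog2_u(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}
```

With two output columns, a 1288-pixel-wide output used to be rounded to a multiple of 16 (1296). Under the relaxed rule, it stays at 1288.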

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index bae8d6042333..a8d7939d58d9 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1857,9 +1857,8 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 	}
 
 	/* align output width/height */
-	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
-			num_out_rows);
+	w_align = ilog2(tile_width_align(outfmt));
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode));
 	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
-- 
2.18.0

* [PATCH v2 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (11 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 12/16] gpu: ipu-v3: image-convert: relax output " Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

For planar formats, bytesperline does not depend on BPP. It must always
be at least as large as the width and aligned to the tile width alignment
restrictions.
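
The following sketch shows the unclamped rule with a worked example. calc_layout() and struct frame_layout are hypothetical, and the clamping to the existing bytesperline and MAX_W done in the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>

struct frame_layout {
	unsigned int bytesperline;	/* luma line pitch for planar formats */
	unsigned int sizeimage;		/* all planes, in bytes */
};

/*
 * Planar: the pitch is that of the 8-bit luma plane (one byte per pixel),
 * and the average bpp (e.g. 12 for 4:2:0) only enters the total size.
 * Packed: bpp determines the pitch directly.
 */
static struct frame_layout calc_layout(unsigned int width, unsigned int height,
				       unsigned int bpp, bool planar)
{
	struct frame_layout l;

	assert(width > 0 && height > 0);
	l.bytesperline = planar ? width : (width * bpp) >> 3;
	l.sizeimage = planar ? (height * l.bytesperline * bpp) >> 3
			     : height * l.bytesperline;
	return l;
}
```

For 1920x1080 NV12 (12 bpp), this gives a bytesperline of 1920 and a sizeimage of 1080 * 1920 * 12 / 8 = 3110400 bytes. For RGB565 (16 bpp), it gives 3840 and 4147200 bytes.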

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index a8d7939d58d9..a0d9c154c951 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1863,10 +1863,19 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
 	/* set input/output strides and image sizes */
-	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
-	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
-	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
-	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
+	in->pix.bytesperline = infmt->planar ?
+		clamp_align(in->pix.width,
+			    in->pix.bytesperline, MAX_W, w_align) :
+		clamp_align((in->pix.width * infmt->bpp) >> 3,
+			    in->pix.bytesperline, MAX_W, w_align);
+	in->pix.sizeimage = infmt->planar ?
+		(in->pix.height * in->pix.bytesperline * infmt->bpp) >> 3 :
+		in->pix.height * in->pix.bytesperline;
+	out->pix.bytesperline = outfmt->planar ? out->pix.width :
+		(out->pix.width * outfmt->bpp) >> 3;
+	out->pix.sizeimage = outfmt->planar ?
+		(out->pix.height * out->pix.bytesperline * outfmt->bpp) >> 3 :
+		out->pix.height * out->pix.bytesperline;
 }
 EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
 
-- 
2.18.0
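
As a rough illustration of the stride rule described in this patch, the
sketch below computes bytesperline and sizeimage for planar and packed
formats. It is a simplified sketch, not the driver code: struct fmt_info,
align_up(), calc_bytesperline() and calc_sizeimage() are hypothetical
names, bpp is assumed to be the average bits per pixel over all planes
(12 for YUV420), and the clamping against the user-supplied bytesperline
and MAX_W done by clamp_align() is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-format information. */
struct fmt_info {
	bool planar;       /* luma plane followed by chroma plane(s) */
	unsigned int bpp;  /* assumed: average bits per pixel, all planes */
};

/* Round x up to a multiple of (1 << align). */
static uint32_t align_up(uint32_t x, unsigned int align)
{
	uint32_t mask = (1u << align) - 1;

	return (x + mask) & ~mask;
}

/*
 * For planar formats, bytesperline is the stride of the luma plane,
 * i.e. one byte per pixel regardless of bpp. For packed formats it is
 * width * bpp / 8.
 */
static uint32_t calc_bytesperline(const struct fmt_info *fmt,
				  uint32_t width, unsigned int w_align)
{
	uint32_t stride = fmt->planar ? width : (width * fmt->bpp) >> 3;

	return align_up(stride, w_align);
}

/*
 * For planar formats, the chroma planes are accounted for by scaling
 * the luma plane size by bpp / 8 (12 / 8 = 1.5 for 4:2:0).
 */
static uint32_t calc_sizeimage(const struct fmt_info *fmt,
			       uint32_t height, uint32_t bytesperline)
{
	return fmt->planar ? (height * bytesperline * fmt->bpp) >> 3
			   : height * bytesperline;
}
```

For a 720x576 YUV420 image this gives a 720 byte stride and a 622080 byte
image (720 * 576 * 3 / 2), whereas the old (width * bpp) >> 3 formula
would have reported a 1080 byte stride.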

* [PATCH v2 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (12 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Visualize the scaling and rotation pipeline with some ASCII art
diagrams. Remove the FIXME comment about missing seam prevention.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 39 +++++++++++++++++++-------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index a0d9c154c951..93eaeacf777e 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -37,17 +37,36 @@
  * when double_buffering boolean is set).
  *
  * Note that the input frame must be split up into the same number
- * of tiles as the output frame.
+ * of tiles as the output frame:
  *
- * FIXME: at this point there is no attempt to deal with visible seams
- * at the tile boundaries when upscaling. The seams are caused by a reset
- * of the bilinear upscale interpolation when starting a new tile. The
- * seams are barely visible for small upscale factors, but become
- * increasingly visible as the upscale factor gets larger, since more
- * interpolated pixels get thrown out at the tile boundaries. A possilble
- * fix might be to overlap tiles of different sizes, but this must be done
- * while also maintaining the IDMAC dma buffer address alignment and 8x8 IRT
- * alignment restrictions of each tile.
+ *                       +---------+-----+
+ *   +-----+---+         |  A      | B   |
+ *   | A   | B |         |         |     |
+ *   +-----+---+   -->   +---------+-----+
+ *   | C   | D |         |  C      | D   |
+ *   +-----+---+         |         |     |
+ *                       +---------+-----+
+ *
+ * Clockwise 90° rotations are handled by first rescaling into a
+ * reusable temporary tile buffer and then rotating with the 8x8
+ * block rotator, writing to the correct destination:
+ *
+ *                                         +-----+-----+
+ *                                         |     |     |
+ *   +-----+---+         +---------+       | C   | A   |
+ *   | A   | B |         | A,B, |  |       |     |     |
+ *   +-----+---+   -->   | C,D  |  |  -->  |     |     |
+ *   | C   | D |         +---------+       +-----+-----+
+ *   +-----+---+                           | D   | B   |
+ *                                         |     |     |
+ *                                         +-----+-----+
+ *
+ * If the 8x8 block rotator is used, horizontal or vertical flipping
+ * is done during the rotation step, otherwise flipping is done
+ * during the scaling step.
+ * With rotation or flipping, tile order changes between the input and
+ * output images. Tiles are numbered in row-major order from top left
+ * to bottom right in both the input and output images.
  */
 
 #define MAX_STRIPES_W    4
-- 
2.18.0
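
To make the tile numbering in these diagrams concrete, here is a small
sketch that maps a row-major input tile index to the row-major index of
the output tile it ends up in. map_tile() is a hypothetical helper, not
the driver's implementation, and it assumes that flips are applied
before the 90 degree clockwise rotation; the ordering used by the IPU
rotation modes may differ.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: map a row-major input tile index in a grid of
 * 'rows' x 'cols' tiles to the row-major index of the output tile.
 * Assumption for this sketch: flips first, then optional 90 degree
 * clockwise rotation.
 */
static unsigned int map_tile(unsigned int in_idx,
			     unsigned int rows, unsigned int cols,
			     bool hflip, bool vflip, bool rot90)
{
	unsigned int r = in_idx / cols;
	unsigned int c = in_idx % cols;

	if (hflip)
		c = cols - 1 - c;
	if (vflip)
		r = rows - 1 - r;

	if (rot90) {
		/*
		 * A clockwise rotation turns input rows into output
		 * columns: input (r, c) lands at output (c, rows - 1 - r),
		 * and the output grid is 'rows' tiles wide.
		 */
		return c * rows + (rows - 1 - r);
	}

	return r * cols + c;
}
```

For the 2x2 example above (A=0, B=1, C=2, D=3) with rotation only, the
tiles land at output indices 1, 3, 0 and 2, which is the "C A / D B"
layout of the second diagram.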

* [PATCH v2 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (13 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-19 15:30 ` [PATCH v2 16/16] media: imx: add mem2mem device Philipp Zabel
  2018-07-22 18:30 ` [PATCH v2 00/16] i.MX media mem2mem scaler Steve Longerbeam
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Double-buffering only works if tile sizes are the same and the resizing
coefficient does not change between tiles, even for non-planar formats.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 93eaeacf777e..1ea1ad0e8d66 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1939,6 +1939,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	struct ipu_image_convert_chan *chan;
 	struct ipu_image_convert_ctx *ctx;
 	unsigned long flags;
+	unsigned int i;
 	bool get_res;
 	int ret;
 
@@ -2022,15 +2023,37 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	 * for every tile, and therefore would have to be updated for
 	 * each buffer which is not possible. So double-buffering is
 	 * impossible when either the source or destination images are
-	 * a planar format (YUV420, YUV422P, etc.).
+	 * a planar format (YUV420, YUV422P, etc.). Further, differently
+	 * sized tiles or different resizing coefficients per tile
+	 * prevent double-buffering as well.
 	 */
 	ctx->double_buffering = (ctx->num_tiles > 1 &&
 				 !s_image->fmt->planar &&
 				 !d_image->fmt->planar);
+	for (i = 1; i < ctx->num_tiles; i++) {
+		if (ctx->in.tile[i].width != ctx->in.tile[0].width ||
+		    ctx->in.tile[i].height != ctx->in.tile[0].height ||
+		    ctx->out.tile[i].width != ctx->out.tile[0].width ||
+		    ctx->out.tile[i].height != ctx->out.tile[0].height) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_cols; i++) {
+		if (ctx->resize_coeffs_h[i] != ctx->resize_coeffs_h[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_rows; i++) {
+		if (ctx->resize_coeffs_v[i] != ctx->resize_coeffs_v[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		unsigned long intermediate_size = d_image->tile[0].size;
-		unsigned int i;
 
 		for (i = 1; i < ctx->num_tiles; i++) {
 			if (d_image->tile[i].size > intermediate_size)
-- 
2.18.0
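
The condition this patch implements can be summarized as a single
predicate. The sketch below is a simplified restatement of the checks
in ipu_image_convert_prepare(); struct tile and can_double_buffer() are
hypothetical, and the coefficient arrays merely stand in for
ctx->resize_coeffs_h and ctx->resize_coeffs_v.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one tile of a conversion context. */
struct tile {
	unsigned int width;
	unsigned int height;
};

/*
 * Double-buffering needs more than one tile, no planar image on either
 * side, identical input and output tile sizes, and identical
 * horizontal and vertical resizing coefficients for all tiles.
 */
static bool can_double_buffer(const struct tile *in, const struct tile *out,
			      unsigned int num_tiles,
			      const unsigned int *coeffs_h, unsigned int num_cols,
			      const unsigned int *coeffs_v, unsigned int num_rows,
			      bool in_planar, bool out_planar)
{
	unsigned int i;

	if (num_tiles < 2 || in_planar || out_planar)
		return false;

	/* All tiles must match the first one in size. */
	for (i = 1; i < num_tiles; i++)
		if (in[i].width != in[0].width ||
		    in[i].height != in[0].height ||
		    out[i].width != out[0].width ||
		    out[i].height != out[0].height)
			return false;

	/* One horizontal coefficient per tile column... */
	for (i = 1; i < num_cols; i++)
		if (coeffs_h[i] != coeffs_h[0])
			return false;

	/* ...and one vertical coefficient per tile row. */
	for (i = 1; i < num_rows; i++)
		if (coeffs_v[i] != coeffs_v[0])
			return false;

	return true;
}
```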

* [PATCH v2 16/16] media: imx: add mem2mem device
  2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (14 preceding siblings ...)
  2018-07-19 15:30 ` [PATCH v2 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
@ 2018-07-19 15:30 ` Philipp Zabel
  2018-07-22 18:30 ` [PATCH v2 00/16] i.MX media mem2mem scaler Steve Longerbeam
  16 siblings, 0 replies; 22+ messages in thread
From: Philipp Zabel @ 2018-07-19 15:30 UTC (permalink / raw)
  To: linux-media; +Cc: Steve Longerbeam, Nicolas Dufresne, kernel

Add a single imx-media mem2mem video device that uses the IPU IC PP
(image converter post processing) task for scaling and colorspace
conversion.
On i.MX6Q/DL SoCs with two IPUs, only the first IPU is currently used.

The hardware only supports writing to destination buffers up to
1024x1024 pixels in a single pass, so the mem2mem video device is
limited to this resolution. After fixing the tiling code it should
be possible to extend this to arbitrary sizes by rendering multiple
tiles per frame.
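
For a rough idea of how many tiles such a frame would need, the sketch
below computes the smallest tile grid that keeps every output tile
within the 1024x1024 single-pass limit mentioned above. min_tile_grid()
and the IC_MAX_TILE_* macros are hypothetical; the real tiling code also
has to satisfy DMA, burst and rotator alignment constraints, so it may
choose more tiles or unequal tile sizes.

```c
/* Assumed single-pass IC output limit, from the description above. */
#define IC_MAX_TILE_W 1024
#define IC_MAX_TILE_H 1024

/*
 * Hypothetical helper: minimum number of tile columns and rows so that
 * no output tile is wider or taller than the single-pass limit.
 */
static void min_tile_grid(unsigned int out_w, unsigned int out_h,
			  unsigned int *cols, unsigned int *rows)
{
	/* Integer ceiling division. */
	*cols = (out_w + IC_MAX_TILE_W - 1) / IC_MAX_TILE_W;
	*rows = (out_h + IC_MAX_TILE_H - 1) / IC_MAX_TILE_H;
}
```

With the 4096 pixel MAX_W/MAX_H used by this driver, this yields at most
a 4x4 grid; a 1920x1080 output would need at least 2x2 tiles.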

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
Changes since v1 [1]:
 - Fix SPDX-License-Identifier and remove superfluous license
   text.
 - Fix uninitialized walign in try_fmt

[1] https://patchwork.linuxtv.org/patch/50522/
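
The walign/halign selection in mem2mem_try_fmt() below is easier to
follow in isolation. The following sketch restates it under a few
assumptions: enum fmt replaces the V4L2 fourccs, the hflip test is taken
to be a bitwise check of IPU_ROT_BIT_HFLIP, and bound_align() is a
simplified stand-in for v4l_bound_align_image() that clamps and then
rounds to the nearest aligned value. All names are hypothetical.
Alignments are log2 values, so 3 means a multiple of 8 pixels.

```c
#include <stdbool.h>

/* Hypothetical format tags standing in for the V4L2 fourccs. */
enum fmt {
	FMT_RGB32,
	FMT_YUV420,
	FMT_YVU420,
	FMT_NV12,
	FMT_YUV422P,
	FMT_NV16,
};

static void pick_align(enum fmt f, bool is_output, bool irt, bool hflip,
		       unsigned int *walign, unsigned int *halign)
{
	/* Start from 0 so that every path leaves both initialized. */
	*walign = 0;
	*halign = 0;

	/* Chroma subsampled formats need even width (and height for 4:2:0). */
	switch (f) {
	case FMT_YUV420:
	case FMT_YVU420:
	case FMT_NV12:
		*halign = 1;
		/* fall through */
	case FMT_YUV422P:
	case FMT_NV16:
		*walign = 1;
		break;
	default:
		break;
	}

	if (is_output) {
		/* Over-read 8-pixel bursts only matter if lines are reversed. */
		if (!irt && hflip)
			*walign = 3;
	} else if (irt) {
		if (f == FMT_YUV420 || f == FMT_YVU420 || f == FMT_YUV422P) {
			/* 16x16 blocks keep chroma line starts 8-byte aligned. */
			*walign = 4;
			*halign = 4;
		} else {
			/* 8x8 rotator block size. */
			*walign = 3;
			*halign = 3;
		}
	} else {
		/* 8-pixel burst writes. */
		*walign = 3;
	}
}

/* Clamp x to [min, max] on aligned bounds, then round to nearest. */
static unsigned int bound_align(unsigned int x, unsigned int min,
				unsigned int max, unsigned int align)
{
	unsigned int mask = ~((1u << align) - 1);
	unsigned int lo = (min + ~mask) & mask;	/* min rounded up */
	unsigned int hi = max & mask;		/* max rounded down */

	if (x < lo)
		x = lo;
	if (x > hi)
		x = hi;
	if (align)
		x = (x + (1u << (align - 1))) & mask;
	return x;
}
```

For example, a 721 pixel wide YUV420 capture format with the 90 degree
rotator active gets walign = 4, and bound_align(721, 16, 4096, 4) rounds
it to 720.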
---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 946 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 5 files changed, 969 insertions(+)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

diff --git a/drivers/staging/media/imx/Kconfig b/drivers/staging/media/imx/Kconfig
index bfc17de56b17..07013cb3cb66 100644
--- a/drivers/staging/media/imx/Kconfig
+++ b/drivers/staging/media/imx/Kconfig
@@ -6,6 +6,7 @@ config VIDEO_IMX_MEDIA
 	depends on HAS_DMA
 	select VIDEOBUF2_DMA_CONTIG
 	select V4L2_FWNODE
+	select V4L2_MEM2MEM_DEV
 	---help---
 	  Say yes here to enable support for video4linux media controller
 	  driver for the i.MX5/6 SOC.
diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
index 698a4210316e..f2e722d0fa19 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -6,6 +6,7 @@ imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
+obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-mem2mem.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
 
diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
index b0be80f05767..1466eba4119e 100644
--- a/drivers/staging/media/imx/imx-media-dev.c
+++ b/drivers/staging/media/imx/imx-media-dev.c
@@ -359,6 +359,17 @@ static int imx_media_probe_complete(struct v4l2_async_notifier *notifier)
 		goto unlock;
 
 	ret = v4l2_device_register_subdev_nodes(&imxmd->v4l2_dev);
+	if (ret)
+		goto unlock;
+
+	/* TODO: check whether we have IC subdevices first */
+	imxmd->m2m_vdev = imx_media_mem2mem_device_init(imxmd);
+	if (IS_ERR(imxmd->m2m_vdev)) {
+		ret = PTR_ERR(imxmd->m2m_vdev);
+		goto unlock;
+	}
+
+	ret = imx_media_mem2mem_device_register(imxmd->m2m_vdev);
 unlock:
 	mutex_unlock(&imxmd->mutex);
 	if (ret)
diff --git a/drivers/staging/media/imx/imx-media-mem2mem.c b/drivers/staging/media/imx/imx-media-mem2mem.c
new file mode 100644
index 000000000000..bbc04703ae17
--- /dev/null
+++ b/drivers/staging/media/imx/imx-media-mem2mem.c
@@ -0,0 +1,946 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * i.MX IPUv3 mem2mem Scaler/CSC driver
+ *
+ * Copyright (C) 2011 Pengutronix, Sascha Hauer
+ * Copyright (C) 2018 Pengutronix, Philipp Zabel
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/version.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <video/imx-ipu-v3.h>
+#include <video/imx-ipu-image-convert.h>
+
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-ioctl.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "imx-media.h"
+
+#define MIN_W 16
+#define MIN_H 16
+#define MAX_W 4096
+#define MAX_H 4096
+
+#define fh_to_ctx(__fh)	container_of(__fh, struct mem2mem_ctx, fh)
+
+enum {
+	V4L2_M2M_SRC = 0,
+	V4L2_M2M_DST = 1,
+};
+
+struct mem2mem_priv {
+	struct imx_media_video_dev vdev;
+
+	struct v4l2_m2m_dev   *m2m_dev;
+	struct device         *dev;
+
+	struct imx_media_dev  *md;
+
+	struct mutex          mutex;       /* mem2mem device mutex */
+
+	atomic_t              num_inst;
+};
+
+#define to_mem2mem_priv(v) container_of(v, struct mem2mem_priv, vdev)
+
+/* Per-queue, driver-specific private data */
+struct mem2mem_q_data {
+	struct v4l2_pix_format	cur_fmt;
+	struct v4l2_rect	rect;
+};
+
+struct mem2mem_ctx {
+	struct mem2mem_priv	*priv;
+
+	struct v4l2_fh		fh;
+	struct mem2mem_q_data	q_data[2];
+	int			error;
+	struct ipu_image_convert_ctx *icc;
+
+	struct v4l2_ctrl_handler ctrl_hdlr;
+	int rotate;
+	bool hflip;
+	bool vflip;
+	enum ipu_rotate_mode	rot_mode;
+};
+
+static struct mem2mem_q_data *get_q_data(struct mem2mem_ctx *ctx,
+					 enum v4l2_buf_type type)
+{
+	if (V4L2_TYPE_IS_OUTPUT(type))
+		return &ctx->q_data[V4L2_M2M_SRC];
+	else
+		return &ctx->q_data[V4L2_M2M_DST];
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+static void job_abort(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+
+	if (ctx->icc)
+		ipu_image_convert_abort(ctx->icc);
+}
+
+static void mem2mem_ic_complete(struct ipu_image_convert_run *run, void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+
+	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+
+	dst_buf->vb2_buf.timestamp = src_buf->vb2_buf.timestamp;
+	dst_buf->timecode = src_buf->timecode;
+
+	v4l2_m2m_buf_done(src_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(dst_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+	kfree(run);
+}
+
+static void device_run(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	struct ipu_image_convert_run *run;
+	int ret;
+
+	src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+
+	run = kzalloc(sizeof(*run), GFP_KERNEL);
+	if (!run)
+		goto err;
+
+	run->ctx = ctx->icc;
+	run->in_phys = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+	run->out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+	ret = ipu_image_convert_queue(run);
+	if (ret < 0) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev,
+			 "%s: failed to queue: %d\n", __func__, ret);
+		goto err;
+	}
+
+	return;
+
+err:
+	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+}
+
+/*
+ * Video ioctls
+ */
+static int vidioc_querycap(struct file *file, void *priv,
+			   struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
+	strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) - 1);
+	strncpy(cap->bus_info, "platform:imx-media-mem2mem",
+		sizeof(cap->bus_info) - 1);
+	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+
+	return 0;
+}
+
+static int mem2mem_enum_fmt(struct file *file, void *fh,
+			    struct v4l2_fmtdesc *f)
+{
+	u32 fourcc;
+	int ret;
+
+	ret = imx_media_enum_format(&fourcc, f->index, CS_SEL_ANY);
+	if (ret)
+		return ret;
+
+	f->pixelformat = fourcc;
+
+	return 0;
+}
+
+static int mem2mem_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	q_data = get_q_data(ctx, f->type);
+
+	f->fmt.pix = q_data->cur_fmt;
+
+	return 0;
+}
+
+static int mem2mem_try_fmt(struct file *file, void *priv,
+			   struct v4l2_format *f)
+{
+	const struct imx_media_pixfmt *cc;
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data = get_q_data(ctx, f->type);
+	unsigned int walign = 0;
+	unsigned int halign = 0;
+	u32 stride;
+
+	cc = imx_media_find_format(f->fmt.pix.pixelformat, CS_SEL_ANY, false);
+	if (!cc) {
+		f->fmt.pix.pixelformat = V4L2_PIX_FMT_RGB32;
+		cc = imx_media_find_format(V4L2_PIX_FMT_RGB32, CS_SEL_RGB,
+					   false);
+	}
+
+	/*
+	 * Horizontally/vertically chroma subsampled formats must have even
+	 * width/height.
+	 */
+	switch (f->fmt.pix.pixelformat) {
+	case V4L2_PIX_FMT_YUV420:
+	case V4L2_PIX_FMT_YVU420:
+	case V4L2_PIX_FMT_NV12:
+		halign = 1;
+		/* fall through */
+	case V4L2_PIX_FMT_YUV422P:
+	case V4L2_PIX_FMT_NV16:
+		walign = 1;
+		break;
+	default:
+		break;
+	}
+	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		/*
+		 * The IC burst reads 8 pixels at a time. Reading beyond the
+		 * end of the line is usually acceptable. Those pixels are
+		 * ignored, unless the IC has to write the scaled line in
+		 * reverse.
+		 */
+		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
+		    ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+			walign = 3;
+	} else {
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			switch (f->fmt.pix.pixelformat) {
+			case V4L2_PIX_FMT_YUV420:
+			case V4L2_PIX_FMT_YVU420:
+			case V4L2_PIX_FMT_YUV422P:
+				/*
+				 * Align to 16x16 pixel blocks for planar
+				 * chroma subsampled formats (4:2:0 and 4:2:2)
+				 * to guarantee 8-byte aligned line start
+				 * addresses in the chroma planes.
+				 */
+				walign = 4;
+				halign = 4;
+				break;
+			default:
+				/*
+				 * Align to 8x8 pixel IRT block size for all
+				 * other formats.
+				 */
+				walign = 3;
+				halign = 3;
+				break;
+			}
+		} else {
+			/*
+			 * The IC burst writes 8 pixels at a time.
+			 *
+			 * TODO: support unaligned width via
+			 * V4L2_SEL_TGT_COMPOSE_PADDED.
+			 */
+			walign = 3;
+		}
+	}
+	v4l_bound_align_image(&f->fmt.pix.width, MIN_W, MAX_W, walign,
+			      &f->fmt.pix.height, MIN_H, MAX_H, halign, 0);
+
+	stride = cc->planar ? f->fmt.pix.width
+			    : (f->fmt.pix.width * cc->bpp) >> 3;
+	switch (f->fmt.pix.pixelformat) {
+	case V4L2_PIX_FMT_YUV420:
+	case V4L2_PIX_FMT_YVU420:
+	case V4L2_PIX_FMT_YUV422P:
+		stride = round_up(stride, 16);
+		break;
+	default:
+		stride = round_up(stride, 8);
+		break;
+	}
+
+	f->fmt.pix.field = V4L2_FIELD_NONE;
+	f->fmt.pix.bytesperline = stride;
+	f->fmt.pix.sizeimage = cc->planar ?
+			       (stride * f->fmt.pix.height * cc->bpp) >> 3 :
+			       stride * f->fmt.pix.height;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
+		f->fmt.pix.colorspace = q_data->cur_fmt.colorspace;
+		f->fmt.pix.ycbcr_enc = q_data->cur_fmt.ycbcr_enc;
+		f->fmt.pix.xfer_func = q_data->cur_fmt.xfer_func;
+		f->fmt.pix.quantization = q_data->cur_fmt.quantization;
+	} else if (f->fmt.pix.colorspace == V4L2_COLORSPACE_DEFAULT) {
+		f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
+		f->fmt.pix.ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
+		f->fmt.pix.xfer_func = V4L2_XFER_FUNC_DEFAULT;
+		f->fmt.pix.quantization = V4L2_QUANTIZATION_DEFAULT;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_q_data *q_data;
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct vb2_queue *vq;
+	int ret;
+
+	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+	if (vb2_is_busy(vq)) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s queue busy\n",
+			 __func__);
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+
+	ret = mem2mem_try_fmt(file, priv, f);
+	if (ret < 0)
+		return ret;
+
+	q_data->cur_fmt.width = f->fmt.pix.width;
+	q_data->cur_fmt.height = f->fmt.pix.height;
+	q_data->cur_fmt.pixelformat = f->fmt.pix.pixelformat;
+	q_data->cur_fmt.field = f->fmt.pix.field;
+	q_data->cur_fmt.bytesperline = f->fmt.pix.bytesperline;
+	q_data->cur_fmt.sizeimage = f->fmt.pix.sizeimage;
+
+	/* Reset cropping/composing rectangle */
+	q_data->rect.left = 0;
+	q_data->rect.top = 0;
+	q_data->rect.width = q_data->cur_fmt.width;
+	q_data->rect.height = q_data->cur_fmt.height;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		/* Set colorimetry on the output queue */
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+		/* Propagate colorimetry to the capture queue */
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+	}
+
+	/*
+	 * TODO: Setting colorimetry on the capture queue is currently not
+	 * supported by the V4L2 API
+	 */
+
+	return 0;
+}
+
+static int mem2mem_g_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+	case V4L2_SEL_TGT_CROP_DEFAULT:
+	case V4L2_SEL_TGT_CROP_BOUNDS:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+	case V4L2_SEL_TGT_COMPOSE_DEFAULT:
+	case V4L2_SEL_TGT_COMPOSE_BOUNDS:
+	case V4L2_SEL_TGT_COMPOSE_PADDED:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->target == V4L2_SEL_TGT_CROP ||
+	    s->target == V4L2_SEL_TGT_COMPOSE) {
+		s->r = q_data->rect;
+	} else {
+		s->r.left = 0;
+		s->r.top = 0;
+		s->r.width = q_data->cur_fmt.width;
+		s->r.height = q_data->cur_fmt.height;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
+	    s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, s->type);
+
+	/* The IC input frame width must be a multiple of 8 pixels. When
+	 * resizing, the frame width must be a multiple of the burst size,
+	 * 8 or 16 pixels as defined by the CB#_BURST_16 parameter.
+	 */
+	if (s->flags & V4L2_SEL_FLAG_GE)
+		s->r.width = round_up(s->r.width, 8);
+	if (s->flags & V4L2_SEL_FLAG_LE)
+		s->r.width = round_down(s->r.width, 8);
+	s->r.width = clamp_t(unsigned int, s->r.width, 8,
+			     round_down(q_data->cur_fmt.width, 8));
+	s->r.height = clamp_t(unsigned int, s->r.height, 1,
+			      q_data->cur_fmt.height);
+	s->r.left = clamp_t(unsigned int, s->r.left, 0,
+			    q_data->cur_fmt.width - s->r.width);
+	s->r.top = clamp_t(unsigned int, s->r.top, 0,
+			   q_data->cur_fmt.height - s->r.height);
+
+	/* V4L2_SEL_FLAG_KEEP_CONFIG is only valid for subdevices */
+	q_data->rect = s->r;
+
+	return 0;
+}
+
+static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
+	.vidioc_querycap	= vidioc_querycap,
+
+	.vidioc_enum_fmt_vid_cap = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_cap	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_cap	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_cap	= mem2mem_s_fmt,
+
+	.vidioc_enum_fmt_vid_out = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_out	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_out	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_out	= mem2mem_s_fmt,
+
+	.vidioc_g_selection	= mem2mem_g_selection,
+	.vidioc_s_selection	= mem2mem_s_selection,
+
+	.vidioc_reqbufs		= v4l2_m2m_ioctl_reqbufs,
+	.vidioc_querybuf	= v4l2_m2m_ioctl_querybuf,
+
+	.vidioc_qbuf		= v4l2_m2m_ioctl_qbuf,
+	.vidioc_expbuf		= v4l2_m2m_ioctl_expbuf,
+	.vidioc_dqbuf		= v4l2_m2m_ioctl_dqbuf,
+	.vidioc_create_bufs	= v4l2_m2m_ioctl_create_bufs,
+
+	.vidioc_streamon	= v4l2_m2m_ioctl_streamon,
+	.vidioc_streamoff	= v4l2_m2m_ioctl_streamoff,
+};
+
+/*
+ * Queue operations
+ */
+
+static int mem2mem_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
+			       unsigned int *nplanes, unsigned int sizes[],
+			       struct device *alloc_devs[])
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vq);
+	struct mem2mem_q_data *q_data;
+	unsigned int count = *nbuffers;
+	struct v4l2_pix_format *pix;
+
+	q_data = get_q_data(ctx, vq->type);
+	pix = &q_data->cur_fmt;
+
+	*nplanes = 1;
+	*nbuffers = count;
+	sizes[0] = pix->sizeimage;
+
+	dev_dbg(ctx->priv->dev, "get %u buffer(s) of size %u each.\n",
+		count, pix->sizeimage);
+
+	return 0;
+}
+
+static int mem2mem_buf_prepare(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct mem2mem_q_data *q_data;
+	struct v4l2_pix_format *pix;
+	unsigned int plane_size, payload;
+
+	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	pix = &q_data->cur_fmt;
+	plane_size = pix->sizeimage;
+
+	if (vb2_plane_size(vb, 0) < plane_size) {
+		dev_dbg(ctx->priv->dev,
+			"%s data will not fit into plane (%lu < %u)\n",
+			__func__, vb2_plane_size(vb, 0), plane_size);
+		return -EINVAL;
+	}
+
+	payload = pix->bytesperline * pix->height;
+	if (pix->pixelformat == V4L2_PIX_FMT_YUV420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_YVU420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_NV12)
+		payload = payload * 3 / 2;
+	else if (pix->pixelformat == V4L2_PIX_FMT_YUV422P ||
+		 pix->pixelformat == V4L2_PIX_FMT_NV16)
+		payload *= 2;
+
+	vb2_set_plane_payload(vb, 0, payload);
+
+	return 0;
+}
+
+static void mem2mem_buf_queue(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
+}
+
+static void ipu_image_from_q_data(struct ipu_image *im,
+				  struct mem2mem_q_data *q_data)
+{
+	im->pix.width = q_data->cur_fmt.width;
+	im->pix.height = q_data->cur_fmt.height;
+	im->pix.bytesperline = q_data->cur_fmt.bytesperline;
+	im->pix.pixelformat = q_data->cur_fmt.pixelformat;
+	im->rect = q_data->rect;
+}
+
+static int mem2mem_start_streaming(struct vb2_queue *q, unsigned int count)
+{
+	const enum ipu_ic_task ic_task = IC_TASK_POST_PROCESSOR;
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct mem2mem_priv *priv = ctx->priv;
+	struct ipu_soc *ipu = priv->md->ipu[0];
+	struct mem2mem_q_data *q_data;
+	struct vb2_queue *other_q;
+	struct ipu_image in, out;
+
+	other_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+				  (q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
+				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
+				  V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	if (!vb2_is_streaming(other_q))
+		return 0;
+
+	if (ctx->icc) {
+		v4l2_warn(ctx->priv->vdev.vfd->v4l2_dev, "removing old ICC\n");
+		ipu_image_convert_unprepare(ctx->icc);
+	}
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	ipu_image_from_q_data(&in, q_data);
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	ipu_image_from_q_data(&out, q_data);
+
+	ctx->icc = ipu_image_convert_prepare(ipu, ic_task, &in, &out,
+					     ctx->rot_mode,
+					     mem2mem_ic_complete, ctx);
+	if (IS_ERR(ctx->icc)) {
+		struct vb2_v4l2_buffer *buf;
+		int ret = PTR_ERR(ctx->icc);
+
+		ctx->icc = NULL;
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s: error %d\n",
+			 __func__, ret);
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void mem2mem_stop_streaming(struct vb2_queue *q)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct vb2_v4l2_buffer *buf;
+
+	if (ctx->icc) {
+		ipu_image_convert_unprepare(ctx->icc);
+		ctx->icc = NULL;
+	}
+
+	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	} else {
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	}
+}
+
+static const struct vb2_ops mem2mem_qops = {
+	.queue_setup	= mem2mem_queue_setup,
+	.buf_prepare	= mem2mem_buf_prepare,
+	.buf_queue	= mem2mem_buf_queue,
+	.wait_prepare	= vb2_ops_wait_prepare,
+	.wait_finish	= vb2_ops_wait_finish,
+	.start_streaming = mem2mem_start_streaming,
+	.stop_streaming = mem2mem_stop_streaming,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct mem2mem_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
+	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &mem2mem_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	src_vq->lock = &ctx->priv->mutex;
+	src_vq->dev = ctx->priv->dev;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &mem2mem_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	dst_vq->lock = &ctx->priv->mutex;
+	dst_vq->dev = ctx->priv->dev;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static int mem2mem_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct mem2mem_ctx *ctx = container_of(ctrl->handler,
+					       struct mem2mem_ctx, ctrl_hdlr);
+	enum ipu_rotate_mode rot_mode;
+	int rotate;
+	bool hflip, vflip;
+	int ret = 0;
+
+	rotate = ctx->rotate;
+	hflip = ctx->hflip;
+	vflip = ctx->vflip;
+
+	switch (ctrl->id) {
+	case V4L2_CID_HFLIP:
+		hflip = ctrl->val;
+		break;
+	case V4L2_CID_VFLIP:
+		vflip = ctrl->val;
+		break;
+	case V4L2_CID_ROTATE:
+		rotate = ctrl->val;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = ipu_degrees_to_rot_mode(&rot_mode, rotate, hflip, vflip);
+	if (ret)
+		return ret;
+
+	if (rot_mode != ctx->rot_mode) {
+		struct vb2_queue *cap_q;
+
+		cap_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+					V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		if (vb2_is_streaming(cap_q))
+			return -EBUSY;
+
+		ctx->rot_mode = rot_mode;
+		ctx->rotate = rotate;
+		ctx->hflip = hflip;
+		ctx->vflip = vflip;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops mem2mem_ctrl_ops = {
+	.s_ctrl = mem2mem_s_ctrl,
+};
+
+static int mem2mem_init_controls(struct mem2mem_ctx *ctx)
+{
+	struct v4l2_ctrl_handler *hdlr = &ctx->ctrl_hdlr;
+	int ret;
+
+	v4l2_ctrl_handler_init(hdlr, 3);
+
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_HFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_VFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_ROTATE,
+			  0, 270, 90, 0);
+
+	if (hdlr->error) {
+		ret = hdlr->error;
+		goto out_free;
+	}
+
+	v4l2_ctrl_handler_setup(hdlr);
+	return 0;
+
+out_free:
+	v4l2_ctrl_handler_free(hdlr);
+	return ret;
+}
+
+#define DEFAULT_WIDTH	720
+#define DEFAULT_HEIGHT	576
+static const struct mem2mem_q_data mem2mem_q_data_default = {
+	.cur_fmt = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+		.pixelformat = V4L2_PIX_FMT_YUV420,
+		.field = V4L2_FIELD_NONE,
+		.bytesperline = DEFAULT_WIDTH,
+		.sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
+		.colorspace = V4L2_COLORSPACE_SRGB,
+	},
+	.rect = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+	},
+};
+
+/*
+ * File operations
+ */
+static int mem2mem_open(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = NULL;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->rot_mode = IPU_ROTATE_NONE;
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+	v4l2_fh_add(&ctx->fh);
+	ctx->priv = priv;
+
+	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
+					    &queue_init);
+	if (IS_ERR(ctx->fh.m2m_ctx)) {
+		ret = PTR_ERR(ctx->fh.m2m_ctx);
+		goto err_ctx;
+	}
+
+	ret = mem2mem_init_controls(ctx);
+	if (ret)
+		goto err_ctrls;
+
+	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;
+
+	ctx->q_data[V4L2_M2M_SRC] = mem2mem_q_data_default;
+	ctx->q_data[V4L2_M2M_DST] = mem2mem_q_data_default;
+
+	atomic_inc(&priv->num_inst);
+
+	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n", ctx,
+		ctx->fh.m2m_ctx);
+
+	return 0;
+
+err_ctrls:
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+err_ctx:
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+	return ret;
+}
+
+static int mem2mem_release(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = fh_to_ctx(file->private_data);
+
+	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
+
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+
+	atomic_dec(&priv->num_inst);
+
+	return 0;
+}
+
+static const struct v4l2_file_operations mem2mem_fops = {
+	.owner		= THIS_MODULE,
+	.open		= mem2mem_open,
+	.release	= mem2mem_release,
+	.poll		= v4l2_m2m_fop_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= v4l2_m2m_fop_mmap,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_abort	= job_abort,
+};
+
+static const struct video_device mem2mem_videodev_template = {
+	.name		= "ipu0_ic_pp mem2mem",
+	.fops		= &mem2mem_fops,
+	.ioctl_ops	= &mem2mem_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
+	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
+};
+
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = vdev->vfd;
+	int ret;
+
+	vfd->v4l2_dev = &priv->md->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
+	if (ret) {
+		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
+		return ret;
+	}
+
+	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
+		  video_device_node_name(vfd));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_register);
+
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = priv->vdev.vfd;
+
+	mutex_lock(&priv->mutex);
+
+	if (video_is_registered(vfd)) {
+		video_unregister_device(vfd);
+		media_entity_cleanup(&vfd->entity);
+	}
+
+	mutex_unlock(&priv->mutex);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_unregister);
+
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *md)
+{
+	struct mem2mem_priv *priv;
+	struct video_device *vfd;
+	int ret;
+
+	priv = devm_kzalloc(md->md.dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ERR_PTR(-ENOMEM);
+
+	priv->md = md;
+	priv->dev = md->md.dev;
+
+	mutex_init(&priv->mutex);
+	atomic_set(&priv->num_inst, 0);
+
+	vfd = video_device_alloc();
+	if (!vfd)
+		return ERR_PTR(-ENOMEM);
+
+	*vfd = mem2mem_videodev_template;
+	snprintf(vfd->name, sizeof(vfd->name), "ipu_ic_pp mem2mem");
+	vfd->lock = &priv->mutex;
+	priv->vdev.vfd = vfd;
+
+	INIT_LIST_HEAD(&priv->vdev.list);
+
+	video_set_drvdata(vfd, priv);
+
+	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(priv->m2m_dev)) {
+		ret = PTR_ERR(priv->m2m_dev);
+		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
+			 ret);
+		return ERR_PTR(ret);
+	}
+
+	return &priv->vdev;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_init);
+
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+
+	v4l2_m2m_release(priv->m2m_dev);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_remove);
+
+MODULE_DESCRIPTION("i.MX IPUv3 mem2mem scaler/CSC driver");
+MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
index 57bd094cf765..ce8b15ca401c 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -151,6 +151,9 @@ struct imx_media_dev {
 	/* for async subdev registration */
 	struct list_head asd_list;
 	struct v4l2_async_notifier subdev_notifier;
+
+	/* IC scaler/CSC mem2mem video device */
+	struct imx_media_video_dev *m2m_vdev;
 };
 
 enum codespace_sel {
@@ -264,6 +267,13 @@ void imx_media_capture_device_set_format(struct imx_media_video_dev *vdev,
 					 struct v4l2_pix_format *pix);
 void imx_media_capture_device_error(struct imx_media_video_dev *vdev);
 
+/* imx-media-mem2mem.c */
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *dev);
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev);
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev);
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev);
+
 /* subdev group ids */
 #define IMX_MEDIA_GRP_ID_CSI2      BIT(8)
 #define IMX_MEDIA_GRP_ID_CSI_BIT   9
-- 
2.18.0


* Re: [PATCH v2 00/16] i.MX media mem2mem scaler
From: Steve Longerbeam @ 2018-07-22 18:30 UTC (permalink / raw)
  To: Philipp Zabel, linux-media; +Cc: Nicolas Dufresne, kernel

Hi Philipp,


On 07/19/2018 08:30 AM, Philipp Zabel wrote:
> Hi,
>
> this is the second version of the i.MX mem2mem scaler series.
> Patches 8 and 16 have been modified.
>
> Changes since v1:
>   - Fix inverted allow_overshoot logic
>   - Correctly switch horizontal / vertical tile alignment when
>     determining seam positions with the 90° rotator active.

Yes, this fixes the specific rotation test that was broken
(720x480, UYVY --> 1280x768, UYVY, rotate 90).

But running more tests on this v2 reveals more issues. I chose a
somewhat random upscaling-only example as a first try:

640x480, YV12 --> full HD 2560x1600, YV12 (no rotation or flip).

This produces division by zero backtraces and the conversion hangs:


[  131.079978] Division by zero in kernel.
[  131.083853] CPU: 0 PID: 683 Comm: mx6-m2m Tainted: G W         4.18.0-rc2-13448-g678218d #7
[  131.092830] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  131.099372] Backtrace:
[  131.101858] [<c010def8>] (dump_backtrace) from [<c010e1b8>] (show_stack+0x18/0x1c)
[  131.109450]  r7:00000000 r6:600f0013 r5:00000000 r4:c107db3c
[  131.115135] [<c010e1a0>] (show_stack) from [<c0aa5aec>] (dump_stack+0xb4/0xe8)
[  131.122380] [<c0aa5a38>] (dump_stack) from [<c010e048>] (__div0+0x18/0x20)
[  131.129274]  r9:ec37d800 r8:00000003 r7:00000000 r6:00000000 r5:00000000 r4:ec37dae8
[  131.137036] [<c010e030>] (__div0) from [<c0aa3b34>] (Ldiv0+0x8/0x10)
[  131.143425] [<c0590078>] (ipu_image_convert_prepare) from [<c07c92b4>] (mem2mem_start_streaming+0xe0/0x1c0)
[  131.153186]  r10:c0b9f640 r9:c071958c r8:00000280 r7:000001e0 r6:32315659 r5:c1008908
[  131.161030]  r4:ecf9c800
[  131.163588] [<c07c91d4>] (mem2mem_start_streaming) from [<c073ac44>] (vb2_start_streaming+0x64/0x160)
[  131.172826]  r8:c1008908 r7:00000001 r6:ed01b808 r5:ed01b934 r4:ed01b810
[  131.179547] [<c073abe0>] (vb2_start_streaming) from [<c073c214>] (vb2_core_streamon+0x10c/0x164)
[  131.188351]  r9:c071958c r8:c1008908 r7:00000001 r6:ec1038f8 r5:00000000 r4:ed01b808
[  131.196114] [<c073c108>] (vb2_core_streamon) from [<c073ebe0>] (vb2_streamon+0x34/0x58)
[  131.204133]  r5:40045612 r4:ed01b800
[  131.207733] [<c073ebac>] (vb2_streamon) from [<c072f33c>] (v4l2_m2m_streamon+0x24/0x3c)
[  131.215758] [<c072f318>] (v4l2_m2m_streamon) from [<c072f36c>] (v4l2_m2m_ioctl_streamon+0x18/0x1c)
[  131.224732]  r5:40045612 r4:c072f354
[  131.228330] [<c072f354>] (v4l2_m2m_ioctl_streamon) from [<c07195b0>] (v4l_streamon+0x24/0x28)
[  131.236878] [<c071958c>] (v4l_streamon) from [<c071be3c>] (__video_do_ioctl+0x284/0x4f8)
[  131.244984]  r5:40045612 r4:ecc50800
[  131.248583] [<c071bbb8>] (__video_do_ioctl) from [<c071f9dc>] (video_usercopy+0x260/0x55c)
[  131.256866]  r10:00000004 r9:00000000 r8:c1008908 r7:ed747dfc r6:00000000 r5:00000004
[  131.264709]  r4:40045612
[  131.267265] [<c071f77c>] (video_usercopy) from [<c071fcec>] (video_ioctl2+0x14/0x1c)
[  131.275026]  r10:00000036 r9:00000003 r8:ed480068 r7:c0254840 r6:ecc76000 r5:bea6ab38
[  131.282869]  r4:c071fcd8
[  131.285424] [<c071fcd8>] (video_ioctl2) from [<c0717924>] (v4l2_ioctl+0x44/0x5c)
[  131.292845] [<c07178e0>] (v4l2_ioctl) from [<c0253e60>] (do_vfs_ioctl+0xa8/0xa4c)
[  131.300343]  r5:bea6ab38 r4:c1008908
[  131.303940] [<c0253db8>] (do_vfs_ioctl) from [<c0254840>] (ksys_ioctl+0x3c/0x60)
[  131.311355]  r10:00000036 r9:ed746000 r8:bea6ab38 r7:40045612 r6:00000003 r5:ecc76000
[  131.319198]  r4:ecc76000
[  131.321752] [<c0254804>] (ksys_ioctl) from [<c0254874>] (sys_ioctl+0x10/0x14)
[  131.328907]  r9:ed746000 r8:c01011e4 r7:00000036 r6:00010960 r5:00000000 r4:00012620
[  131.336672] [<c0254864>] (sys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[  131.344256] Exception stack(0xed747fa8 to 0xed747ff0)
[  131.349327] 7fa0:                   00012620 00000000 00000003 40045612 bea6ab38 00000003
[  131.357524] 7fc0: 00012620 00000000 00010960 00000036 00000000 00000000 45d80000 bea6abac
[  131.365717] 7fe0: 0002312c bea6aaa4 00012308 45e58d5c


To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
mediatree fork on github. I moved the mem2mem driver to the beginning
and added a few patches:

d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers")
b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try format")
4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp in adjust")

(feel free to squash some of those if you agree with them for v3).

By moving the mem2mem driver before the seam avoidance patches, and making
it independent of the image converter implementation, the driver can be
tested with and without the seam avoidance changes.

If you stop at b4362162c0 during a git rebase (i.e. before the seam
avoidance patches are applied) and build and run that kernel, you will
find that the above 640x480 --> 2560x1600 conversion succeeds, albeit
with the expected visible seams at the tile boundaries.

Also, I'm trying to parse the functions find_best_seam() and
find_seams(). Can you provide some more background on the behavior of
those functions?

Steve

>   - Fix SPDX-License-Identifier and remove superfluous license
>     text.
>   - Fix uninitialized walign in try_fmt
>
> Previous cover letter:
>
> we have had image conversion code for scaling and colorspace conversion in
> the IPUv3 base driver for a while. Since the IC hardware can only write
> up to 1024x1024 pixel buffers, it scales to larger output buffers by
> splitting the input and output frame into similarly sized tiles.
>
> This causes the issue that the bilinear interpolation resets at the tile
> boundary: instead of smoothly interpolating across the seam, there is a
> jump in the input sample position that is very apparent for high
> upscaling factors. This can be avoided by slightly changing the scaling
> coefficients to let the left/top tiles overshoot their input sampling
> into the first pixel / line of their right / bottom neighbors. The error
> can be further reduced by letting tiles be differently sized and by
> selecting seam positions that minimize the input sampling position error
> at tile boundaries.
> This is complicated by different DMA start address, burst size, and
> rotator block size alignment requirements, depending on the input and
> output pixel formats, and the fact that flipping happens in different
> places depending on the rotation.
>
> This series implements optimal seam position selection and seam hiding
> with per-tile resizing coefficients and adds a scaling mem2mem device
> to the imx-media driver.
>
> regards
> Philipp
>
> Philipp Zabel (16):
>    gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
>    gpu: ipu-v3: image-convert: prepare for per-tile configuration
>    gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
>    gpu: ipu-v3: image-convert: reconfigure IC per tile
>    gpu: ipu-v3: image-convert: store tile top/left position
>    gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
>      outside fill_image
>    gpu: ipu-v3: image-convert: move tile alignment helpers
>    gpu: ipu-v3: image-convert: select optimal seam positions
>    gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
>    gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and
>      NV16
>    gpu: ipu-v3: image-convert: relax input alignment restrictions
>    gpu: ipu-v3: image-convert: relax output alignment restrictions
>    gpu: ipu-v3: image-convert: fix bytesperline adjustment
>    gpu: ipu-v3: image-convert: add some ASCII art to the exposition
>    gpu: ipu-v3: image-convert: disable double buffering if necessary
>    media: imx: add mem2mem device
>
>   drivers/gpu/ipu-v3/ipu-ic.c                   |  52 +-
>   drivers/gpu/ipu-v3/ipu-image-convert.c        | 870 +++++++++++++---
>   drivers/staging/media/imx/Kconfig             |   1 +
>   drivers/staging/media/imx/Makefile            |   1 +
>   drivers/staging/media/imx/imx-media-dev.c     |  11 +
>   drivers/staging/media/imx/imx-media-mem2mem.c | 946 ++++++++++++++++++
>   drivers/staging/media/imx/imx-media.h         |  10 +
>   include/video/imx-ipu-v3.h                    |   6 +
>   8 files changed, 1758 insertions(+), 139 deletions(-)
>   create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>


* Re: [PATCH v2 00/16] i.MX media mem2mem scaler
From: Steve Longerbeam @ 2018-07-22 19:11 UTC (permalink / raw)
  To: Philipp Zabel, linux-media; +Cc: Nicolas Dufresne, kernel



On 07/22/2018 11:30 AM, Steve Longerbeam wrote:
> Hi Philipp,
>
>
> On 07/19/2018 08:30 AM, Philipp Zabel wrote:
>> Hi,
>>
>> this is the second version of the i.MX mem2mem scaler series.
>> Patches 8 and 16 have been modified.
>>
>> Changes since v1:
>>   - Fix inverted allow_overshoot logic
>>   - Correctly switch horizontal / vertical tile alignment when
>>     determining seam positions with the 90° rotator active.
>
> Yes, this fixes the specific rotation test that was broken
> (720x480, UYVY --> 1280x768, UYVY, rotate 90).
>
> But running more tests on this v2 reveals more issues. I chose a
> somewhat random upscaling-only example as a first try:
>
> 640x480, YV12 --> full HD 2560x1600, YV12 (no rotation or flip).
>
> This produces division by zero backtraces and the conversion hangs:
>

The hang is apparently because the conversion is re-attempted over and
over again, with an endless WARN() from
drivers/media/common/videobuf2/videobuf2-core.c:900.

I fixed the hang with an additional patch:

50026cbe08 ("media: imx: mem2mem: Remove buffers on device_run failures")

With this the conversion completes, but the below div-by-zero errors
persist, and the resultant image is blank.

Steve

>
> [  131.079978] Division by zero in kernel.
> [...]


* Re: [PATCH v2 00/16] i.MX media mem2mem scaler
From: Philipp Zabel @ 2018-07-23  9:29 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media; +Cc: Nicolas Dufresne, kernel

Hi Steve,

On Sun, 2018-07-22 at 11:30 -0700, Steve Longerbeam wrote:
> Hi Philipp,
> 
> On 07/19/2018 08:30 AM, Philipp Zabel wrote:
> > Hi,
> > 
> > this is the second version of the i.MX mem2mem scaler series.
> > Patches 8 and 16 have been modified.
> > 
> > Changes since v1:
> >   - Fix inverted allow_overshoot logic
> >   - Correctly switch horizontal / vertical tile alignment when
> >     determining seam positions with the 90° rotator active.
> 
> Yes, this fixes the specific rotation test that was broken
> (720x480, UYVY --> 1280x768, UYVY, rotate 90).
> 
> But running more tests on this v2 reveals more issues. I chose a
> somewhat random upscaling-only example as a first try:
> 
> 640x480, YV12 --> full HD 2560x1600, YV12 (no rotation or flip).
>
> This produces division by zero backtraces and the conversion hangs:
> 
> 
> [  131.079978] Division by zero in kernel.
[...]

Thanks, find_best_seam() breaks because find_seams() computes the end of
the row search range from the right edge instead of the bottom edge:

----------8<----------
diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 726e3b7390c7..0c47d39adf03 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -806,7 +801,7 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
                /* Start within 1024 lines of the bottom edge */
                out_start = max_t(int, 0, out_bottom - 1024);
                /* End before having to add more rows above */
-               out_end = min_t(unsigned int, out_right, row * 1024);
+               out_end = min_t(unsigned int, out_bottom, row * 1024);
 
                find_best_seam(ctx, out_start, out_end,
                               in_top_align, out_top_align, out_height_align,
---------->8----------

Also, for the 2560-pixel output width we unnecessarily use four tile
columns instead of three:

----------8<----------
diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 726e3b7390c7..0c47d39adf03 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -380,12 +380,7 @@ static int alloc_dma_buf(struct ipu_image_convert_priv *priv,
 
 static inline int num_stripes(int dim)
 {
-       if (dim <= 1024)
-               return 1;
-       else if (dim <= 2048)
-               return 2;
-       else
-               return 4;
+       return (dim - 1) / 1024 + 1;
 }
 
 /*
---------->8----------

With that fixed, your test case succeeds.

Unfortunately, just adding rotate=90 makes it hang again. I'll
investigate.

> To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
> mediatree fork on github. I moved the mem2mem driver to the beginning
> and added a few patches:
> 
> d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers")
> b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try format")
> 4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
> d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp in adjust")
> 
> (feel free to squash some of those if you agree with them for v3).
>
> By moving the mem2mem driver before the seam avoidance patches, and making
> it independent of the image converter implementation, the driver can be
> tested with and without the seam avoidance changes.

Yes, this makes sense to me. If we merge the mem2mem driver before the
image-convert changes go in, it should be limited to 1024x1024 output,
but if we manage to merge both parts in the same cycle, this should be
fine.

[...]
> Also, I'm trying to parse the functions find_best_seam() and
> find_seams(). Can you provide some more background on the behavior of
> those functions?

The hardware limits us to restart linear sampling at zero with each
tile, so find_seams() tries to find the (horizontal and vertical) output
positions where the corresponding input sampling positions are closest
to integer values.
The distance between the ideal fractional input sampling position and
the actual integer sampling position that can be achieved is the amount
of distortion we have to introduce (by slightly stretching one input
tile and slightly shrinking the other) to completely hide the visible
seams.

find_best_seam() contains the code to find the left (or top) edge for a
single column (or row) that minimizes this distortion, given the right
(or bottom) edge, scaling factor, alignment restrictions, and allowed
range. The range is limited by the maximum tile width (or height).

find_seams() first iterates over all columns, right to left, and calls
find_best_seam() for each column. Each found seam then serves as the
right edge of the next column. Then it iterates over all rows, bottom to
top, again calling find_best_seam() for each row.

The reason we start at the bottom/right edges is that we have to make
sure that burst size / rotator block size align with the bottom/right
edge of the output frame.

regards
Philipp


* Re: [PATCH v2 00/16] i.MX media mem2mem scaler
From: Philipp Zabel @ 2018-07-23  9:42 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media; +Cc: kernel, Nicolas Dufresne

On Mon, 2018-07-23 at 11:29 +0200, Philipp Zabel wrote:
[...]
> > Also, I'm trying to parse the functions find_best_seam() and
> > find_seams(). Can you provide some more background on the behavior of
> > those functions?
> 
> The hardware limits us to restart linear sampling at zero with each
> tile, so find_seams() tries to find the (horizontal and vertical) output
> positions where the corresponding input sampling positions are closest
> to integer values.
> The distance between the ideal fractional input sampling position and
> the actual integer sampling position that can be achieved is the amount
> of distortion we have to introduce (by slightly stretching one input
> tile and slightly shrinking the other) to completely hide the visible
> seams.

Actually, this is not all of it. In addition to being an integer, the
input sampling position at seam start is still subject to alignment
restrictions, so the actual value that is minimized is the difference
between the ideal fractional input sampling position and the closest
aligned input position.

regards
Philipp


* Re: [PATCH v2 00/16] i.MX media mem2mem scaler
From: Philipp Zabel @ 2018-07-23 13:26 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media; +Cc: Nicolas Dufresne, kernel

Hi Steve,

On Sun, 2018-07-22 at 11:30 -0700, Steve Longerbeam wrote:
[...]
> To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
> mediatree fork on github. I moved the mem2mem driver to the beginning
> and added a few patches:
> 
> d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers")
> b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try format")
> 4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
> d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp in adjust")
> 
> (feel free to squash some of those if you agree with them for v3).

Thank you, I've squashed them where it made sense:

- "media: imx: mem2mem: Use ipu_image_convert_adjust in try format"
  into "media: imx: add mem2mem device" so it could be merged
  independently,
- "gpu: ipu-v3: image-convert: Fix width/height alignment" into
  "gpu: ipu-v3: image-convert: relax alignment restrictions", which
  itself is squashed together from "gpu: ipu-v3: image-convert: relax
  input alignment restrictions" and "gpu: ipu-v3: image-convert: relax
  output alignment restrictions", and
- "gpu: ipu-v3: image-convert: Fix input bytesperline clamp in adjust"
  into "gpu: ipu-v3: image-convert: fix bytesperline adjustment".

I've added some fixes, limited the output tile top/left alignment to the
8x8 IRT block size when the rotator is used, and pushed the current
state to this branch:

  git://git.pengutronix.de/pza/linux imx-mem2mem
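A minimal sketch of the output tile alignment rule mentioned above follows, assuming the pixel-format dependent alignment is a power of two (so taking the larger of the two alignments also satisfies the smaller one). `out_tile_pos_align`, `align_down` and `IRT_BLOCK_SIZE` are hypothetical names for illustration, not functions from the ipu-v3 image-convert code.

```c
#include <assert.h>
#include <stdbool.h>

/* The IRT rotator works on 8x8 pixel blocks. */
#define IRT_BLOCK_SIZE 8

/*
 * Hypothetical helper: alignment for output tile top/left positions.
 * fmt_align is the pixel-format dependent alignment (assumed power of two).
 * When rotating, positions must additionally land on IRT block boundaries.
 */
static unsigned int out_tile_pos_align(unsigned int fmt_align, bool rotate)
{
	assert(fmt_align != 0 && (fmt_align & (fmt_align - 1)) == 0);

	if (rotate && fmt_align < IRT_BLOCK_SIZE)
		return IRT_BLOCK_SIZE;
	return fmt_align;
}

/* Round pos down to a multiple of the power-of-two alignment align. */
static unsigned int align_down(unsigned int pos, unsigned int align)
{
	return pos & ~(align - 1);
}
```

For instance, with a 2 pixel format alignment, a tile starting at output column 1021 would move to 1016 when rotating (align_down(1021, out_tile_pos_align(2, true))), while without the rotator the 2 pixel alignment gives 1020.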

regards
Philipp


end of thread, other threads:[~2018-07-23 14:27 UTC | newest]

Thread overview: 22+ messages
2018-07-19 15:30 [PATCH v2 00/16] i.MX media mem2mem scaler Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 05/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 08/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16 Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 12/16] gpu: ipu-v3: image-convert: relax output " Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
2018-07-19 15:30 ` [PATCH v2 16/16] media: imx: add mem2mem device Philipp Zabel
2018-07-22 18:30 ` [PATCH v2 00/16] i.MX media mem2mem scaler Steve Longerbeam
2018-07-22 19:11   ` Steve Longerbeam
2018-07-23  9:29   ` Philipp Zabel
2018-07-23  9:42     ` Philipp Zabel
2018-07-23 13:26   ` Philipp Zabel
