All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/16] i.MX media mem2mem scaler
@ 2018-06-22 15:52 Philipp Zabel
  2018-06-22 15:52 ` [PATCH 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
                   ` (16 more replies)
  0 siblings, 17 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Hi,

we have image conversion code for scaling and colorspace conversion in
the IPUv3 base driver for a while. Since the IC hardware can only write
up to 1024x1024 pixel buffers, it scales to larger output buffers by
splitting the input and output frame into similarly sized tiles.

This causes the issue that the bilinear interpolation resets at the tile
boundary: instead of smoothly interpolating across the seam, there is a
jump in the input sample position that is very apparent for high
upscaling factors. This can be avoided by slightly changing the scaling
coefficients to let the left/top tiles overshoot their input sampling
into the first pixel / line of their right / bottom neighbors. The error
can be further reduced by letting tiles be differently sized and by
selecting seam positions that minimize the input sampling position error
at tile boundaries.
This is complicated by different DMA start address, burst size, and
rotator block size alignment requirements, depending on the input and
output pixel formats, and the fact that flipping happens in different
places depending on the rotation.

This series implements optimal seam position selection and seam hiding
with per-tile resizing coefficients and adds a scaling mem2mem device
to the imx-media driver.

regards
Philipp

Philipp Zabel (16):
  gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  gpu: ipu-v3: image-convert: prepare for per-tile configuration
  gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  gpu: ipu-v3: image-convert: reconfigure IC per tile
  gpu: ipu-v3: image-convert: store tile top/left position
  gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
    outside fill_image
  gpu: ipu-v3: image-convert: move tile alignment helpers
  gpu: ipu-v3: image-convert: select optimal seam positions
  gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and
    NV16
  gpu: ipu-v3: image-convert: relax input alignment restrictions
  gpu: ipu-v3: image-convert: relax output alignment restrictions
  gpu: ipu-v3: image-convert: fix bytesperline adjustment
  gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  gpu: ipu-v3: image-convert: disable double buffering if necessary
  media: imx: add mem2mem device

 drivers/gpu/ipu-v3/ipu-ic.c                   |  52 +-
 drivers/gpu/ipu-v3/ipu-image-convert.c        | 865 +++++++++++++---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 953 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 include/video/imx-ipu-v3.h                    |   6 +
 8 files changed, 1760 insertions(+), 139 deletions(-)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

For tiled scaling, we want to compute the scaling coefficients
externally in such a way that the interpolation overshoots tile
boundaries and samples up to the first pixel of the next tile.
Prepare to override the resizing coefficients from the image
conversion code.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-ic.c | 52 +++++++++++++++++++++++--------------
 include/video/imx-ipu-v3.h  |  6 +++++
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 67cc820253a9..594c3cbc8291 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -442,36 +442,40 @@ int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 }
 EXPORT_SYMBOL_GPL(ipu_ic_task_graphics_init);
 
-int ipu_ic_task_init(struct ipu_ic *ic,
-		     int in_width, int in_height,
-		     int out_width, int out_height,
-		     enum ipu_color_space in_cs,
-		     enum ipu_color_space out_cs)
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc)
 {
 	struct ipu_ic_priv *priv = ic->priv;
-	u32 reg, downsize_coeff, resize_coeff;
+	u32 downsize_coeff, resize_coeff;
 	unsigned long flags;
 	int ret = 0;
 
-	/* Setup vertical resizing */
-	ret = calc_resize_coeffs(ic, in_height, out_height,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+	if (!rsc) {
+		/* Setup vertical resizing */
 
-	reg = (downsize_coeff << 30) | (resize_coeff << 16);
+		ret = calc_resize_coeffs(ic, in_height, out_height,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
+
+		rsc = (downsize_coeff << 30) | (resize_coeff << 16);
 
-	/* Setup horizontal resizing */
-	ret = calc_resize_coeffs(ic, in_width, out_width,
-				 &resize_coeff, &downsize_coeff);
-	if (ret)
-		return ret;
+		/* Setup horizontal resizing */
+		ret = calc_resize_coeffs(ic, in_width, out_width,
+					 &resize_coeff, &downsize_coeff);
+		if (ret)
+			return ret;
 
-	reg |= (downsize_coeff << 14) | resize_coeff;
+		rsc |= (downsize_coeff << 14) | resize_coeff;
+	}
 
 	spin_lock_irqsave(&priv->lock, flags);
 
-	ipu_ic_write(ic, reg, ic->reg->rsc);
+	ipu_ic_write(ic, rsc, ic->reg->rsc);
 
 	/* Setup color space conversion */
 	ic->in_cs = in_cs;
@@ -487,6 +491,16 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return ret;
 }
+
+int ipu_ic_task_init(struct ipu_ic *ic,
+		     int in_width, int in_height,
+		     int out_width, int out_height,
+		     enum ipu_color_space in_cs,
+		     enum ipu_color_space out_cs)
+{
+	return ipu_ic_task_init_rsc(ic, in_width, in_height, out_width,
+				    out_height, in_cs, out_cs, 0);
+}
 EXPORT_SYMBOL_GPL(ipu_ic_task_init);
 
 int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index abbad94e14a1..94f0eec821c8 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -387,6 +387,12 @@ int ipu_ic_task_init(struct ipu_ic *ic,
 		     int out_width, int out_height,
 		     enum ipu_color_space in_cs,
 		     enum ipu_color_space out_cs);
+int ipu_ic_task_init_rsc(struct ipu_ic *ic,
+			 int in_width, int in_height,
+			 int out_width, int out_height,
+			 enum ipu_color_space in_cs,
+			 enum ipu_color_space out_cs,
+			 u32 rsc);
 int ipu_ic_task_graphics_init(struct ipu_ic *ic,
 			      enum ipu_color_space in_g_cs,
 			      bool galpha_en, u32 galpha,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-06-22 15:52 ` [PATCH 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Let convert_start start from a given tile index, allocate intermediate
tile with maximum tile size.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 60 +++++++++++++++-----------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 524a717ab28e..7eef51decc97 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -605,7 +605,8 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 			       struct ipuv3_channel *channel,
 			       struct ipu_image_convert_image *image,
 			       enum ipu_rotate_mode rot_mode,
-			       bool rot_swap_width_height)
+			       bool rot_swap_width_height,
+			       unsigned int tile)
 {
 	struct ipu_image_convert_chan *chan = ctx->chan;
 	unsigned int burst_size;
@@ -615,23 +616,23 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	unsigned int tile_idx[2];
 
 	if (image->type == IMAGE_CONVERT_OUT) {
-		tile_idx[0] = ctx->out_tile_map[0];
+		tile_idx[0] = ctx->out_tile_map[tile];
 		tile_idx[1] = ctx->out_tile_map[1];
 	} else {
-		tile_idx[0] = 0;
+		tile_idx[0] = tile;
 		tile_idx[1] = 1;
 	}
 
 	if (rot_swap_width_height) {
-		width = image->tile[0].height;
-		height = image->tile[0].width;
-		stride = image->tile[0].rot_stride;
+		width = image->tile[tile_idx[0]].height;
+		height = image->tile[tile_idx[0]].width;
+		stride = image->tile[tile_idx[0]].rot_stride;
 		addr0 = ctx->rot_intermediate[0].phys;
 		if (ctx->double_buffering)
 			addr1 = ctx->rot_intermediate[1].phys;
 	} else {
-		width = image->tile[0].width;
-		height = image->tile[0].height;
+		width = image->tile[tile_idx[0]].width;
+		height = image->tile[tile_idx[0]].height;
 		stride = image->stride;
 		addr0 = image->base.phys0 +
 			image->tile[tile_idx[0]].offset;
@@ -681,7 +682,7 @@ static void init_idmac_channel(struct ipu_image_convert_ctx *ctx,
 	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
 }
 
-static int convert_start(struct ipu_image_convert_run *run)
+static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 {
 	struct ipu_image_convert_ctx *ctx = run->ctx;
 	struct ipu_image_convert_chan *chan = ctx->chan;
@@ -689,28 +690,29 @@ static int convert_start(struct ipu_image_convert_run *run)
 	struct ipu_image_convert_image *s_image = &ctx->in;
 	struct ipu_image_convert_image *d_image = &ctx->out;
 	enum ipu_color_space src_cs, dest_cs;
+	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
 	int ret;
 
-	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p\n",
-		__func__, chan->ic_task, ctx, run);
+	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
+		__func__, chan->ic_task, ctx, run, tile, dst_tile);
 
 	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
 	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* swap width/height for resizer */
-		dest_width = d_image->tile[0].height;
-		dest_height = d_image->tile[0].width;
+		dest_width = d_image->tile[dst_tile].height;
+		dest_height = d_image->tile[dst_tile].width;
 	} else {
-		dest_width = d_image->tile[0].width;
-		dest_height = d_image->tile[0].height;
+		dest_width = d_image->tile[dst_tile].width;
+		dest_height = d_image->tile[dst_tile].height;
 	}
 
 	/* setup the IC resizer and CSC */
 	ret = ipu_ic_task_init(chan->ic,
-			       s_image->tile[0].width,
-			       s_image->tile[0].height,
+			       s_image->tile[tile].width,
+			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
 			       src_cs, dest_cs);
@@ -721,27 +723,27 @@ static int convert_start(struct ipu_image_convert_run *run)
 
 	/* init the source MEM-->IC PP IDMAC channel */
 	init_idmac_channel(ctx, chan->in_chan, s_image,
-			   IPU_ROTATE_NONE, false);
+			   IPU_ROTATE_NONE, false, tile);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		/* init the IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   IPU_ROTATE_NONE, true);
+				   IPU_ROTATE_NONE, true, tile);
 
 		/* init the MEM-->IC PP ROT IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_in_chan, d_image,
-				   ctx->rot_mode, true);
+				   ctx->rot_mode, true, tile);
 
 		/* init the destination IC PP ROT-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->rotation_out_chan, d_image,
-				   IPU_ROTATE_NONE, false);
+				   IPU_ROTATE_NONE, false, tile);
 
 		/* now link IC PP-->MEM to MEM-->IC PP ROT */
 		ipu_idmac_link(chan->out_chan, chan->rotation_in_chan);
 	} else {
 		/* init the destination IC PP-->MEM IDMAC channel */
 		init_idmac_channel(ctx, chan->out_chan, d_image,
-				   ctx->rot_mode, false);
+				   ctx->rot_mode, false, tile);
 	}
 
 	/* enable the IC */
@@ -799,7 +801,7 @@ static int do_run(struct ipu_image_convert_run *run)
 	list_del(&run->list);
 	chan->current_run = run;
 
-	return convert_start(run);
+	return convert_start(run, 0);
 }
 
 /* hold irqlock when calling */
@@ -1430,14 +1432,22 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 				 !d_image->fmt->planar);
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		unsigned long intermediate_size = d_image->tile[0].size;
+		unsigned int i;
+
+		for (i = 1; i < ctx->num_tiles; i++) {
+			if (d_image->tile[i].size > intermediate_size)
+				intermediate_size = d_image->tile[i].size;
+		}
+
 		ret = alloc_dma_buf(priv, &ctx->rot_intermediate[0],
-				    d_image->tile[0].size);
+				    intermediate_size);
 		if (ret)
 			goto out_free;
 		if (ctx->double_buffering) {
 			ret = alloc_dma_buf(priv,
 					    &ctx->rot_intermediate[1],
-					    d_image->tile[0].size);
+					    intermediate_size);
 			if (ret)
 				goto out_free_dmabuf0;
 		}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
  2018-06-22 15:52 ` [PATCH 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
  2018-06-22 15:52 ` [PATCH 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Slightly modifying resize coefficients per-tile allows to completely
hide the seams between tiles and to sample the correct input pixels at
the bottom and right edges of the image.

Tiling requires a bilinear interpolator reset at each tile start, which
causes the image to be slightly shifted if the starting pixel should not
have been sampled from an integer pixel position in the source image
according to the full image resizing ratio. To work around this
hardware limitation, calculate per-tile resizing coefficients that make
sure that the correct input pixels are sampled at the tile end.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 236 ++++++++++++++++++++++++-
 1 file changed, 234 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 7eef51decc97..12da0772bff0 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -135,6 +135,12 @@ struct ipu_image_convert_ctx {
 	struct ipu_image_convert_image in;
 	struct ipu_image_convert_image out;
 	enum ipu_rotate_mode rot_mode;
+	u32 downsize_coeff_h;
+	u32 downsize_coeff_v;
+	u32 image_resize_coeff_h;
+	u32 image_resize_coeff_v;
+	u32 resize_coeffs_h[MAX_STRIPES_W];
+	u32 resize_coeffs_v[MAX_STRIPES_H];
 
 	/* intermediate buffer for rotation */
 	struct ipu_image_convert_dma_buf rot_intermediate[2];
@@ -355,6 +361,69 @@ static inline int num_stripes(int dim)
 		return 4;
 }
 
+/*
+ * Calculate downsizing coefficients, which are the same for all tiles,
+ * and bilinear resizing coefficients, which are used to find the best
+ * seam positions.
+ */
+static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
+					  struct ipu_image *in,
+					  struct ipu_image *out)
+{
+	u32 downsized_width = in->rect.width;
+	u32 downsized_height = in->rect.height;
+	u32 downsize_coeff_v = 0;
+	u32 downsize_coeff_h = 0;
+	u32 resized_width = out->rect.width;
+	u32 resized_height = out->rect.height;
+	u32 resize_coeff_h;
+	u32 resize_coeff_v;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		resized_width = out->rect.height;
+		resized_height = out->rect.width;
+	}
+
+	/* Do not let invalid input lead to an endless loop below */
+	if (WARN_ON(resized_width == 0 || resized_height == 0))
+		return -EINVAL;
+
+	while (downsized_width >= resized_width * 2) {
+		downsized_width >>= 1;
+		downsize_coeff_h++;
+	}
+
+	while (downsized_height >= resized_height * 2) {
+		downsized_height >>= 1;
+		downsize_coeff_v++;
+	}
+
+	/*
+	 * Calculate the bilinear resizing coefficients that could be used if
+	 * we were converting with a single tile. The bottom right output pixel
+	 * should sample as close as possible to the bottom right input pixel
+	 * out of the decimator, but not overshoot it:
+	 */
+	resize_coeff_h = 8192 * (downsized_width - 1) / (resized_width - 1);
+	resize_coeff_v = 8192 * (downsized_height - 1) / (resized_height - 1);
+
+	dev_dbg(ctx->chan->priv->ipu->dev,
+		"%s: hscale: >>%u, *8192/%u vscale: >>%u, *8192/%u, %ux%u tiles\n",
+		__func__, downsize_coeff_h, resize_coeff_h, downsize_coeff_v,
+		resize_coeff_v, ctx->in.num_cols, ctx->in.num_rows);
+
+	if (downsize_coeff_h > 2 || downsize_coeff_v  > 2 ||
+	    resize_coeff_h > 0x3fff || resize_coeff_v > 0x3fff)
+		return -EINVAL;
+
+	ctx->downsize_coeff_h = downsize_coeff_h;
+	ctx->downsize_coeff_v = downsize_coeff_v;
+	ctx->image_resize_coeff_h = resize_coeff_h;
+	ctx->image_resize_coeff_v = resize_coeff_v;
+
+	return 0;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -558,6 +627,149 @@ static void calc_tile_offsets(struct ipu_image_convert_ctx *ctx,
 		calc_tile_offsets_packed(ctx, image);
 }
 
+/*
+ * Calculate the resizing ratio for the IC main processing section given input
+ * size, fixed downsizing coefficient, and output size.
+ * Either round to closest for the next tile's first pixel to minimize seams
+ * and distortion (for all but right column / bottom row), or round down to
+ * avoid sampling beyond the edges of the input image for this tile's last
+ * pixel.
+ * Returns the resizing coefficient, resizing ratio is 8192.0 / resize_coeff.
+ */
+static u32 calc_resize_coeff(u32 input_size, u32 downsize_coeff,
+			     u32 output_size, bool allow_overshoot)
+{
+	u32 downsized = input_size >> downsize_coeff;
+
+	if (allow_overshoot)
+		return DIV_ROUND_CLOSEST(8192 * downsized, output_size);
+	else
+		return 8192 * (downsized - 1) / (output_size - 1);
+}
+
+/*
+ * Slightly modify resize coefficients per tile to hide the bilinear
+ * interpolator reset at tile borders, shifting the right / bottom edge
+ * by up to a half input pixel. This removes noticeable seams between
+ * tiles at higher upscaling factors.
+ */
+static void calc_tile_resize_coefficients(struct ipu_image_convert_ctx *ctx)
+{
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
+	struct ipu_image_tile *in_tile, *out_tile;
+	unsigned int col, row, tile_idx;
+	unsigned int last_output;
+
+	for (col = 0; col < ctx->in.num_cols; col++) {
+		bool closest = (col < ctx->in.num_cols - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		u32 resized_width;
+		u32 resize_coeff_h;
+
+		tile_idx = col;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_width = out_tile->height;
+		else
+			resized_width = out_tile->width;
+
+		resize_coeff_h = calc_resize_coeff(in_tile->width,
+						   ctx->downsize_coeff_h,
+						   resized_width, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: column %u hscale: *8192/%u\n",
+			__func__, col, resize_coeff_h);
+
+
+		for (row = 0; row < ctx->in.num_rows; row++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the horizontal scaling factor known, round up
+			 * resized width (output width or height) to burst size.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->height = round_up(resized_width, 8);
+			else
+				out_tile->width = round_up(resized_width, 8);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized width and scaling coefficients.
+			 * Round up to burst size.
+			 */
+			last_output = round_up(resized_width, 8) - 1;
+			if (closest)
+				last_output++;
+			in_tile->width = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_h,
+					      8192) + 1)
+				<< ctx->downsize_coeff_h, 8);
+		}
+
+		ctx->resize_coeffs_h[col] = resize_coeff_h;
+	}
+
+	for (row = 0; row < ctx->in.num_rows; row++) {
+		bool closest = (row < ctx->in.num_rows - 1) &&
+			       !(ctx->rot_mode & IPU_ROT_BIT_VFLIP);
+		u32 resized_height;
+		u32 resize_coeff_v;
+
+		tile_idx = row * ctx->in.num_cols;
+		in_tile = &ctx->in.tile[tile_idx];
+		out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			resized_height = out_tile->width;
+		else
+			resized_height = out_tile->height;
+
+		resize_coeff_v = calc_resize_coeff(in_tile->height,
+						   ctx->downsize_coeff_v,
+						   resized_height, closest);
+
+		dev_dbg(priv->ipu->dev, "%s: row %u vscale: *8192/%u\n",
+			__func__, row, resize_coeff_v);
+
+		for (col = 0; col < ctx->in.num_cols; col++) {
+			tile_idx = row * ctx->in.num_cols + col;
+			in_tile = &ctx->in.tile[tile_idx];
+			out_tile = &ctx->out.tile[ctx->out_tile_map[tile_idx]];
+
+			/*
+			 * With the vertical scaling factor known, round up
+			 * resized height (output width or height) to IDMAC
+			 * limitations.
+			 */
+			if (ipu_rot_mode_is_irt(ctx->rot_mode))
+				out_tile->width = round_up(resized_height, 2);
+			else
+				out_tile->height = round_up(resized_height, 2);
+
+			/*
+			 * Calculate input width from the last accessed input
+			 * pixel given resized height and scaling coefficients.
+			 * Align to IDMAC restrictions.
+			 */
+			last_output = round_up(resized_height, 2) - 1;
+			if (closest)
+				last_output++;
+			in_tile->height = round_up(
+				(DIV_ROUND_UP(last_output * resize_coeff_v,
+					      8192) + 1)
+				<< ctx->downsize_coeff_v, 2);
+		}
+
+		ctx->resize_coeffs_v[row] = resize_coeff_v;
+	}
+}
+
 /*
  * return the number of runs in given queue (pending_q or done_q)
  * for this context. hold irqlock when calling.
@@ -692,6 +904,8 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 	enum ipu_color_space src_cs, dest_cs;
 	unsigned int dst_tile = ctx->out_tile_map[tile];
 	unsigned int dest_width, dest_height;
+	unsigned int col, row;
+	u32 rsc;
 	int ret;
 
 	dev_dbg(priv->ipu->dev, "%s: task %u: starting ctx %p run %p tile %u -> %u\n",
@@ -709,13 +923,26 @@ static int convert_start(struct ipu_image_convert_run *run, unsigned int tile)
 		dest_height = d_image->tile[dst_tile].height;
 	}
 
+	row = tile / s_image->num_cols;
+	col = tile % s_image->num_cols;
+
+	rsc =  (ctx->downsize_coeff_v << 30) |
+	       (ctx->resize_coeffs_v[row] << 16) |
+	       (ctx->downsize_coeff_h << 14) |
+	       (ctx->resize_coeffs_h[col]);
+
+	dev_dbg(priv->ipu->dev, "%s: %ux%u -> %ux%u (rsc = 0x%x)\n",
+		__func__, s_image->tile[tile].width,
+		s_image->tile[tile].height, dest_width, dest_height, rsc);
+
 	/* setup the IC resizer and CSC */
-	ret = ipu_ic_task_init(chan->ic,
+	ret = ipu_ic_task_init_rsc(chan->ic,
 			       s_image->tile[tile].width,
 			       s_image->tile[tile].height,
 			       dest_width,
 			       dest_height,
-			       src_cs, dest_cs);
+			       src_cs, dest_cs,
+			       rsc);
 	if (ret) {
 		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
 		return ret;
@@ -1401,6 +1628,10 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1409,6 +1640,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 		goto out_free;
 
 	calc_out_tile_map(ctx);
+	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
 	dump_format(ctx, d_image);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (2 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 05/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

For differently sized tiles or if the resizing coefficients change,
we have to stop, reconfigure, and restart the IC between tiles.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 65 +++++++++++++++++---------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 12da0772bff0..3907fb7dae13 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1131,6 +1131,24 @@ static irqreturn_t do_bh(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static bool ic_settings_changed(struct ipu_image_convert_ctx *ctx)
+{
+	unsigned int cur_tile = ctx->next_tile - 1;
+	unsigned int next_tile = ctx->next_tile;
+
+	if (ctx->resize_coeffs_h[cur_tile % ctx->in.num_cols] !=
+	    ctx->resize_coeffs_h[next_tile % ctx->in.num_cols] ||
+	    ctx->resize_coeffs_v[cur_tile / ctx->in.num_cols] !=
+	    ctx->resize_coeffs_v[next_tile / ctx->in.num_cols] ||
+	    ctx->in.tile[cur_tile].width != ctx->in.tile[next_tile].width ||
+	    ctx->in.tile[cur_tile].height != ctx->in.tile[next_tile].height ||
+	    ctx->out.tile[cur_tile].width != ctx->out.tile[next_tile].width ||
+	    ctx->out.tile[cur_tile].height != ctx->out.tile[next_tile].height)
+		return true;
+
+	return false;
+}
+
 /* hold irqlock when calling */
 static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 {
@@ -1174,27 +1192,32 @@ static irqreturn_t do_irq(struct ipu_image_convert_run *run)
 	 * not done, place the next tile buffers.
 	 */
 	if (!ctx->double_buffering) {
-
-		src_tile = &s_image->tile[ctx->next_tile];
-		dst_idx = ctx->out_tile_map[ctx->next_tile];
-		dst_tile = &d_image->tile[dst_idx];
-
-		ipu_cpmem_set_buffer(chan->in_chan, 0,
-				     s_image->base.phys0 + src_tile->offset);
-		ipu_cpmem_set_buffer(outch, 0,
-				     d_image->base.phys0 + dst_tile->offset);
-		if (s_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(chan->in_chan,
-						src_tile->u_off,
-						src_tile->v_off);
-		if (d_image->fmt->planar)
-			ipu_cpmem_set_uv_offset(outch,
-						dst_tile->u_off,
-						dst_tile->v_off);
-
-		ipu_idmac_select_buffer(chan->in_chan, 0);
-		ipu_idmac_select_buffer(outch, 0);
-
+		if (ic_settings_changed(ctx)) {
+			convert_stop(run);
+			convert_start(run, ctx->next_tile);
+		} else {
+			src_tile = &s_image->tile[ctx->next_tile];
+			dst_idx = ctx->out_tile_map[ctx->next_tile];
+			dst_tile = &d_image->tile[dst_idx];
+
+			ipu_cpmem_set_buffer(chan->in_chan, 0,
+					     s_image->base.phys0 +
+					     src_tile->offset);
+			ipu_cpmem_set_buffer(outch, 0,
+					     d_image->base.phys0 +
+					     dst_tile->offset);
+			if (s_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(chan->in_chan,
+							src_tile->u_off,
+							src_tile->v_off);
+			if (d_image->fmt->planar)
+				ipu_cpmem_set_uv_offset(outch,
+							dst_tile->u_off,
+							dst_tile->v_off);
+
+			ipu_idmac_select_buffer(chan->in_chan, 0);
+			ipu_idmac_select_buffer(outch, 0);
+		}
 	} else if (ctx->next_tile < ctx->num_tiles - 1) {
 
 		src_tile = &s_image->tile[ctx->next_tile + 1];
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/16] gpu: ipu-v3: image-convert: store tile top/left position
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (3 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Store tile top/left position in pixels in the tile structure.
This will allow overlapping tiles with different sizes later.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 3907fb7dae13..c3358e83bcc1 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -84,6 +84,8 @@ struct ipu_image_convert_dma_chan {
 struct ipu_image_tile {
 	u32 width;
 	u32 height;
+	u32 left;
+	u32 top;
 	/* size and strides are in bytes */
 	u32 size;
 	u32 stride;
@@ -427,13 +429,17 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
-	int i;
+	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
 		struct ipu_image_tile *tile = &image->tile[i];
+		const unsigned int row = i / image->num_cols;
+		const unsigned int col = i % image->num_cols;
 
 		tile->height = image->base.pix.height / image->num_rows;
 		tile->width = image->base.pix.width / image->num_cols;
+		tile->left = col * tile->width;
+		tile->top = row * tile->height;
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -529,7 +535,7 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 H, w, h, y_stride, uv_stride;
+	u32 H, top, y_stride, uv_stride;
 	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
 	u32 y_row_off, y_col_off, y_off;
 	u32 y_size, uv_size;
@@ -546,13 +552,12 @@ static void calc_tile_offsets_planar(struct ipu_image_convert_ctx *ctx,
 	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		y_row_off = row * h * y_stride;
-		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
+		top = image->tile[tile].top;
+		y_row_off = top * y_stride;
+		uv_row_off = (top * uv_stride) / fmt->uv_height_dec;
 
 		for (col = 0; col < image->num_cols; col++) {
-			y_col_off = col * w;
+			y_col_off = image->tile[tile].left;
 			uv_col_off = y_col_off / fmt->uv_width_dec;
 			if (fmt->uv_packed)
 				uv_col_off *= 2;
@@ -589,7 +594,7 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 	const struct ipu_image_pixfmt *fmt = image->fmt;
 	unsigned int row, col, tile = 0;
-	u32 w, h, bpp, stride;
+	u32 bpp, stride;
 	u32 row_off, col_off;
 
 	/* setup some convenience vars */
@@ -597,12 +602,10 @@ static void calc_tile_offsets_packed(struct ipu_image_convert_ctx *ctx,
 	bpp = fmt->bpp;
 
 	for (row = 0; row < image->num_rows; row++) {
-		w = image->tile[tile].width;
-		h = image->tile[tile].height;
-		row_off = row * h * stride;
+		row_off = image->tile[tile].top * stride;
 
 		for (col = 0; col < image->num_cols; col++) {
-			col_off = (col * w * bpp) >> 3;
+			col_off = (image->tile[tile].left * bpp) >> 3;
 
 			image->tile[tile].offset = row_off + col_off;
 			image->tile[tile].u_off = 0;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (4 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 05/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

This will allow to calculate seam positions after initializing the
ipu_image base structure but before calculating tile dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index c3358e83bcc1..06d65c63262d 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1447,9 +1447,6 @@ static int fill_image(struct ipu_image_convert_ctx *ctx,
 	else
 		ic_image->stride  = ic_image->base.pix.bytesperline;
 
-	calc_tile_dimensions(ctx, ic_image);
-	calc_tile_offsets(ctx, ic_image);
-
 	return 0;
 }
 
@@ -1654,10 +1651,6 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
 	ctx->rot_mode = rot_mode;
 
-	ret = calc_image_resize_coefficients(ctx, in, out);
-	if (ret)
-		goto out_free;
-
 	ret = fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
 	if (ret)
 		goto out_free;
@@ -1665,6 +1658,16 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	ret = calc_image_resize_coefficients(ctx, in, out);
+	if (ret)
+		goto out_free;
+
+	calc_tile_dimensions(ctx, s_image);
+	calc_tile_offsets(ctx, s_image);
+
+	calc_tile_dimensions(ctx, d_image);
+	calc_tile_offsets(ctx, d_image);
+
 	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (5 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 08/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Move tile_width_align and tile_height_align up so they
can be used by the tile edge position calculation code.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 54 +++++++++++++-------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 06d65c63262d..da6f18475b6b 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -426,6 +426,33 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+/*
+ * We have to adjust the tile width such that the tile physaddrs and
+ * U and V plane offsets are multiples of 8 bytes as required by
+ * the IPU DMA Controller. For the planar formats, this corresponds
+ * to a pixel alignment of 16 (but use a more formal equation since
+ * the variables are available). For all the packed formats, 8 is
+ * good enough.
+ */
+static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+}
+
+/*
+ * For tile height alignment, we have to ensure that the output tile
+ * heights are multiples of 8 lines if the IRT is required by the
+ * given rotation mode (the IRT performs rotations on 8x8 blocks
+ * at a time). If the IRT is not used, or for input image tiles,
+ * 2 lines are good enough.
+ */
+static inline u32 tile_height_align(enum ipu_image_convert_type type,
+				    enum ipu_rotate_mode rot_mode)
+{
+	return (type == IMAGE_CONVERT_OUT &&
+		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
@@ -1467,33 +1494,6 @@ static unsigned int clamp_align(unsigned int x, unsigned int min,
 	return x;
 }
 
-/*
- * We have to adjust the tile width such that the tile physaddrs and
- * U and V plane offsets are multiples of 8 bytes as required by
- * the IPU DMA Controller. For the planar formats, this corresponds
- * to a pixel alignment of 16 (but use a more formal equation since
- * the variables are available). For all the packed formats, 8 is
- * good enough.
- */
-static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
-{
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
-}
-
-/*
- * For tile height alignment, we have to ensure that the output tile
- * heights are multiples of 8 lines if the IRT is required by the
- * given rotation mode (the IRT performs rotations on 8x8 blocks
- * at a time). If the IRT is not used, or for input image tiles,
- * 2 lines are good enough.
- */
-static inline u32 tile_height_align(enum ipu_image_convert_type type,
-				    enum ipu_rotate_mode rot_mode)
-{
-	return (type == IMAGE_CONVERT_OUT &&
-		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
-}
-
 /* Adjusts input/output images to IPU restrictions */
 void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 			      enum ipu_rotate_mode rot_mode)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/16] gpu: ipu-v3: image-convert: select optimal seam positions
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (6 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Select seam positions that minimize distortions during seam hiding while
satifying input and output IDMAC, rotator, and image format constraints.

This code looks for aligned output seam positions that minimize the
difference between the fractional corresponding ideal input positions
and the input positions rounded to alignment requirements.

Since now tiles can be sized differently, alignment restrictions of the
complete image can be relaxed in the next step.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 315 ++++++++++++++++++++++++-
 1 file changed, 309 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index da6f18475b6b..fbb46d296170 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -426,6 +426,115 @@ static int calc_image_resize_coefficients(struct ipu_image_convert_ctx *ctx,
 	return 0;
 }
 
+#define round_closest(x, y) round_down((x) + (y)/2, (y))
+
+/*
+ * Find the best aligned seam position in the inverval [out_start, out_end].
+ * Rotation and image offsets are out of scope.
+ *
+ * @out_start: start of inverval, must be within 1024 pixels / lines
+ *             of out_end
+ * @out_end: output right / bottom edge, end of interval
+ * @in_align: input alignment, either horizontal 8-byte line start address
+ *            alignment, or pixel alignment due to image format
+ * @out_align: output alignment, either horizontal 8-byte line start address
+ *             alignment, or pixel alignment due to image format or rotator
+ *             block size
+ * @out_burst: horizontal output burst size or rotator block size
+ * @downsize_coeff: downsizing section coefficient
+ * @resize_coeff: main processing section resizing coefficient
+ * @allow_overshoot: ignore out_burst if true
+ * @_in_seam: aligned input seam position return value
+ * @_out_seam: aligned output seam position return value
+ */
+static void find_best_seam(struct ipu_image_convert_ctx *ctx,
+			   unsigned int out_start,
+			   unsigned int out_end,
+			   unsigned int in_align,
+			   unsigned int out_align,
+			   unsigned int out_burst,
+			   unsigned int downsize_coeff,
+			   unsigned int resize_coeff,
+			   bool allow_overshoot,
+			   u32 *_in_seam,
+			   u32 *_out_seam)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int out_pos;
+	/* Input / output seam position candidates */
+	unsigned int out_seam = 0;
+	unsigned int in_seam = 0;
+	unsigned int min_diff = UINT_MAX;
+
+	/*
+	 * Output tiles must start at a multiple of 8 bytes horizontally and
+	 * possibly at an even line horizontally depending on the pixel format.
+	 * Only consider output aligned positions for the seam.
+	 */
+	out_start = round_up(out_start, out_align);
+	for (out_pos = out_start; out_pos < out_end; out_pos += out_align) {
+		unsigned int in_pos;
+		unsigned int in_pos_aligned;
+		unsigned int abs_diff;
+
+		/*
+		 * Tiles in the right row / bottom column may not be allowed to
+		 * overshoot horizontally / vertically. out_burst may be the
+		 * actual DMA burst size, or the rotator block size.
+		 */
+		if (!allow_overshoot && (out_end - out_pos) % out_burst)
+			continue;
+
+		/*
+		 * Input sample position, corresponding to out_pos, 19.13 fixed
+		 * point.
+		 */
+		in_pos = (out_pos * resize_coeff) << downsize_coeff;
+		/*
+		 * The closest input sample position that we could actually
+		 * start the input tile at, 19.13 fixed point.
+		 */
+		in_pos_aligned = round_closest(in_pos, 8192U * in_align);
+
+		if (in_pos < in_pos_aligned)
+			abs_diff = in_pos_aligned - in_pos;
+		else
+			abs_diff = in_pos - in_pos_aligned;
+
+		if (abs_diff < min_diff) {
+			in_seam = in_pos_aligned;
+			out_seam = out_pos;
+			min_diff = abs_diff;
+		}
+	}
+
+	*_out_seam = out_seam;
+	/* Convert 19.13 fixed point to integer seam position */
+	*_in_seam = DIV_ROUND_CLOSEST(in_seam, 8192U);
+
+	dev_dbg(dev, "%s: out_seam %u in [%u, %u], in_seam %u diff %u.%03u\n",
+		__func__, out_seam, out_start, out_end, *_in_seam,
+		min_diff / 8192,
+		DIV_ROUND_CLOSEST(min_diff % 8192 * 1000, 8192));
+}
+
+/*
+ * Tile left edges are required to be aligned to multiples of 8 bytes
+ * by the IDMAC.
+ */
+static inline u32 tile_left_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->planar ? 8 * fmt->uv_width_dec : 64 / fmt->bpp;
+}
+
+/*
+ * Tile top edge alignment is only limited by chroma subsampling.
+ */
+static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
+{
+	return fmt->uv_height_dec > 1 ? 2 : 1;
+}
+
 /*
  * We have to adjust the tile width such that the tile physaddrs and
  * U and V plane offsets are multiples of 8 bytes as required by
@@ -453,20 +562,211 @@ static inline u32 tile_height_align(enum ipu_image_convert_type type,
 		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
 }
 
+/*
+ * Fill in left position and width and for all tiles in an input column, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a row, and output tile top position and height are set.
+ */
+static void fill_tile_column(struct ipu_image_convert_ctx *ctx,
+			     unsigned int col,
+			     struct ipu_image_convert_image *in,
+			     unsigned int in_left, unsigned int in_width,
+			     struct ipu_image_convert_image *out,
+			     unsigned int out_left, unsigned int out_width)
+{
+	unsigned int row, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (row = 0; row < in->num_rows; row++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->left = in_left;
+		in_tile->width = in_width;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->top = out_left;
+			out_tile->height = out_width;
+		} else {
+			out_tile->left = out_left;
+			out_tile->width = out_width;
+		}
+	}
+}
+
+/*
+ * Fill in top position and height and for all tiles in an input row, and
+ * for all corresponding output tiles. If the 90° rotator is used, the output
+ * tiles are in a column, and output tile left position and width are set.
+ */
+static void fill_tile_row(struct ipu_image_convert_ctx *ctx, unsigned int row,
+			  struct ipu_image_convert_image *in,
+			  unsigned int in_top, unsigned int in_height,
+			  struct ipu_image_convert_image *out,
+			  unsigned int out_top, unsigned int out_height)
+{
+	unsigned int col, tile_idx;
+	struct ipu_image_tile *in_tile, *out_tile;
+
+	for (col = 0; col < in->num_cols; col++) {
+		tile_idx = in->num_cols * row + col;
+		in_tile = &in->tile[tile_idx];
+		out_tile = &out->tile[ctx->out_tile_map[tile_idx]];
+
+		in_tile->top = in_top;
+		in_tile->height = in_height;
+
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			out_tile->left = out_top;
+			out_tile->width = out_height;
+		} else {
+			out_tile->top = out_top;
+			out_tile->height = out_height;
+		}
+	}
+}
+
+/*
+ * Find the best horizontal and vertical seam positions to split into tiles.
+ * Minimize the fractional part of the input sampling position for the
+ * top / left pixels of each tile.
+ */
+static void find_seams(struct ipu_image_convert_ctx *ctx,
+		       struct ipu_image_convert_image *in,
+		       struct ipu_image_convert_image *out)
+{
+	struct device *dev = ctx->chan->priv->ipu->dev;
+	unsigned int resized_width = out->base.rect.width;
+	unsigned int resized_height = out->base.rect.height;
+	unsigned int col;
+	unsigned int row;
+	unsigned int in_left_align = tile_left_align(in->fmt);
+	unsigned int in_top_align = tile_top_align(in->fmt);
+	unsigned int out_left_align = tile_left_align(out->fmt);
+	unsigned int out_top_align = tile_top_align(out->fmt);
+	unsigned int out_width_align = tile_width_align(out->fmt);
+	unsigned int out_height_align = tile_height_align(out->type,
+							  ctx->rot_mode);
+	unsigned int in_right = in->base.rect.width;
+	unsigned int in_bottom = in->base.rect.height;
+	unsigned int out_right = out->base.rect.width;
+	unsigned int out_bottom = out->base.rect.height;
+	unsigned int flipped_out_left;
+	unsigned int flipped_out_top;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		resized_width = out->base.rect.height;
+		resized_height = out->base.rect.width;
+		out_right = out->base.rect.height;
+		out_bottom = out->base.rect.width;
+	}
+
+	for (col = in->num_cols - 1; col > 0; col--) {
+		bool allow_overshoot = (col == in->num_cols - 1) &&
+				       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_left;
+		unsigned int out_left;
+
+		/* Start within 1024 pixels of the right edge */
+		out_start = max_t(int, 0, out_right - 1024);
+		/* End before having to add more columns to the left */
+		out_end = min_t(unsigned int, out_right, col * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_left_align, out_left_align, out_width_align,
+			       ctx->downsize_coeff_h, ctx->image_resize_coeff_h,
+			       allow_overshoot, &in_left, &out_left);
+
+		if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+			flipped_out_left = resized_width - out_right;
+		else
+			flipped_out_left = out_left;
+
+		fill_tile_column(ctx, col, in, in_left, in_right - in_left,
+				 out, flipped_out_left, out_right - out_left);
+
+		dev_dbg(dev, "%s: col %u: %u, %u -> %u, %u\n", __func__, col,
+			in_left, in_right - in_left,
+			flipped_out_left, out_right - out_left);
+
+		in_right = in_left;
+		out_right = out_left;
+	}
+
+	flipped_out_left = (ctx->rot_mode & IPU_ROT_BIT_HFLIP) ?
+			   resized_width - out_right : 0;
+
+	fill_tile_column(ctx, 0, in, 0, in_right,
+			 out, flipped_out_left, out_right);
+
+	dev_dbg(dev, "%s: col 0: 0, %u -> %u, %u\n", __func__,
+		in_right, flipped_out_left, out_right);
+
+	for (row = in->num_rows - 1; row > 0; row--) {
+		unsigned int out_start;
+		unsigned int out_end;
+		unsigned int in_top;
+		unsigned int out_top;
+
+		/* Start within 1024 lines of the bottom edge */
+		out_start = max_t(int, 0, out_bottom - 1024);
+		/* End before having to add more rows above */
+		out_end = min_t(unsigned int, out_right, row * 1024);
+
+		find_best_seam(ctx, out_start, out_end,
+			       in_top_align, out_top_align, out_height_align,
+			       ctx->downsize_coeff_v, ctx->image_resize_coeff_v,
+			       row == in->num_rows - 1,
+			       &in_top, &out_top);
+
+		if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+		    ipu_rot_mode_is_irt(ctx->rot_mode))
+			flipped_out_top = resized_height - out_bottom;
+		else
+			flipped_out_top = out_top;
+
+		fill_tile_row(ctx, row, in, in_top, in_bottom - in_top,
+			      out, flipped_out_top, out_bottom - out_top);
+
+		dev_dbg(dev, "%s: row %u: %u, %u -> %u, %u\n", __func__, row,
+			in_top, in_bottom - in_top,
+			flipped_out_top, out_bottom - out_top);
+
+		in_bottom = in_top;
+		out_bottom = out_top;
+	}
+
+	if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
+	    ipu_rot_mode_is_irt(ctx->rot_mode))
+		flipped_out_top = resized_height - out_bottom;
+	else
+		flipped_out_top = 0;
+
+	fill_tile_row(ctx, 0, in, 0, in_bottom,
+		      out, flipped_out_top, out_bottom);
+
+	dev_dbg(dev, "%s: row 0: 0, %u -> %u, %u\n", __func__,
+		in_bottom, flipped_out_top, out_bottom);
+}
+
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
-		struct ipu_image_tile *tile = &image->tile[i];
+		struct ipu_image_tile *tile;
 		const unsigned int row = i / image->num_cols;
 		const unsigned int col = i % image->num_cols;
 
-		tile->height = image->base.pix.height / image->num_rows;
-		tile->width = image->base.pix.width / image->num_cols;
-		tile->left = col * tile->width;
-		tile->top = row * tile->height;
+		if (image->type == IMAGE_CONVERT_OUT)
+			tile = &image->tile[ctx->out_tile_map[i]];
+		else
+			tile = &image->tile[i];
+
 		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
 			tile->width;
 
@@ -1662,13 +1962,16 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	if (ret)
 		goto out_free;
 
+	calc_out_tile_map(ctx);
+
+	find_seams(ctx, s_image, d_image);
+
 	calc_tile_dimensions(ctx, s_image);
 	calc_tile_offsets(ctx, s_image);
 
 	calc_tile_dimensions(ctx, d_image);
 	calc_tile_offsets(ctx, d_image);
 
-	calc_out_tile_map(ctx);
 	calc_tile_resize_coefficients(ctx);
 
 	dump_format(ctx, s_image);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (7 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 08/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16 Philipp Zabel
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Since tile dimensions now vary between tiles, add debug output for each
tile's position and dimensions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index fbb46d296170..edd59c935710 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -302,12 +302,11 @@ static void dump_format(struct ipu_image_convert_ctx *ctx,
 	struct ipu_image_convert_priv *priv = chan->priv;
 
 	dev_dbg(priv->ipu->dev,
-		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
+		"task %u: ctx %p: %s format: %dx%d (%dx%d tiles), %c%c%c%c\n",
 		chan->ic_task, ctx,
 		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
 		ic_image->base.pix.width, ic_image->base.pix.height,
 		ic_image->num_cols, ic_image->num_rows,
-		ic_image->tile[0].width, ic_image->tile[0].height,
 		ic_image->fmt->fourcc & 0xff,
 		(ic_image->fmt->fourcc >> 8) & 0xff,
 		(ic_image->fmt->fourcc >> 16) & 0xff,
@@ -755,6 +754,8 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 				 struct ipu_image_convert_image *image)
 {
+	struct ipu_image_convert_chan *chan = ctx->chan;
+	struct ipu_image_convert_priv *priv = chan->priv;
 	unsigned int i;
 
 	for (i = 0; i < ctx->num_tiles; i++) {
@@ -779,6 +780,13 @@ static void calc_tile_dimensions(struct ipu_image_convert_ctx *ctx,
 			tile->rot_stride =
 				(image->fmt->bpp * tile->height) >> 3;
 		}
+
+		dev_dbg(priv->ipu->dev,
+			"task %u: ctx %p: %s@[%u,%u]: %ux%u@%u,%u\n",
+			chan->ic_task, ctx,
+			image->type == IMAGE_CONVERT_IN ? "Input" : "Output",
+			row, col,
+			tile->width, tile->height, tile->left, tile->top);
 	}
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (8 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions Philipp Zabel
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

For the planar but U/V-packed formats NV12 and NV16, 8 pixel width
alignment is good enough to fulfill the 8 byte stride requirement.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index edd59c935710..68d84fae9b9d 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -544,7 +544,7 @@ static inline u32 tile_top_align(const struct ipu_image_pixfmt *fmt)
  */
 static inline u32 tile_width_align(const struct ipu_image_pixfmt *fmt)
 {
-	return fmt->planar ? 8 * fmt->uv_width_dec : 8;
+	return (fmt->planar && !fmt->uv_packed) ? 8 * fmt->uv_width_dec : 8;
 }
 
 /*
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (9 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16 Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 12/16] gpu: ipu-v3: image-convert: relax output " Philipp Zabel
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

If we allow the 8-pixel DMA bursts to overshoot the end of the line, the
only input alignment restrictions are dictated by the pixel format and
8-byte aligned line start address.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 68d84fae9b9d..04113b1c7a92 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1851,13 +1851,6 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 		num_in_cols = num_out_cols;
 	}
 
-	/* align input width/height */
-	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
-			num_in_rows);
-	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
-	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
-
 	/* align output width/height */
 	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
 	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/16] gpu: ipu-v3: image-convert: relax output alignment restrictions
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (10 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

If we allow different tile sizes, the output tile with / height
alignment doesn't need to be multiplied by number of columns / rows.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 04113b1c7a92..3eb74d41733f 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1852,9 +1852,8 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 	}
 
 	/* align output width/height */
-	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
-	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
-			num_out_rows);
+	w_align = ilog2(tile_width_align(outfmt));
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode));
 	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (11 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 12/16] gpu: ipu-v3: image-convert: relax output " Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

For planar formats, bytesperline does not depend on BPP. It must always
be larger than width and aligned to tile width alignment restrictions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 3eb74d41733f..43eaa512e8c2 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1858,10 +1858,19 @@ void ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
 	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
 
 	/* set input/output strides and image sizes */
-	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
-	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
-	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
-	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
+	in->pix.bytesperline = infmt->planar ?
+		clamp_align(in->pix.width,
+			    in->pix.bytesperline, MAX_W, w_align) :
+		clamp_align((in->pix.width * infmt->bpp) >> 3,
+			    in->pix.bytesperline, MAX_W, w_align);
+	in->pix.sizeimage = infmt->planar ?
+		(in->pix.height * in->pix.bytesperline * infmt->bpp) >> 3 :
+		in->pix.height * in->pix.bytesperline;
+	out->pix.bytesperline = outfmt->planar ? out->pix.width :
+		(out->pix.width * outfmt->bpp) >> 3;
+	out->pix.sizeimage = outfmt->planar ?
+		(out->pix.height * out->pix.bytesperline * outfmt->bpp) >> 3 :
+		out->pix.height * out->pix.bytesperline;
 }
 EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (12 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Visualize the scaling and rotation pipeline with some ASCII art
diagrams. Remove the FIXME comment about missing seam prevention.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 39 +++++++++++++++++++-------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 43eaa512e8c2..c6050cf12885 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -37,17 +37,36 @@
  * when double_buffering boolean is set).
  *
  * Note that the input frame must be split up into the same number
- * of tiles as the output frame.
+ * of tiles as the output frame:
  *
- * FIXME: at this point there is no attempt to deal with visible seams
- * at the tile boundaries when upscaling. The seams are caused by a reset
- * of the bilinear upscale interpolation when starting a new tile. The
- * seams are barely visible for small upscale factors, but become
- * increasingly visible as the upscale factor gets larger, since more
- * interpolated pixels get thrown out at the tile boundaries. A possilble
- * fix might be to overlap tiles of different sizes, but this must be done
- * while also maintaining the IDMAC dma buffer address alignment and 8x8 IRT
- * alignment restrictions of each tile.
+ *                       +---------+-----+
+ *   +-----+---+         |  A      | B   |
+ *   | A   | B |         |         |     |
+ *   +-----+---+   -->   +---------+-----+
+ *   | C   | D |         |  C      | D   |
+ *   +-----+---+         |         |     |
+ *                       +---------+-----+
+ *
+ * Clockwise 90° rotations are handled by first rescaling into a
+ * reusable temporary tile buffer and then rotating with the 8x8
+ * block rotator, writing to the correct destination:
+ *
+ *                                         +-----+-----+
+ *                                         |     |     |
+ *   +-----+---+         +---------+       | C   | A   |
+ *   | A   | B |         | A,B, |  |       |     |     |
+ *   +-----+---+   -->   | C,D  |  |  -->  |     |     |
+ *   | C   | D |         +---------+       +-----+-----+
+ *   +-----+---+                           | D   | B   |
+ *                                         |     |     |
+ *                                         +-----+-----+
+ *
+ * If the 8x8 block rotator is used, horizontal or vertical flipping
+ * is done during the rotation step, otherwise flipping is done
+ * during the scaling step.
+ * With rotation or flipping, tile order changes between input and
+ * output image. Tiles are numbered row major from top left to bottom
+ * right for both input and output image.
  */
 
 #define MAX_STRIPES_W    4
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (13 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
  2018-07-05 21:55 ` [PATCH 00/16] i.MX media mem2mem scaler Steve Longerbeam
  16 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Double-buffering only works if tile sizes are the same and the resizing
coefficient does not change between tiles, even for non-planar formats.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/gpu/ipu-v3/ipu-image-convert.c | 27 ++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index c6050cf12885..0fbffc0b058a 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -1934,6 +1934,7 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	struct ipu_image_convert_chan *chan;
 	struct ipu_image_convert_ctx *ctx;
 	unsigned long flags;
+	unsigned int i;
 	bool get_res;
 	int ret;
 
@@ -2017,15 +2018,37 @@ ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
 	 * for every tile, and therefore would have to be updated for
 	 * each buffer which is not possible. So double-buffering is
 	 * impossible when either the source or destination images are
-	 * a planar format (YUV420, YUV422P, etc.).
+	 * a planar format (YUV420, YUV422P, etc.). Further, differently
+	 * sized tiles or different resizing coefficients per tile
+	 * prevent double-buffering as well.
 	 */
 	ctx->double_buffering = (ctx->num_tiles > 1 &&
 				 !s_image->fmt->planar &&
 				 !d_image->fmt->planar);
+	for (i = 1; i < ctx->num_tiles; i++) {
+		if (ctx->in.tile[i].width != ctx->in.tile[0].width ||
+		    ctx->in.tile[i].height != ctx->in.tile[0].height ||
+		    ctx->out.tile[i].width != ctx->out.tile[0].width ||
+		    ctx->out.tile[i].height != ctx->out.tile[0].height) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_cols; i++) {
+		if (ctx->resize_coeffs_h[i] != ctx->resize_coeffs_h[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
+	for (i = 1; i < ctx->in.num_rows; i++) {
+		if (ctx->resize_coeffs_v[i] != ctx->resize_coeffs_v[0]) {
+			ctx->double_buffering = false;
+			break;
+		}
+	}
 
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		unsigned long intermediate_size = d_image->tile[0].size;
-		unsigned int i;
 
 		for (i = 1; i < ctx->num_tiles; i++) {
 			if (d_image->tile[i].size > intermediate_size)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 16/16] media: imx: add mem2mem device
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (14 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
@ 2018-06-22 15:52 ` Philipp Zabel
  2018-06-22 19:37   ` Nicolas Dufresne
                     ` (3 more replies)
  2018-07-05 21:55 ` [PATCH 00/16] i.MX media mem2mem scaler Steve Longerbeam
  16 siblings, 4 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-06-22 15:52 UTC (permalink / raw)
  To: linux-media; +Cc: kernel, Steve Longerbeam

Add a single imx-media mem2mem video device that uses the IPU IC PP
(image converter post processing) task for scaling and colorspace
conversion.
On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.

The hardware only supports writing to destination buffers up to
1024x1024 pixels in a single pass, so the mem2mem video device is
limited to this resolution. After fixing the tiling code it should
be possible to extend this to arbitrary sizes by rendering multiple
tiles per frame.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 drivers/staging/media/imx/Kconfig             |   1 +
 drivers/staging/media/imx/Makefile            |   1 +
 drivers/staging/media/imx/imx-media-dev.c     |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 953 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 5 files changed, 976 insertions(+)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

diff --git a/drivers/staging/media/imx/Kconfig b/drivers/staging/media/imx/Kconfig
index bfc17de56b17..07013cb3cb66 100644
--- a/drivers/staging/media/imx/Kconfig
+++ b/drivers/staging/media/imx/Kconfig
@@ -6,6 +6,7 @@ config VIDEO_IMX_MEDIA
 	depends on HAS_DMA
 	select VIDEOBUF2_DMA_CONTIG
 	select V4L2_FWNODE
+	select V4L2_MEM2MEM_DEV
 	---help---
 	  Say yes here to enable support for video4linux media controller
 	  driver for the i.MX5/6 SOC.
diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
index 698a4210316e..f2e722d0fa19 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -6,6 +6,7 @@ imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
+obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-mem2mem.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
 
diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
index 289d775c4820..7a9aabcae3ee 100644
--- a/drivers/staging/media/imx/imx-media-dev.c
+++ b/drivers/staging/media/imx/imx-media-dev.c
@@ -359,6 +359,17 @@ static int imx_media_probe_complete(struct v4l2_async_notifier *notifier)
 		goto unlock;
 
 	ret = v4l2_device_register_subdev_nodes(&imxmd->v4l2_dev);
+	if (ret)
+		goto unlock;
+
+	/* TODO: check whether we have IC subdevices first */
+	imxmd->m2m_vdev = imx_media_mem2mem_device_init(imxmd);
+	if (IS_ERR(imxmd->m2m_vdev)) {
+		ret = PTR_ERR(imxmd->m2m_vdev);
+		goto unlock;
+	}
+
+	ret = imx_media_mem2mem_device_register(imxmd->m2m_vdev);
 unlock:
 	mutex_unlock(&imxmd->mutex);
 	if (ret)
diff --git a/drivers/staging/media/imx/imx-media-mem2mem.c b/drivers/staging/media/imx/imx-media-mem2mem.c
new file mode 100644
index 000000000000..8830f77f0407
--- /dev/null
+++ b/drivers/staging/media/imx/imx-media-mem2mem.c
@@ -0,0 +1,953 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * i.MX IPUv3 mem2mem Scaler/CSC driver
+ *
+ * Copyright (C) 2011 Pengutronix, Sascha Hauer
+ * Copyright (C) 2018 Pengutronix, Philipp Zabel
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/version.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <video/imx-ipu-v3.h>
+#include <video/imx-ipu-image-convert.h>
+
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-ioctl.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "imx-media.h"
+
+#define MIN_W 16
+#define MIN_H 16
+#define MAX_W 4096
+#define MAX_H 4096
+
+#define fh_to_ctx(__fh)	container_of(__fh, struct mem2mem_ctx, fh)
+
+enum {
+	V4L2_M2M_SRC = 0,
+	V4L2_M2M_DST = 1,
+};
+
+struct mem2mem_priv {
+	struct imx_media_video_dev vdev;
+
+	struct v4l2_m2m_dev   *m2m_dev;
+	struct device         *dev;
+
+	struct imx_media_dev  *md;
+
+	struct mutex          mutex;       /* mem2mem device mutex */
+
+	atomic_t              num_inst;
+};
+
+#define to_mem2mem_priv(v) container_of(v, struct mem2mem_priv, vdev)
+
+/* Per-queue, driver-specific private data */
+struct mem2mem_q_data {
+	struct v4l2_pix_format	cur_fmt;
+	struct v4l2_rect	rect;
+};
+
+struct mem2mem_ctx {
+	struct mem2mem_priv	*priv;
+
+	struct v4l2_fh		fh;
+	struct mem2mem_q_data	q_data[2];
+	int			error;
+	struct ipu_image_convert_ctx *icc;
+
+	struct v4l2_ctrl_handler ctrl_hdlr;
+	int rotate;
+	bool hflip;
+	bool vflip;
+	enum ipu_rotate_mode	rot_mode;
+};
+
+static struct mem2mem_q_data *get_q_data(struct mem2mem_ctx *ctx,
+					 enum v4l2_buf_type type)
+{
+	if (V4L2_TYPE_IS_OUTPUT(type))
+		return &ctx->q_data[V4L2_M2M_SRC];
+	else
+		return &ctx->q_data[V4L2_M2M_DST];
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+static void job_abort(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+
+	if (ctx->icc)
+		ipu_image_convert_abort(ctx->icc);
+}
+
+static void mem2mem_ic_complete(struct ipu_image_convert_run *run, void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+
+	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+
+	dst_buf->vb2_buf.timestamp = src_buf->vb2_buf.timestamp;
+	dst_buf->timecode = src_buf->timecode;
+
+	v4l2_m2m_buf_done(src_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(dst_buf, run->status ? VB2_BUF_STATE_ERROR :
+						 VB2_BUF_STATE_DONE);
+
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+	kfree(run);
+}
+
+static void device_run(void *_ctx)
+{
+	struct mem2mem_ctx *ctx = _ctx;
+	struct mem2mem_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	struct ipu_image_convert_run *run;
+	int ret;
+
+	src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+
+	run = kzalloc(sizeof(*run), GFP_KERNEL);
+	if (!run)
+		goto err;
+
+	run->ctx = ctx->icc;
+	run->in_phys = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+	run->out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+	ret = ipu_image_convert_queue(run);
+	if (ret < 0) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev,
+			 "%s: failed to queue: %d\n", __func__, ret);
+		goto err;
+	}
+
+	return;
+
+err:
+	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+}
+
+/*
+ * Video ioctls
+ */
+static int vidioc_querycap(struct file *file, void *priv,
+			   struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
+	strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) - 1);
+	strncpy(cap->bus_info, "platform:imx-media-mem2mem",
+		sizeof(cap->bus_info) - 1);
+	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+
+	return 0;
+}
+
+static int mem2mem_enum_fmt(struct file *file, void *fh,
+			    struct v4l2_fmtdesc *f)
+{
+	u32 fourcc;
+	int ret;
+
+	ret = imx_media_enum_format(&fourcc, f->index, CS_SEL_ANY);
+	if (ret)
+		return ret;
+
+	f->pixelformat = fourcc;
+
+	return 0;
+}
+
+static int mem2mem_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	q_data = get_q_data(ctx, f->type);
+
+	f->fmt.pix = q_data->cur_fmt;
+
+	return 0;
+}
+
+static int mem2mem_try_fmt(struct file *file, void *priv,
+			   struct v4l2_format *f)
+{
+	const struct imx_media_pixfmt *cc;
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data = get_q_data(ctx, f->type);
+	unsigned int walign, halign;
+	u32 stride;
+
+	cc = imx_media_find_format(f->fmt.pix.pixelformat, CS_SEL_ANY, false);
+	if (!cc) {
+		f->fmt.pix.pixelformat = V4L2_PIX_FMT_RGB32;
+		cc = imx_media_find_format(V4L2_PIX_FMT_RGB32, CS_SEL_RGB,
+					   false);
+	}
+
+	/*
+	 * Horizontally/vertically chroma subsampled formats must have even
+	 * width/height.
+	 */
+	switch (f->fmt.pix.pixelformat) {
+	case V4L2_PIX_FMT_YUV420:
+	case V4L2_PIX_FMT_YVU420:
+	case V4L2_PIX_FMT_NV12:
+		walign = 1;
+		halign = 1;
+		break;
+	case V4L2_PIX_FMT_YUV422P:
+	case V4L2_PIX_FMT_NV16:
+		walign = 1;
+		halign = 0;
+		break;
+	default:
+		halign = 0;
+		break;
+	}
+	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		/*
+		 * The IC burst reads 8 pixels at a time. Reading beyond the
+		 * end of the line is usually acceptable. Those pixels are
+		 * ignored, unless the IC has to write the scaled line in
+		 * reverse.
+		 */
+		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
+		    ctx->rot_mode && IPU_ROT_BIT_HFLIP)
+			walign = 3;
+	} else {
+		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+			switch (f->fmt.pix.pixelformat) {
+			case V4L2_PIX_FMT_YUV420:
+			case V4L2_PIX_FMT_YVU420:
+			case V4L2_PIX_FMT_YUV422P:
+				/*
+				 * Align to 16x16 pixel blocks for planar 4:2:0
+				 * chroma subsampled formats to guarantee
+				 * 8-byte aligned line start addresses in the
+				 * chroma planes.
+				 */
+				walign = 4;
+				halign = 4;
+				break;
+			default:
+				/*
+				 * Align to 8x8 pixel IRT block size for all
+				 * other formats.
+				 */
+				walign = 3;
+				halign = 3;
+				break;
+			}
+		} else {
+			/*
+			 * The IC burst writes 8 pixels at a time.
+			 *
+			 * TODO: support unaligned width with via
+			 * V4L2_SEL_TGT_COMPOSE_PADDED.
+			 */
+			walign = 3;
+		}
+	}
+	v4l_bound_align_image(&f->fmt.pix.width, MIN_W, MAX_W, walign,
+			      &f->fmt.pix.height, MIN_H, MAX_H, halign, 0);
+
+	stride = cc->planar ? f->fmt.pix.width
+			    : (f->fmt.pix.width * cc->bpp) >> 3;
+	switch (f->fmt.pix.pixelformat) {
+	case V4L2_PIX_FMT_YUV420:
+	case V4L2_PIX_FMT_YVU420:
+	case V4L2_PIX_FMT_YUV422P:
+		stride = round_up(stride, 16);
+		break;
+	default:
+		stride = round_up(stride, 8);
+		break;
+	}
+
+	f->fmt.pix.field = V4L2_FIELD_NONE;
+	f->fmt.pix.bytesperline = stride;
+	f->fmt.pix.sizeimage = cc->planar ?
+			       (stride * f->fmt.pix.height * cc->bpp) >> 3 :
+			       stride * f->fmt.pix.height;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
+		f->fmt.pix.colorspace = q_data->cur_fmt.colorspace;
+		f->fmt.pix.ycbcr_enc = q_data->cur_fmt.ycbcr_enc;
+		f->fmt.pix.xfer_func = q_data->cur_fmt.xfer_func;
+		f->fmt.pix.quantization = q_data->cur_fmt.quantization;
+	} else if (f->fmt.pix.colorspace == V4L2_COLORSPACE_DEFAULT) {
+		f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
+		f->fmt.pix.ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
+		f->fmt.pix.xfer_func = V4L2_XFER_FUNC_DEFAULT;
+		f->fmt.pix.quantization = V4L2_QUANTIZATION_DEFAULT;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct mem2mem_q_data *q_data;
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct vb2_queue *vq;
+	int ret;
+
+	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+	if (vb2_is_busy(vq)) {
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s queue busy\n",
+			 __func__);
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+
+	ret = mem2mem_try_fmt(file, priv, f);
+	if (ret < 0)
+		return ret;
+
+	q_data->cur_fmt.width = f->fmt.pix.width;
+	q_data->cur_fmt.height = f->fmt.pix.height;
+	q_data->cur_fmt.pixelformat = f->fmt.pix.pixelformat;
+	q_data->cur_fmt.field = f->fmt.pix.field;
+	q_data->cur_fmt.bytesperline = f->fmt.pix.bytesperline;
+	q_data->cur_fmt.sizeimage = f->fmt.pix.sizeimage;
+
+	/* Reset cropping/composing rectangle */
+	q_data->rect.left = 0;
+	q_data->rect.top = 0;
+	q_data->rect.width = q_data->cur_fmt.width;
+	q_data->rect.height = q_data->cur_fmt.height;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		/* Set colorimetry on the output queue */
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+		/* Propagate colorimetry to the capture queue */
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
+		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
+		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
+		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
+	}
+
+	/*
+	 * TODO: Setting colorimetry on the capture queue is currently not
+	 * supported by the V4L2 API
+	 */
+
+	return 0;
+}
+
+static int mem2mem_g_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+	case V4L2_SEL_TGT_CROP_DEFAULT:
+	case V4L2_SEL_TGT_CROP_BOUNDS:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+	case V4L2_SEL_TGT_COMPOSE_DEFAULT:
+	case V4L2_SEL_TGT_COMPOSE_BOUNDS:
+	case V4L2_SEL_TGT_COMPOSE_PADDED:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->target == V4L2_SEL_TGT_CROP ||
+	    s->target == V4L2_SEL_TGT_COMPOSE) {
+		s->r = q_data->rect;
+	} else {
+		s->r.left = 0;
+		s->r.top = 0;
+		s->r.width = q_data->cur_fmt.width;
+		s->r.height = q_data->cur_fmt.height;
+	}
+
+	return 0;
+}
+
+static int mem2mem_s_selection(struct file *file, void *priv,
+			       struct v4l2_selection *s)
+{
+	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
+	struct mem2mem_q_data *q_data;
+
+	switch (s->target) {
+	case V4L2_SEL_TGT_CROP:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+			return -EINVAL;
+		break;
+	case V4L2_SEL_TGT_COMPOSE:
+		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE ||
+	    s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, s->type);
+
+	/* The input's frame width to the IC must be a multiple of 8 pixels
+	 * When performing resizing the frame width must be multiple of burst
+	 * size - 8 or 16 pixels as defined by CB#_BURST_16 parameter.
+	 */
+	if (s->flags & V4L2_SEL_FLAG_GE)
+		s->r.width = round_up(s->r.width, 8);
+	if (s->flags & V4L2_SEL_FLAG_LE)
+		s->r.width = round_down(s->r.width, 8);
+	s->r.width = clamp_t(unsigned int, s->r.width, 8,
+			     round_down(q_data->cur_fmt.width, 8));
+	s->r.height = clamp_t(unsigned int, s->r.height, 1,
+			      q_data->cur_fmt.height);
+	s->r.left = clamp_t(unsigned int, s->r.left, 0,
+			    q_data->cur_fmt.width - s->r.width);
+	s->r.top = clamp_t(unsigned int, s->r.top, 0,
+			   q_data->cur_fmt.height - s->r.height);
+
+	/* V4L2_SEL_FLAG_KEEP_CONFIG is only valid for subdevices */
+	q_data->rect = s->r;
+
+	return 0;
+}
+
+static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
+	.vidioc_querycap	= vidioc_querycap,
+
+	.vidioc_enum_fmt_vid_cap = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_cap	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_cap	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_cap	= mem2mem_s_fmt,
+
+	.vidioc_enum_fmt_vid_out = mem2mem_enum_fmt,
+	.vidioc_g_fmt_vid_out	= mem2mem_g_fmt,
+	.vidioc_try_fmt_vid_out	= mem2mem_try_fmt,
+	.vidioc_s_fmt_vid_out	= mem2mem_s_fmt,
+
+	.vidioc_g_selection	= mem2mem_g_selection,
+	.vidioc_s_selection	= mem2mem_s_selection,
+
+	.vidioc_reqbufs		= v4l2_m2m_ioctl_reqbufs,
+	.vidioc_querybuf	= v4l2_m2m_ioctl_querybuf,
+
+	.vidioc_qbuf		= v4l2_m2m_ioctl_qbuf,
+	.vidioc_expbuf		= v4l2_m2m_ioctl_expbuf,
+	.vidioc_dqbuf		= v4l2_m2m_ioctl_dqbuf,
+	.vidioc_create_bufs	= v4l2_m2m_ioctl_create_bufs,
+
+	.vidioc_streamon	= v4l2_m2m_ioctl_streamon,
+	.vidioc_streamoff	= v4l2_m2m_ioctl_streamoff,
+};
+
+/*
+ * Queue operations
+ */
+
+static int mem2mem_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
+			       unsigned int *nplanes, unsigned int sizes[],
+			       struct device *alloc_devs[])
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vq);
+	struct mem2mem_q_data *q_data;
+	unsigned int count = *nbuffers;
+	struct v4l2_pix_format *pix;
+
+	q_data = get_q_data(ctx, vq->type);
+	pix = &q_data->cur_fmt;
+
+	*nplanes = 1;
+	*nbuffers = count;
+	sizes[0] = pix->sizeimage;
+
+	dev_dbg(ctx->priv->dev, "get %d buffer(s) of size %d each.\n",
+		count, pix->sizeimage);
+
+	return 0;
+}
+
+static int mem2mem_buf_prepare(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct mem2mem_q_data *q_data;
+	struct v4l2_pix_format *pix;
+	unsigned int plane_size, payload;
+
+	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	pix = &q_data->cur_fmt;
+	plane_size = pix->sizeimage;
+
+	if (vb2_plane_size(vb, 0) < plane_size) {
+		dev_dbg(ctx->priv->dev,
+			"%s data will not fit into plane (%lu < %lu)\n",
+			__func__, vb2_plane_size(vb, 0), (long)plane_size);
+		return -EINVAL;
+	}
+
+	payload = pix->bytesperline * pix->height;
+	if (pix->pixelformat == V4L2_PIX_FMT_YUV420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_YVU420 ||
+	    pix->pixelformat == V4L2_PIX_FMT_NV12)
+		payload = payload * 3 / 2;
+	else if (pix->pixelformat == V4L2_PIX_FMT_YUV422P ||
+		 pix->pixelformat == V4L2_PIX_FMT_NV16)
+		payload *= 2;
+
+	vb2_set_plane_payload(vb, 0, payload);
+
+	return 0;
+}
+
+static void mem2mem_buf_queue(struct vb2_buffer *vb)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
+}
+
+static void ipu_image_from_q_data(struct ipu_image *im,
+				  struct mem2mem_q_data *q_data)
+{
+	im->pix.width = q_data->cur_fmt.width;
+	im->pix.height = q_data->cur_fmt.height;
+	im->pix.bytesperline = q_data->cur_fmt.bytesperline;
+	im->pix.pixelformat = q_data->cur_fmt.pixelformat;
+	im->rect = q_data->rect;
+}
+
+static int mem2mem_start_streaming(struct vb2_queue *q, unsigned int count)
+{
+	const enum ipu_ic_task ic_task = IC_TASK_POST_PROCESSOR;
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct mem2mem_priv *priv = ctx->priv;
+	struct ipu_soc *ipu = priv->md->ipu[0];
+	struct mem2mem_q_data *q_data;
+	struct vb2_queue *other_q;
+	struct ipu_image in, out;
+
+	other_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+				  (q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
+				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
+				  V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	if (!vb2_is_streaming(other_q))
+		return 0;
+
+	if (ctx->icc) {
+		v4l2_warn(ctx->priv->vdev.vfd->v4l2_dev, "removing old ICC\n");
+		ipu_image_convert_unprepare(ctx->icc);
+	}
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	ipu_image_from_q_data(&in, q_data);
+
+	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	ipu_image_from_q_data(&out, q_data);
+
+	ctx->icc = ipu_image_convert_prepare(ipu, ic_task, &in, &out,
+					     ctx->rot_mode,
+					     mem2mem_ic_complete, ctx);
+	if (IS_ERR(ctx->icc)) {
+		struct vb2_v4l2_buffer *buf;
+		int ret = PTR_ERR(ctx->icc);
+
+		ctx->icc = NULL;
+		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s: error %d\n",
+			 __func__, ret);
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void mem2mem_stop_streaming(struct vb2_queue *q)
+{
+	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
+	struct vb2_v4l2_buffer *buf;
+
+	if (ctx->icc) {
+		ipu_image_convert_unprepare(ctx->icc);
+		ctx->icc = NULL;
+	}
+
+	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	} else {
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
+	}
+}
+
+static const struct vb2_ops mem2mem_qops = {
+	.queue_setup	= mem2mem_queue_setup,
+	.buf_prepare	= mem2mem_buf_prepare,
+	.buf_queue	= mem2mem_buf_queue,
+	.wait_prepare	= vb2_ops_wait_prepare,
+	.wait_finish	= vb2_ops_wait_finish,
+	.start_streaming = mem2mem_start_streaming,
+	.stop_streaming = mem2mem_stop_streaming,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct mem2mem_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
+	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &mem2mem_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	src_vq->lock = &ctx->priv->mutex;
+	src_vq->dev = ctx->priv->dev;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &mem2mem_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	dst_vq->lock = &ctx->priv->mutex;
+	dst_vq->dev = ctx->priv->dev;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static int mem2mem_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct mem2mem_ctx *ctx = container_of(ctrl->handler,
+					       struct mem2mem_ctx, ctrl_hdlr);
+	enum ipu_rotate_mode rot_mode;
+	int rotate;
+	bool hflip, vflip;
+	int ret = 0;
+
+	rotate = ctx->rotate;
+	hflip = ctx->hflip;
+	vflip = ctx->vflip;
+
+	switch (ctrl->id) {
+	case V4L2_CID_HFLIP:
+		hflip = ctrl->val;
+		break;
+	case V4L2_CID_VFLIP:
+		vflip = ctrl->val;
+		break;
+	case V4L2_CID_ROTATE:
+		rotate = ctrl->val;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = ipu_degrees_to_rot_mode(&rot_mode, rotate, hflip, vflip);
+	if (ret)
+		return ret;
+
+	if (rot_mode != ctx->rot_mode) {
+		struct vb2_queue *cap_q;
+
+		cap_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
+					V4L2_BUF_TYPE_VIDEO_CAPTURE);
+		if (vb2_is_streaming(cap_q))
+			return -EBUSY;
+
+		ctx->rot_mode = rot_mode;
+		ctx->rotate = rotate;
+		ctx->hflip = hflip;
+		ctx->vflip = vflip;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops mem2mem_ctrl_ops = {
+	.s_ctrl = mem2mem_s_ctrl,
+};
+
+static int mem2mem_init_controls(struct mem2mem_ctx *ctx)
+{
+	struct v4l2_ctrl_handler *hdlr = &ctx->ctrl_hdlr;
+	int ret;
+
+	v4l2_ctrl_handler_init(hdlr, 3);
+
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_HFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_VFLIP,
+			  0, 1, 1, 0);
+	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_ROTATE,
+			  0, 270, 90, 0);
+
+	if (hdlr->error) {
+		ret = hdlr->error;
+		goto out_free;
+	}
+
+	v4l2_ctrl_handler_setup(hdlr);
+	return 0;
+
+out_free:
+	v4l2_ctrl_handler_free(hdlr);
+	return ret;
+}
+
+#define DEFAULT_WIDTH	720
+#define DEFAULT_HEIGHT	576
+static const struct mem2mem_q_data mem2mem_q_data_default = {
+	.cur_fmt = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+		.pixelformat = V4L2_PIX_FMT_YUV420,
+		.field = V4L2_FIELD_NONE,
+		.bytesperline = DEFAULT_WIDTH,
+		.sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
+		.colorspace = V4L2_COLORSPACE_SRGB,
+	},
+	.rect = {
+		.width = DEFAULT_WIDTH,
+		.height = DEFAULT_HEIGHT,
+	},
+};
+
+/*
+ * File operations
+ */
+static int mem2mem_open(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = NULL;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->rot_mode = IPU_ROTATE_NONE;
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+	v4l2_fh_add(&ctx->fh);
+	ctx->priv = priv;
+
+	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
+					    &queue_init);
+	if (IS_ERR(ctx->fh.m2m_ctx)) {
+		ret = PTR_ERR(ctx->fh.m2m_ctx);
+		goto err_ctx;
+	}
+
+	ret = mem2mem_init_controls(ctx);
+	if (ret)
+		goto err_ctrls;
+
+	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;
+
+	ctx->q_data[V4L2_M2M_SRC] = mem2mem_q_data_default;
+	ctx->q_data[V4L2_M2M_DST] = mem2mem_q_data_default;
+
+	atomic_inc(&priv->num_inst);
+
+	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n", ctx,
+		ctx->fh.m2m_ctx);
+
+	return 0;
+
+err_ctrls:
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+err_ctx:
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+	return ret;
+}
+
+static int mem2mem_release(struct file *file)
+{
+	struct mem2mem_priv *priv = video_drvdata(file);
+	struct mem2mem_ctx *ctx = fh_to_ctx(file->private_data);
+
+	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
+
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+
+	atomic_dec(&priv->num_inst);
+
+	return 0;
+}
+
+static const struct v4l2_file_operations mem2mem_fops = {
+	.owner		= THIS_MODULE,
+	.open		= mem2mem_open,
+	.release	= mem2mem_release,
+	.poll		= v4l2_m2m_fop_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= v4l2_m2m_fop_mmap,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_abort	= job_abort,
+};
+
+static const struct video_device mem2mem_videodev_template = {
+	.name		= "ipu0_ic_pp mem2mem",
+	.fops		= &mem2mem_fops,
+	.ioctl_ops	= &mem2mem_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
+	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
+};
+
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = vdev->vfd;
+	int ret;
+
+	vfd->v4l2_dev = &priv->md->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
+	if (ret) {
+		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
+		return ret;
+	}
+
+	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
+		  video_device_node_name(vfd));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_register);
+
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = priv->vdev.vfd;
+
+	mutex_lock(&priv->mutex);
+
+	if (video_is_registered(vfd)) {
+		video_unregister_device(vfd);
+		media_entity_cleanup(&vfd->entity);
+	}
+
+	mutex_unlock(&priv->mutex);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_unregister);
+
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *md)
+{
+	struct mem2mem_priv *priv;
+	struct video_device *vfd;
+	int ret;
+
+	priv = devm_kzalloc(md->md.dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ERR_PTR(-ENOMEM);
+
+	priv->md = md;
+	priv->dev = md->md.dev;
+
+	mutex_init(&priv->mutex);
+	atomic_set(&priv->num_inst, 0);
+
+	vfd = video_device_alloc();
+	if (!vfd)
+		return ERR_PTR(-ENOMEM);
+
+	*vfd = mem2mem_videodev_template;
+	snprintf(vfd->name, sizeof(vfd->name), "ipu_ic_pp mem2mem");
+	vfd->lock = &priv->mutex;
+	priv->vdev.vfd = vfd;
+
+	INIT_LIST_HEAD(&priv->vdev.list);
+
+	video_set_drvdata(vfd, priv);
+
+	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(priv->m2m_dev)) {
+		ret = PTR_ERR(priv->m2m_dev);
+		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
+			 ret);
+		return ERR_PTR(ret);
+	}
+
+	return &priv->vdev;
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_init);
+
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev)
+{
+	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
+
+	v4l2_m2m_release(priv->m2m_dev);
+}
+EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_remove);
+
+MODULE_DESCRIPTION("i.MX IPUv3 mem2mem scaler/CSC driver");
+MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
index e945e0ed6dd6..dc24ed37f050 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -149,6 +149,9 @@ struct imx_media_dev {
 	/* for async subdev registration */
 	struct list_head asd_list;
 	struct v4l2_async_notifier subdev_notifier;
+
+	/* IC scaler/CSC mem2mem video device */
+	struct imx_media_video_dev *m2m_vdev;
 };
 
 enum codespace_sel {
@@ -262,6 +265,13 @@ void imx_media_capture_device_set_format(struct imx_media_video_dev *vdev,
 					 struct v4l2_pix_format *pix);
 void imx_media_capture_device_error(struct imx_media_video_dev *vdev);
 
+/* imx-media-mem2mem.c */
+struct imx_media_video_dev *
+imx_media_mem2mem_device_init(struct imx_media_dev *dev);
+void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev);
+int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev);
+void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev);
+
 /* subdev group ids */
 #define IMX_MEDIA_GRP_ID_CSI2      BIT(8)
 #define IMX_MEDIA_GRP_ID_CSI_BIT   9
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
@ 2018-06-22 19:37   ` Nicolas Dufresne
  2018-06-22 21:03   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Nicolas Dufresne @ 2018-06-22 19:37 UTC (permalink / raw)
  To: Philipp Zabel, linux-media; +Cc: kernel, Steve Longerbeam

[-- Attachment #1: Type: text/plain, Size: 32933 bytes --]

Le vendredi 22 juin 2018 à 17:52 +0200, Philipp Zabel a écrit :
> Add a single imx-media mem2mem video device that uses the IPU IC PP
> (image converter post processing) task for scaling and colorspace
> conversion.
> On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.
> 
> The hardware only supports writing to destination buffers up to
> 1024x1024 pixels in a single pass, so the mem2mem video device is
> limited to this resolution. After fixing the tiling code it should
> be possible to extend this to arbitrary sizes by rendering multiple
> tiles per frame.
> 
> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>

Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>

> ---
>  drivers/staging/media/imx/Kconfig             |   1 +
>  drivers/staging/media/imx/Makefile            |   1 +
>  drivers/staging/media/imx/imx-media-dev.c     |  11 +
>  drivers/staging/media/imx/imx-media-mem2mem.c | 953
> ++++++++++++++++++
>  drivers/staging/media/imx/imx-media.h         |  10 +
>  5 files changed, 976 insertions(+)
>  create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
> 
> diff --git a/drivers/staging/media/imx/Kconfig
> b/drivers/staging/media/imx/Kconfig
> index bfc17de56b17..07013cb3cb66 100644
> --- a/drivers/staging/media/imx/Kconfig
> +++ b/drivers/staging/media/imx/Kconfig
> @@ -6,6 +6,7 @@ config VIDEO_IMX_MEDIA
>  	depends on HAS_DMA
>  	select VIDEOBUF2_DMA_CONTIG
>  	select V4L2_FWNODE
> +	select V4L2_MEM2MEM_DEV
>  	---help---
>  	  Say yes here to enable support for video4linux media
> controller
>  	  driver for the i.MX5/6 SOC.
> diff --git a/drivers/staging/media/imx/Makefile
> b/drivers/staging/media/imx/Makefile
> index 698a4210316e..f2e722d0fa19 100644
> --- a/drivers/staging/media/imx/Makefile
> +++ b/drivers/staging/media/imx/Makefile
> @@ -6,6 +6,7 @@ imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o
> imx-ic-prpencvf.o
>  obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
>  obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
>  obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
> +obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-mem2mem.o
>  obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
>  obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
>  
> diff --git a/drivers/staging/media/imx/imx-media-dev.c
> b/drivers/staging/media/imx/imx-media-dev.c
> index 289d775c4820..7a9aabcae3ee 100644
> --- a/drivers/staging/media/imx/imx-media-dev.c
> +++ b/drivers/staging/media/imx/imx-media-dev.c
> @@ -359,6 +359,17 @@ static int imx_media_probe_complete(struct
> v4l2_async_notifier *notifier)
>  		goto unlock;
>  
>  	ret = v4l2_device_register_subdev_nodes(&imxmd->v4l2_dev);
> +	if (ret)
> +		goto unlock;
> +
> +	/* TODO: check whether we have IC subdevices first */
> +	imxmd->m2m_vdev = imx_media_mem2mem_device_init(imxmd);
> +	if (IS_ERR(imxmd->m2m_vdev)) {
> +		ret = PTR_ERR(imxmd->m2m_vdev);
> +		goto unlock;
> +	}
> +
> +	ret = imx_media_mem2mem_device_register(imxmd->m2m_vdev);
>  unlock:
>  	mutex_unlock(&imxmd->mutex);
>  	if (ret)
> diff --git a/drivers/staging/media/imx/imx-media-mem2mem.c
> b/drivers/staging/media/imx/imx-media-mem2mem.c
> new file mode 100644
> index 000000000000..8830f77f0407
> --- /dev/null
> +++ b/drivers/staging/media/imx/imx-media-mem2mem.c
> @@ -0,0 +1,953 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * i.MX IPUv3 mem2mem Scaler/CSC driver
> + *
> + * Copyright (C) 2011 Pengutronix, Sascha Hauer
> + * Copyright (C) 2018 Pengutronix, Philipp Zabel
> + *
> + * This program is free software; you can redistribute it and/or
> modify
> + * it under the terms of the GNU General Public License as published
> by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#include <linux/module.h>
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/version.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <video/imx-ipu-v3.h>
> +#include <video/imx-ipu-image-convert.h>
> +
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "imx-media.h"
> +
> +#define MIN_W 16
> +#define MIN_H 16
> +#define MAX_W 4096
> +#define MAX_H 4096
> +
> +#define fh_to_ctx(__fh)	container_of(__fh, struct
> mem2mem_ctx, fh)
> +
> +enum {
> +	V4L2_M2M_SRC = 0,
> +	V4L2_M2M_DST = 1,
> +};
> +
> +struct mem2mem_priv {
> +	struct imx_media_video_dev vdev;
> +
> +	struct v4l2_m2m_dev   *m2m_dev;
> +	struct device         *dev;
> +
> +	struct imx_media_dev  *md;
> +
> +	struct mutex          mutex;       /* mem2mem device mutex
> */
> +
> +	atomic_t              num_inst;
> +};
> +
> +#define to_mem2mem_priv(v) container_of(v, struct mem2mem_priv,
> vdev)
> +
> +/* Per-queue, driver-specific private data */
> +struct mem2mem_q_data {
> +	struct v4l2_pix_format	cur_fmt;
> +	struct v4l2_rect	rect;
> +};
> +
> +struct mem2mem_ctx {
> +	struct mem2mem_priv	*priv;
> +
> +	struct v4l2_fh		fh;
> +	struct mem2mem_q_data	q_data[2];
> +	int			error;
> +	struct ipu_image_convert_ctx *icc;
> +
> +	struct v4l2_ctrl_handler ctrl_hdlr;
> +	int rotate;
> +	bool hflip;
> +	bool vflip;
> +	enum ipu_rotate_mode	rot_mode;
> +};
> +
> +static struct mem2mem_q_data *get_q_data(struct mem2mem_ctx *ctx,
> +					 enum v4l2_buf_type type)
> +{
> +	if (V4L2_TYPE_IS_OUTPUT(type))
> +		return &ctx->q_data[V4L2_M2M_SRC];
> +	else
> +		return &ctx->q_data[V4L2_M2M_DST];
> +}
> +
> +/*
> + * mem2mem callbacks
> + */
> +
> +static void job_abort(void *_ctx)
> +{
> +	struct mem2mem_ctx *ctx = _ctx;
> +
> +	if (ctx->icc)
> +		ipu_image_convert_abort(ctx->icc);
> +}
> +
> +static void mem2mem_ic_complete(struct ipu_image_convert_run *run,
> void *_ctx)
> +{
> +	struct mem2mem_ctx *ctx = _ctx;
> +	struct mem2mem_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +
> +	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +
> +	dst_buf->vb2_buf.timestamp = src_buf->vb2_buf.timestamp;
> +	dst_buf->timecode = src_buf->timecode;
> +
> +	v4l2_m2m_buf_done(src_buf, run->status ? VB2_BUF_STATE_ERROR
> :
> +						 VB2_BUF_STATE_DONE)
> ;
> +	v4l2_m2m_buf_done(dst_buf, run->status ? VB2_BUF_STATE_ERROR
> :
> +						 VB2_BUF_STATE_DONE)
> ;
> +
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +	kfree(run);
> +}
> +
> +static void device_run(void *_ctx)
> +{
> +	struct mem2mem_ctx *ctx = _ctx;
> +	struct mem2mem_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +	struct ipu_image_convert_run *run;
> +	int ret;
> +
> +	src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> +
> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
> +	if (!run)
> +		goto err;
> +
> +	run->ctx = ctx->icc;
> +	run->in_phys = vb2_dma_contig_plane_dma_addr(&src_buf-
> >vb2_buf, 0);
> +	run->out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf-
> >vb2_buf, 0);
> +
> +	ret = ipu_image_convert_queue(run);
> +	if (ret < 0) {
> +		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev,
> +			 "%s: failed to queue: %d\n", __func__,
> ret);
> +		goto err;
> +	}
> +
> +	return;
> +
> +err:
> +	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
> +	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +}
> +
> +/*
> + * Video ioctls
> + */
> +static int vidioc_querycap(struct file *file, void *priv,
> +			   struct v4l2_capability *cap)
> +{
> +	strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap-
> >driver) - 1);
> +	strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) -
> 1);
> +	strncpy(cap->bus_info, "platform:imx-media-mem2mem",
> +		sizeof(cap->bus_info) - 1);
> +	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +	return 0;
> +}
> +
> +static int mem2mem_enum_fmt(struct file *file, void *fh,
> +			    struct v4l2_fmtdesc *f)
> +{
> +	u32 fourcc;
> +	int ret;
> +
> +	ret = imx_media_enum_format(&fourcc, f->index, CS_SEL_ANY);
> +	if (ret)
> +		return ret;
> +
> +	f->pixelformat = fourcc;
> +
> +	return 0;
> +}
> +
> +static int mem2mem_g_fmt(struct file *file, void *priv, struct
> v4l2_format *f)
> +{
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data;
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	f->fmt.pix = q_data->cur_fmt;
> +
> +	return 0;
> +}
> +
> +static int mem2mem_try_fmt(struct file *file, void *priv,
> +			   struct v4l2_format *f)
> +{
> +	const struct imx_media_pixfmt *cc;
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data = get_q_data(ctx, f->type);
> +	unsigned int walign, halign;
> +	u32 stride;
> +
> +	cc = imx_media_find_format(f->fmt.pix.pixelformat,
> CS_SEL_ANY, false);
> +	if (!cc) {
> +		f->fmt.pix.pixelformat = V4L2_PIX_FMT_RGB32;
> +		cc = imx_media_find_format(V4L2_PIX_FMT_RGB32,
> CS_SEL_RGB,
> +					   false);
> +	}
> +
> +	/*
> +	 * Horizontally/vertically chroma subsampled formats must
> have even
> +	 * width/height.
> +	 */
> +	switch (f->fmt.pix.pixelformat) {
> +	case V4L2_PIX_FMT_YUV420:
> +	case V4L2_PIX_FMT_YVU420:
> +	case V4L2_PIX_FMT_NV12:
> +		walign = 1;
> +		halign = 1;
> +		break;
> +	case V4L2_PIX_FMT_YUV422P:
> +	case V4L2_PIX_FMT_NV16:
> +		walign = 1;
> +		halign = 0;
> +		break;
> +	default:
> +		halign = 0;
> +		break;
> +	}
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> +		/*
> +		 * The IC burst reads 8 pixels at a time. Reading
> beyond the
> +		 * end of the line is usually acceptable. Those
> pixels are
> +		 * ignored, unless the IC has to write the scaled
> line in
> +		 * reverse.
> +		 */
> +		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
> +		    ctx->rot_mode && IPU_ROT_BIT_HFLIP)
> +			walign = 3;
> +	} else {
> +		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +			switch (f->fmt.pix.pixelformat) {
> +			case V4L2_PIX_FMT_YUV420:
> +			case V4L2_PIX_FMT_YVU420:
> +			case V4L2_PIX_FMT_YUV422P:
> +				/*
> +				 * Align to 16x16 pixel blocks for
> planar 4:2:0
> +				 * chroma subsampled formats to
> guarantee
> +				 * 8-byte aligned line start
> addresses in the
> +				 * chroma planes.
> +				 */
> +				walign = 4;
> +				halign = 4;
> +				break;
> +			default:
> +				/*
> +				 * Align to 8x8 pixel IRT block size
> for all
> +				 * other formats.
> +				 */
> +				walign = 3;
> +				halign = 3;
> +				break;
> +			}
> +		} else {
> +			/*
> +			 * The IC burst writes 8 pixels at a time.
> +			 *
> +			 * TODO: support unaligned width with via
> +			 * V4L2_SEL_TGT_COMPOSE_PADDED.
> +			 */
> +			walign = 3;
> +		}
> +	}
> +	v4l_bound_align_image(&f->fmt.pix.width, MIN_W, MAX_W,
> walign,
> +			      &f->fmt.pix.height, MIN_H, MAX_H,
> halign, 0);
> +
> +	stride = cc->planar ? f->fmt.pix.width
> +			    : (f->fmt.pix.width * cc->bpp) >> 3;
> +	switch (f->fmt.pix.pixelformat) {
> +	case V4L2_PIX_FMT_YUV420:
> +	case V4L2_PIX_FMT_YVU420:
> +	case V4L2_PIX_FMT_YUV422P:
> +		stride = round_up(stride, 16);
> +		break;
> +	default:
> +		stride = round_up(stride, 8);
> +		break;
> +	}
> +
> +	f->fmt.pix.field = V4L2_FIELD_NONE;
> +	f->fmt.pix.bytesperline = stride;
> +	f->fmt.pix.sizeimage = cc->planar ?
> +			       (stride * f->fmt.pix.height * cc-
> >bpp) >> 3 :
> +			       stride * f->fmt.pix.height;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
> +		f->fmt.pix.colorspace = q_data->cur_fmt.colorspace;
> +		f->fmt.pix.ycbcr_enc = q_data->cur_fmt.ycbcr_enc;
> +		f->fmt.pix.xfer_func = q_data->cur_fmt.xfer_func;
> +		f->fmt.pix.quantization = q_data-
> >cur_fmt.quantization;
> +	} else if (f->fmt.pix.colorspace == V4L2_COLORSPACE_DEFAULT)
> {
> +		f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
> +		f->fmt.pix.ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
> +		f->fmt.pix.xfer_func = V4L2_XFER_FUNC_DEFAULT;
> +		f->fmt.pix.quantization = V4L2_QUANTIZATION_DEFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mem2mem_s_fmt(struct file *file, void *priv, struct
> v4l2_format *f)
> +{
> +	struct mem2mem_q_data *q_data;
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct vb2_queue *vq;
> +	int ret;
> +
> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	if (vb2_is_busy(vq)) {
> +		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s queue
> busy\n",
> +			 __func__);
> +		return -EBUSY;
> +	}
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	ret = mem2mem_try_fmt(file, priv, f);
> +	if (ret < 0)
> +		return ret;
> +
> +	q_data->cur_fmt.width = f->fmt.pix.width;
> +	q_data->cur_fmt.height = f->fmt.pix.height;
> +	q_data->cur_fmt.pixelformat = f->fmt.pix.pixelformat;
> +	q_data->cur_fmt.field = f->fmt.pix.field;
> +	q_data->cur_fmt.bytesperline = f->fmt.pix.bytesperline;
> +	q_data->cur_fmt.sizeimage = f->fmt.pix.sizeimage;
> +
> +	/* Reset cropping/composing rectangle */
> +	q_data->rect.left = 0;
> +	q_data->rect.top = 0;
> +	q_data->rect.width = q_data->cur_fmt.width;
> +	q_data->rect.height = q_data->cur_fmt.height;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> +		/* Set colorimetry on the output queue */
> +		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
> +		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
> +		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
> +		q_data->cur_fmt.quantization = f-
> >fmt.pix.quantization;
> +		/* Propagate colorimetry to the capture queue */
> +		q_data = get_q_data(ctx,
> V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
> +		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
> +		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
> +		q_data->cur_fmt.quantization = f-
> >fmt.pix.quantization;
> +	}
> +
> +	/*
> +	 * TODO: Setting colorimetry on the capture queue is
> currently not
> +	 * supported by the V4L2 API
> +	 */
> +
> +	return 0;
> +}
> +
> +static int mem2mem_g_selection(struct file *file, void *priv,
> +			       struct v4l2_selection *s)
> +{
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data;
> +
> +	switch (s->target) {
> +	case V4L2_SEL_TGT_CROP:
> +	case V4L2_SEL_TGT_CROP_DEFAULT:
> +	case V4L2_SEL_TGT_CROP_BOUNDS:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +			return -EINVAL;
> +		q_data = get_q_data(ctx,
> V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +		break;
> +	case V4L2_SEL_TGT_COMPOSE:
> +	case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> +	case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> +	case V4L2_SEL_TGT_COMPOSE_PADDED:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
> +			return -EINVAL;
> +		q_data = get_q_data(ctx,
> V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (s->target == V4L2_SEL_TGT_CROP ||
> +	    s->target == V4L2_SEL_TGT_COMPOSE) {
> +		s->r = q_data->rect;
> +	} else {
> +		s->r.left = 0;
> +		s->r.top = 0;
> +		s->r.width = q_data->cur_fmt.width;
> +		s->r.height = q_data->cur_fmt.height;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mem2mem_s_selection(struct file *file, void *priv,
> +			       struct v4l2_selection *s)
> +{
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data;
> +
> +	switch (s->target) {
> +	case V4L2_SEL_TGT_CROP:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +			return -EINVAL;
> +		break;
> +	case V4L2_SEL_TGT_COMPOSE:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
> +			return -EINVAL;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE ||
> +	    s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +		return -EINVAL;
> +
> +	q_data = get_q_data(ctx, s->type);
> +
> +	/* The input's frame width to the IC must be a multiple of 8
> pixels
> +	 * When performing resizing the frame width must be multiple
> of burst
> +	 * size - 8 or 16 pixels as defined by CB#_BURST_16
> parameter.
> +	 */
> +	if (s->flags & V4L2_SEL_FLAG_GE)
> +		s->r.width = round_up(s->r.width, 8);
> +	if (s->flags & V4L2_SEL_FLAG_LE)
> +		s->r.width = round_down(s->r.width, 8);
> +	s->r.width = clamp_t(unsigned int, s->r.width, 8,
> +			     round_down(q_data->cur_fmt.width, 8));
> +	s->r.height = clamp_t(unsigned int, s->r.height, 1,
> +			      q_data->cur_fmt.height);
> +	s->r.left = clamp_t(unsigned int, s->r.left, 0,
> +			    q_data->cur_fmt.width - s->r.width);
> +	s->r.top = clamp_t(unsigned int, s->r.top, 0,
> +			   q_data->cur_fmt.height - s->r.height);
> +
> +	/* V4L2_SEL_FLAG_KEEP_CONFIG is only valid for subdevices */
> +	q_data->rect = s->r;
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
> +	.vidioc_querycap	= vidioc_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap = mem2mem_enum_fmt,
> +	.vidioc_g_fmt_vid_cap	= mem2mem_g_fmt,
> +	.vidioc_try_fmt_vid_cap	= mem2mem_try_fmt,
> +	.vidioc_s_fmt_vid_cap	= mem2mem_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out = mem2mem_enum_fmt,
> +	.vidioc_g_fmt_vid_out	= mem2mem_g_fmt,
> +	.vidioc_try_fmt_vid_out	= mem2mem_try_fmt,
> +	.vidioc_s_fmt_vid_out	= mem2mem_s_fmt,
> +
> +	.vidioc_g_selection	= mem2mem_g_selection,
> +	.vidioc_s_selection	= mem2mem_s_selection,
> +
> +	.vidioc_reqbufs		= v4l2_m2m_ioctl_reqbufs,
> +	.vidioc_querybuf	= v4l2_m2m_ioctl_querybuf,
> +
> +	.vidioc_qbuf		= v4l2_m2m_ioctl_qbuf,
> +	.vidioc_expbuf		= v4l2_m2m_ioctl_expbuf,
> +	.vidioc_dqbuf		= v4l2_m2m_ioctl_dqbuf,
> +	.vidioc_create_bufs	= v4l2_m2m_ioctl_create_bufs,
> +
> +	.vidioc_streamon	= v4l2_m2m_ioctl_streamon,
> +	.vidioc_streamoff	= v4l2_m2m_ioctl_streamoff,
> +};
> +
> +/*
> + * Queue operations
> + */
> +
> +static int mem2mem_queue_setup(struct vb2_queue *vq, unsigned int
> *nbuffers,
> +			       unsigned int *nplanes, unsigned int
> sizes[],
> +			       struct device *alloc_devs[])
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct mem2mem_q_data *q_data;
> +	unsigned int count = *nbuffers;
> +	struct v4l2_pix_format *pix;
> +
> +	q_data = get_q_data(ctx, vq->type);
> +	pix = &q_data->cur_fmt;
> +
> +	*nplanes = 1;
> +	*nbuffers = count;
> +	sizes[0] = pix->sizeimage;
> +
> +	dev_dbg(ctx->priv->dev, "get %d buffer(s) of size %d
> each.\n",
> +		count, pix->sizeimage);
> +
> +	return 0;
> +}
> +
> +static int mem2mem_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct mem2mem_q_data *q_data;
> +	struct v4l2_pix_format *pix;
> +	unsigned int plane_size, payload;
> +
> +	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	q_data = get_q_data(ctx, vb->vb2_queue->type);
> +	pix = &q_data->cur_fmt;
> +	plane_size = pix->sizeimage;
> +
> +	if (vb2_plane_size(vb, 0) < plane_size) {
> +		dev_dbg(ctx->priv->dev,
> +			"%s data will not fit into plane (%lu <
> %lu)\n",
> +			__func__, vb2_plane_size(vb, 0),
> (long)plane_size);
> +		return -EINVAL;
> +	}
> +
> +	payload = pix->bytesperline * pix->height;
> +	if (pix->pixelformat == V4L2_PIX_FMT_YUV420 ||
> +	    pix->pixelformat == V4L2_PIX_FMT_YVU420 ||
> +	    pix->pixelformat == V4L2_PIX_FMT_NV12)
> +		payload = payload * 3 / 2;
> +	else if (pix->pixelformat == V4L2_PIX_FMT_YUV422P ||
> +		 pix->pixelformat == V4L2_PIX_FMT_NV16)
> +		payload *= 2;
> +
> +	vb2_set_plane_payload(vb, 0, payload);
> +
> +	return 0;
> +}
> +
> +static void mem2mem_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +
> +	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
> +}
> +
> +static void ipu_image_from_q_data(struct ipu_image *im,
> +				  struct mem2mem_q_data *q_data)
> +{
> +	im->pix.width = q_data->cur_fmt.width;
> +	im->pix.height = q_data->cur_fmt.height;
> +	im->pix.bytesperline = q_data->cur_fmt.bytesperline;
> +	im->pix.pixelformat = q_data->cur_fmt.pixelformat;
> +	im->rect = q_data->rect;
> +}
> +
> +static int mem2mem_start_streaming(struct vb2_queue *q, unsigned int
> count)
> +{
> +	const enum ipu_ic_task ic_task = IC_TASK_POST_PROCESSOR;
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
> +	struct mem2mem_priv *priv = ctx->priv;
> +	struct ipu_soc *ipu = priv->md->ipu[0];
> +	struct mem2mem_q_data *q_data;
> +	struct vb2_queue *other_q;
> +	struct ipu_image in, out;
> +
> +	other_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
> +				  (q->type ==
> V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
> +				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
> +				  V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	if (!vb2_is_streaming(other_q))
> +		return 0;
> +
> +	if (ctx->icc) {
> +		v4l2_warn(ctx->priv->vdev.vfd->v4l2_dev, "removing
> old ICC\n");
> +		ipu_image_convert_unprepare(ctx->icc);
> +	}
> +
> +	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	ipu_image_from_q_data(&in, q_data);
> +
> +	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	ipu_image_from_q_data(&out, q_data);
> +
> +	ctx->icc = ipu_image_convert_prepare(ipu, ic_task, &in,
> &out,
> +					     ctx->rot_mode,
> +					     mem2mem_ic_complete,
> ctx);
> +	if (IS_ERR(ctx->icc)) {
> +		struct vb2_v4l2_buffer *buf;
> +		int ret = PTR_ERR(ctx->icc);
> +
> +		ctx->icc = NULL;
> +		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s: error
> %d\n",
> +			 __func__, ret);
> +		while ((buf = v4l2_m2m_src_buf_remove(ctx-
> >fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf,
> VB2_BUF_STATE_QUEUED);
> +		while ((buf = v4l2_m2m_dst_buf_remove(ctx-
> >fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf,
> VB2_BUF_STATE_QUEUED);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void mem2mem_stop_streaming(struct vb2_queue *q)
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
> +	struct vb2_v4l2_buffer *buf;
> +
> +	if (ctx->icc) {
> +		ipu_image_convert_unprepare(ctx->icc);
> +		ctx->icc = NULL;
> +	}
> +
> +	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> +		while ((buf = v4l2_m2m_src_buf_remove(ctx-
> >fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
> +	} else {
> +		while ((buf = v4l2_m2m_dst_buf_remove(ctx-
> >fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
> +	}
> +}
> +
> +static const struct vb2_ops mem2mem_qops = {
> +	.queue_setup	= mem2mem_queue_setup,
> +	.buf_prepare	= mem2mem_buf_prepare,
> +	.buf_queue	= mem2mem_buf_queue,
> +	.wait_prepare	= vb2_ops_wait_prepare,
> +	.wait_finish	= vb2_ops_wait_finish,
> +	.start_streaming = mem2mem_start_streaming,
> +	.stop_streaming = mem2mem_stop_streaming,
> +};
> +
> +static int queue_init(void *priv, struct vb2_queue *src_vq,
> +		      struct vb2_queue *dst_vq)
> +{
> +	struct mem2mem_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
> +	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &mem2mem_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	src_vq->lock = &ctx->priv->mutex;
> +	src_vq->dev = ctx->priv->dev;
> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &mem2mem_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	dst_vq->lock = &ctx->priv->mutex;
> +	dst_vq->dev = ctx->priv->dev;
> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +static int mem2mem_s_ctrl(struct v4l2_ctrl *ctrl)
> +{
> +	struct mem2mem_ctx *ctx = container_of(ctrl->handler,
> +					       struct mem2mem_ctx,
> ctrl_hdlr);
> +	enum ipu_rotate_mode rot_mode;
> +	int rotate;
> +	bool hflip, vflip;
> +	int ret = 0;
> +
> +	rotate = ctx->rotate;
> +	hflip = ctx->hflip;
> +	vflip = ctx->vflip;
> +
> +	switch (ctrl->id) {
> +	case V4L2_CID_HFLIP:
> +		hflip = ctrl->val;
> +		break;
> +	case V4L2_CID_VFLIP:
> +		vflip = ctrl->val;
> +		break;
> +	case V4L2_CID_ROTATE:
> +		rotate = ctrl->val;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	ret = ipu_degrees_to_rot_mode(&rot_mode, rotate, hflip,
> vflip);
> +	if (ret)
> +		return ret;
> +
> +	if (rot_mode != ctx->rot_mode) {
> +		struct vb2_queue *cap_q;
> +
> +		cap_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
> +					V4L2_BUF_TYPE_VIDEO_CAPTURE)
> ;
> +		if (vb2_is_streaming(cap_q))
> +			return -EBUSY;
> +
> +		ctx->rot_mode = rot_mode;
> +		ctx->rotate = rotate;
> +		ctx->hflip = hflip;
> +		ctx->vflip = vflip;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ctrl_ops mem2mem_ctrl_ops = {
> +	.s_ctrl = mem2mem_s_ctrl,
> +};
> +
> +static int mem2mem_init_controls(struct mem2mem_ctx *ctx)
> +{
> +	struct v4l2_ctrl_handler *hdlr = &ctx->ctrl_hdlr;
> +	int ret;
> +
> +	v4l2_ctrl_handler_init(hdlr, 3);
> +
> +	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_HFLIP,
> +			  0, 1, 1, 0);
> +	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_VFLIP,
> +			  0, 1, 1, 0);
> +	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_ROTATE,
> +			  0, 270, 90, 0);
> +
> +	if (hdlr->error) {
> +		ret = hdlr->error;
> +		goto out_free;
> +	}
> +
> +	v4l2_ctrl_handler_setup(hdlr);
> +	return 0;
> +
> +out_free:
> +	v4l2_ctrl_handler_free(hdlr);
> +	return ret;
> +}
> +
> +#define DEFAULT_WIDTH	720
> +#define DEFAULT_HEIGHT	576
> +static const struct mem2mem_q_data mem2mem_q_data_default = {
> +	.cur_fmt = {
> +		.width = DEFAULT_WIDTH,
> +		.height = DEFAULT_HEIGHT,
> +		.pixelformat = V4L2_PIX_FMT_YUV420,
> +		.field = V4L2_FIELD_NONE,
> +		.bytesperline = DEFAULT_WIDTH,
> +		.sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
> +		.colorspace = V4L2_COLORSPACE_SRGB,
> +	},
> +	.rect = {
> +		.width = DEFAULT_WIDTH,
> +		.height = DEFAULT_HEIGHT,
> +	},
> +};
> +
> +/*
> + * File operations
> + */
> +static int mem2mem_open(struct file *file)
> +{
> +	struct mem2mem_priv *priv = video_drvdata(file);
> +	struct mem2mem_ctx *ctx = NULL;
> +	int ret;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	ctx->rot_mode = IPU_ROTATE_NONE;
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +	v4l2_fh_add(&ctx->fh);
> +	ctx->priv = priv;
> +
> +	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
> +					    &queue_init);
> +	if (IS_ERR(ctx->fh.m2m_ctx)) {
> +		ret = PTR_ERR(ctx->fh.m2m_ctx);
> +		goto err_ctx;
> +	}
> +
> +	ret = mem2mem_init_controls(ctx);
> +	if (ret)
> +		goto err_ctrls;
> +
> +	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;
> +
> +	ctx->q_data[V4L2_M2M_SRC] = mem2mem_q_data_default;
> +	ctx->q_data[V4L2_M2M_DST] = mem2mem_q_data_default;
> +
> +	atomic_inc(&priv->num_inst);
> +
> +	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n",
> ctx,
> +		ctx->fh.m2m_ctx);
> +
> +	return 0;
> +
> +err_ctrls:
> +	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> +err_ctx:
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int mem2mem_release(struct file *file)
> +{
> +	struct mem2mem_priv *priv = video_drvdata(file);
> +	struct mem2mem_ctx *ctx = fh_to_ctx(file->private_data);
> +
> +	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
> +
> +	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +
> +	atomic_dec(&priv->num_inst);
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_file_operations mem2mem_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= mem2mem_open,
> +	.release	= mem2mem_release,
> +	.poll		= v4l2_m2m_fop_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= v4l2_m2m_fop_mmap,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= device_run,
> +	.job_abort	= job_abort,
> +};
> +
> +static const struct video_device mem2mem_videodev_template = {
> +	.name		= "ipu0_ic_pp mem2mem",
> +	.fops		= &mem2mem_fops,
> +	.ioctl_ops	= &mem2mem_ioctl_ops,
> +	.minor		= -1,
> +	.release	= video_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL |
> V4L2_STD_SECAM,
> +	.device_caps	= V4L2_CAP_VIDEO_M2M |
> V4L2_CAP_STREAMING,
> +};
> +
> +int imx_media_mem2mem_device_register(struct imx_media_video_dev
> *vdev)
> +{
> +	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = vdev->vfd;
> +	int ret;
> +
> +	vfd->v4l2_dev = &priv->md->v4l2_dev;
> +
> +	ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
> +	if (ret) {
> +		v4l2_err(vfd->v4l2_dev, "Failed to register video
> device\n");
> +		return ret;
> +	}
> +
> +	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd-
> >name,
> +		  video_device_node_name(vfd));
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_register);
> +
> +void imx_media_mem2mem_device_unregister(struct imx_media_video_dev
> *vdev)
> +{
> +	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = priv->vdev.vfd;
> +
> +	mutex_lock(&priv->mutex);
> +
> +	if (video_is_registered(vfd)) {
> +		video_unregister_device(vfd);
> +		media_entity_cleanup(&vfd->entity);
> +	}
> +
> +	mutex_unlock(&priv->mutex);
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_unregister);
> +
> +struct imx_media_video_dev *
> +imx_media_mem2mem_device_init(struct imx_media_dev *md)
> +{
> +	struct mem2mem_priv *priv;
> +	struct video_device *vfd;
> +	int ret;
> +
> +	priv = devm_kzalloc(md->md.dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return ERR_PTR(-ENOMEM);
> +
> +	priv->md = md;
> +	priv->dev = md->md.dev;
> +
> +	mutex_init(&priv->mutex);
> +	atomic_set(&priv->num_inst, 0);
> +
> +	vfd = video_device_alloc();
> +	if (!vfd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	*vfd = mem2mem_videodev_template;
> +	snprintf(vfd->name, sizeof(vfd->name), "ipu_ic_pp mem2mem");
> +	vfd->lock = &priv->mutex;
> +	priv->vdev.vfd = vfd;
> +
> +	INIT_LIST_HEAD(&priv->vdev.list);
> +
> +	video_set_drvdata(vfd, priv);
> +
> +	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(priv->m2m_dev)) {
> +		ret = PTR_ERR(priv->m2m_dev);
> +		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem
> device: %d\n",
> +			 ret);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return &priv->vdev;
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_init);
> +
> +void imx_media_mem2mem_device_remove(struct imx_media_video_dev
> *vdev)
> +{
> +	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
> +
> +	v4l2_m2m_release(priv->m2m_dev);
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_remove);
> +
> +MODULE_DESCRIPTION("i.MX IPUv3 mem2mem scaler/CSC driver");
> +MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/staging/media/imx/imx-media.h
> b/drivers/staging/media/imx/imx-media.h
> index e945e0ed6dd6..dc24ed37f050 100644
> --- a/drivers/staging/media/imx/imx-media.h
> +++ b/drivers/staging/media/imx/imx-media.h
> @@ -149,6 +149,9 @@ struct imx_media_dev {
>  	/* for async subdev registration */
>  	struct list_head asd_list;
>  	struct v4l2_async_notifier subdev_notifier;
> +
> +	/* IC scaler/CSC mem2mem video device */
> +	struct imx_media_video_dev *m2m_vdev;
>  };
>  
>  enum codespace_sel {
> @@ -262,6 +265,13 @@ void imx_media_capture_device_set_format(struct
> imx_media_video_dev *vdev,
>  					 struct v4l2_pix_format
> *pix);
>  void imx_media_capture_device_error(struct imx_media_video_dev
> *vdev);
>  
> +/* imx-media-mem2mem.c */
> +struct imx_media_video_dev *
> +imx_media_mem2mem_device_init(struct imx_media_dev *dev);
> +void imx_media_mem2mem_device_remove(struct imx_media_video_dev
> *vdev);
> +int imx_media_mem2mem_device_register(struct imx_media_video_dev
> *vdev);
> +void imx_media_mem2mem_device_unregister(struct imx_media_video_dev
> *vdev);
> +
>  /* subdev group ids */
>  #define IMX_MEDIA_GRP_ID_CSI2      BIT(8)
>  #define IMX_MEDIA_GRP_ID_CSI_BIT   9

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
  2018-06-22 19:37   ` Nicolas Dufresne
@ 2018-06-22 21:03   ` kbuild test robot
  2018-07-05 22:09   ` Steve Longerbeam
  2018-07-10 12:07   ` Pavel Machek
  3 siblings, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2018-06-22 21:03 UTC (permalink / raw)
  To: Philipp Zabel; +Cc: kbuild-all, linux-media, kernel, Steve Longerbeam

[-- Attachment #1: Type: text/plain, Size: 1953 bytes --]

Hi Philipp,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.18-rc1 next-20180622]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Philipp-Zabel/i-MX-media-mem2mem-scaler/20180623-024533
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All warnings (new ones prefixed by >>):

   drivers/staging/media/imx/imx-media-mem2mem.c: In function 'vidioc_querycap':
>> drivers/staging/media/imx/imx-media-mem2mem.c:160:2: warning: 'strncpy' output truncated copying 15 bytes from a string of length 17 [-Wstringop-truncation]
     strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/strncpy +160 drivers/staging/media/imx/imx-media-mem2mem.c

   153	
   154	/*
   155	 * Video ioctls
   156	 */
   157	static int vidioc_querycap(struct file *file, void *priv,
   158				   struct v4l2_capability *cap)
   159	{
 > 160		strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
   161		strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) - 1);
   162		strncpy(cap->bus_info, "platform:imx-media-mem2mem",
   163			sizeof(cap->bus_info) - 1);
   164		cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
   165		cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
   166	
   167		return 0;
   168	}
   169	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 50914 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/16] i.MX media mem2mem scaler
  2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
                   ` (15 preceding siblings ...)
  2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
@ 2018-07-05 21:55 ` Steve Longerbeam
  2018-07-16 14:10   ` Philipp Zabel
  16 siblings, 1 reply; 28+ messages in thread
From: Steve Longerbeam @ 2018-07-05 21:55 UTC (permalink / raw)
  To: Philipp Zabel, linux-media; +Cc: kernel, Steve Longerbeam

Hi Philipp,

Thanks for this great patchset! Finally we have improved seams
with tiled conversions, and relaxed width alignment requirements.

Unfortunately this patchset isn't working correctly yet. It breaks tiled
conversions with rotation.

Trying the following conversion:

input: 720x480, UYVY
output: 1280x768, UYVY, rotation=90 degrees

causes non-8-byte aligned tile buffers at the output:

[  129.578210] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: Input@[0,0]: 
phys 00000000
[  129.585980] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: Input@[1,0]: 
phys 00051360
[  129.593736] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: 
Output@[0,0]: phys 00000000
[  129.601556] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: 
Output@[0,1]: phys 0000052e

resulting in hung conversion and abort timeout:

[  147.689220] imx-ipuv3 2400000.ipu: ipu_image_convert_abort: timeout

Note that when converting to a planar format, the final (rotated) chroma 
tile
buffers are also mis-aligned, in addition to the Y buffers.

I have some additional comments to follow in the patches.

Steve

On 06/22/2018 08:52 AM, Philipp Zabel wrote:
> Hi,
>
> we have image conversion code for scaling and colorspace conversion in
> the IPUv3 base driver for a while. Since the IC hardware can only write
> up to 1024x1024 pixel buffers, it scales to larger output buffers by
> splitting the input and output frame into similarly sized tiles.
>
> This causes the issue that the bilinear interpolation resets at the tile
> boundary: instead of smoothly interpolating across the seam, there is a
> jump in the input sample position that is very apparent for high
> upscaling factors. This can be avoided by slightly changing the scaling
> coefficients to let the left/top tiles overshoot their input sampling
> into the first pixel / line of their right / bottom neighbors. The error
> can be further reduced by letting tiles be differently sized and by
> selecting seam positions that minimize the input sampling position error
> at tile boundaries.
> This is complicated by different DMA start address, burst size, and
> rotator block size alignment requirements, depending on the input and
> output pixel formats, and the fact that flipping happens in different
> places depending on the rotation.
>
> This series implements optimal seam position selection and seam hiding
> with per-tile resizing coefficients and adds a scaling mem2mem device
> to the imx-media driver.
>
> regards
> Philipp
>
> Philipp Zabel (16):
>    gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
>    gpu: ipu-v3: image-convert: prepare for per-tile configuration
>    gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
>    gpu: ipu-v3: image-convert: reconfigure IC per tile
>    gpu: ipu-v3: image-convert: store tile top/left position
>    gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
>      outside fill_image
>    gpu: ipu-v3: image-convert: move tile alignment helpers
>    gpu: ipu-v3: image-convert: select optimal seam positions
>    gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
>    gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and
>      NV16
>    gpu: ipu-v3: image-convert: relax input alignment restrictions
>    gpu: ipu-v3: image-convert: relax output alignment restrictions
>    gpu: ipu-v3: image-convert: fix bytesperline adjustment
>    gpu: ipu-v3: image-convert: add some ASCII art to the exposition
>    gpu: ipu-v3: image-convert: disable double buffering if necessary
>    media: imx: add mem2mem device
>
>   drivers/gpu/ipu-v3/ipu-ic.c                   |  52 +-
>   drivers/gpu/ipu-v3/ipu-image-convert.c        | 865 +++++++++++++---
>   drivers/staging/media/imx/Kconfig             |   1 +
>   drivers/staging/media/imx/Makefile            |   1 +
>   drivers/staging/media/imx/imx-media-dev.c     |  11 +
>   drivers/staging/media/imx/imx-media-mem2mem.c | 953 ++++++++++++++++++
>   drivers/staging/media/imx/imx-media.h         |  10 +
>   include/video/imx-ipu-v3.h                    |   6 +
>   8 files changed, 1760 insertions(+), 139 deletions(-)
>   create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
  2018-06-22 19:37   ` Nicolas Dufresne
  2018-06-22 21:03   ` kbuild test robot
@ 2018-07-05 22:09   ` Steve Longerbeam
  2018-07-16 14:12     ` Philipp Zabel
  2018-07-10 12:07   ` Pavel Machek
  3 siblings, 1 reply; 28+ messages in thread
From: Steve Longerbeam @ 2018-07-05 22:09 UTC (permalink / raw)
  To: Philipp Zabel, linux-media; +Cc: kernel, Steve Longerbeam

Hi Philipp,


On 06/22/2018 08:52 AM, Philipp Zabel wrote:
> Add a single imx-media mem2mem video device that uses the IPU IC PP
> (image converter post processing) task for scaling and colorspace
> conversion.
> On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.
>
> The hardware only supports writing to destination buffers up to
> 1024x1024 pixels in a single pass, so the mem2mem video device is
> limited to this resolution. After fixing the tiling code it should
> be possible to extend this to arbitrary sizes by rendering multiple
> tiles per frame.
>
> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
> ---
>   drivers/staging/media/imx/Kconfig             |   1 +
>   drivers/staging/media/imx/Makefile            |   1 +
>   drivers/staging/media/imx/imx-media-dev.c     |  11 +
>   drivers/staging/media/imx/imx-media-mem2mem.c | 953 ++++++++++++++++++
>   drivers/staging/media/imx/imx-media.h         |  10 +
>   5 files changed, 976 insertions(+)
>   create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c
>
> diff --git a/drivers/staging/media/imx/Kconfig b/drivers/staging/media/imx/Kconfig
> index bfc17de56b17..07013cb3cb66 100644
> --- a/drivers/staging/media/imx/Kconfig
> +++ b/drivers/staging/media/imx/Kconfig
> @@ -6,6 +6,7 @@ config VIDEO_IMX_MEDIA
>   	depends on HAS_DMA
>   	select VIDEOBUF2_DMA_CONTIG
>   	select V4L2_FWNODE
> +	select V4L2_MEM2MEM_DEV
>   	---help---
>   	  Say yes here to enable support for video4linux media controller
>   	  driver for the i.MX5/6 SOC.
> diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
> index 698a4210316e..f2e722d0fa19 100644
> --- a/drivers/staging/media/imx/Makefile
> +++ b/drivers/staging/media/imx/Makefile
> @@ -6,6 +6,7 @@ imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o
>   obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
>   obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
>   obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
> +obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-mem2mem.o
>   obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
>   obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
>   
> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
> index 289d775c4820..7a9aabcae3ee 100644
> --- a/drivers/staging/media/imx/imx-media-dev.c
> +++ b/drivers/staging/media/imx/imx-media-dev.c
> @@ -359,6 +359,17 @@ static int imx_media_probe_complete(struct v4l2_async_notifier *notifier)
>   		goto unlock;
>   
>   	ret = v4l2_device_register_subdev_nodes(&imxmd->v4l2_dev);
> +	if (ret)
> +		goto unlock;
> +
> +	/* TODO: check whether we have IC subdevices first */
> +	imxmd->m2m_vdev = imx_media_mem2mem_device_init(imxmd);
> +	if (IS_ERR(imxmd->m2m_vdev)) {
> +		ret = PTR_ERR(imxmd->m2m_vdev);
> +		goto unlock;
> +	}
> +
> +	ret = imx_media_mem2mem_device_register(imxmd->m2m_vdev);
>   unlock:
>   	mutex_unlock(&imxmd->mutex);
>   	if (ret)
> diff --git a/drivers/staging/media/imx/imx-media-mem2mem.c b/drivers/staging/media/imx/imx-media-mem2mem.c
> new file mode 100644
> index 000000000000..8830f77f0407
> --- /dev/null
> +++ b/drivers/staging/media/imx/imx-media-mem2mem.c
> @@ -0,0 +1,953 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * i.MX IPUv3 mem2mem Scaler/CSC driver
> + *
> + * Copyright (C) 2011 Pengutronix, Sascha Hauer
> + * Copyright (C) 2018 Pengutronix, Philipp Zabel
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#include <linux/module.h>
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/version.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <video/imx-ipu-v3.h>
> +#include <video/imx-ipu-image-convert.h>
> +
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "imx-media.h"
> +
> +#define MIN_W 16
> +#define MIN_H 16
> +#define MAX_W 4096
> +#define MAX_H 4096
> +
> +#define fh_to_ctx(__fh)	container_of(__fh, struct mem2mem_ctx, fh)
> +
> +enum {
> +	V4L2_M2M_SRC = 0,
> +	V4L2_M2M_DST = 1,
> +};
> +
> +struct mem2mem_priv {
> +	struct imx_media_video_dev vdev;
> +
> +	struct v4l2_m2m_dev   *m2m_dev;
> +	struct device         *dev;
> +
> +	struct imx_media_dev  *md;
> +
> +	struct mutex          mutex;       /* mem2mem device mutex */
> +
> +	atomic_t              num_inst;
> +};
> +
> +#define to_mem2mem_priv(v) container_of(v, struct mem2mem_priv, vdev)
> +
> +/* Per-queue, driver-specific private data */
> +struct mem2mem_q_data {
> +	struct v4l2_pix_format	cur_fmt;
> +	struct v4l2_rect	rect;
> +};
> +
> +struct mem2mem_ctx {
> +	struct mem2mem_priv	*priv;
> +
> +	struct v4l2_fh		fh;
> +	struct mem2mem_q_data	q_data[2];
> +	int			error;
> +	struct ipu_image_convert_ctx *icc;
> +
> +	struct v4l2_ctrl_handler ctrl_hdlr;
> +	int rotate;
> +	bool hflip;
> +	bool vflip;
> +	enum ipu_rotate_mode	rot_mode;
> +};
> +
> +static struct mem2mem_q_data *get_q_data(struct mem2mem_ctx *ctx,
> +					 enum v4l2_buf_type type)
> +{
> +	if (V4L2_TYPE_IS_OUTPUT(type))
> +		return &ctx->q_data[V4L2_M2M_SRC];
> +	else
> +		return &ctx->q_data[V4L2_M2M_DST];
> +}
> +
> +/*
> + * mem2mem callbacks
> + */
> +
> +static void job_abort(void *_ctx)
> +{
> +	struct mem2mem_ctx *ctx = _ctx;
> +
> +	if (ctx->icc)
> +		ipu_image_convert_abort(ctx->icc);
> +}
> +
> +static void mem2mem_ic_complete(struct ipu_image_convert_run *run, void *_ctx)
> +{
> +	struct mem2mem_ctx *ctx = _ctx;
> +	struct mem2mem_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +
> +	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +
> +	dst_buf->vb2_buf.timestamp = src_buf->vb2_buf.timestamp;
> +	dst_buf->timecode = src_buf->timecode;
> +
> +	v4l2_m2m_buf_done(src_buf, run->status ? VB2_BUF_STATE_ERROR :
> +						 VB2_BUF_STATE_DONE);
> +	v4l2_m2m_buf_done(dst_buf, run->status ? VB2_BUF_STATE_ERROR :
> +						 VB2_BUF_STATE_DONE);
> +
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +	kfree(run);
> +}
> +
> +static void device_run(void *_ctx)
> +{
> +	struct mem2mem_ctx *ctx = _ctx;
> +	struct mem2mem_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +	struct ipu_image_convert_run *run;
> +	int ret;
> +
> +	src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> +
> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
> +	if (!run)
> +		goto err;
> +
> +	run->ctx = ctx->icc;
> +	run->in_phys = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
> +	run->out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +
> +	ret = ipu_image_convert_queue(run);
> +	if (ret < 0) {
> +		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev,
> +			 "%s: failed to queue: %d\n", __func__, ret);
> +		goto err;
> +	}
> +
> +	return;
> +
> +err:
> +	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
> +	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +}
> +
> +/*
> + * Video ioctls
> + */
> +static int vidioc_querycap(struct file *file, void *priv,
> +			   struct v4l2_capability *cap)
> +{
> +	strncpy(cap->driver, "imx-media-mem2mem", sizeof(cap->driver) - 1);
> +	strncpy(cap->card, "imx-media-mem2mem", sizeof(cap->card) - 1);
> +	strncpy(cap->bus_info, "platform:imx-media-mem2mem",
> +		sizeof(cap->bus_info) - 1);
> +	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +	return 0;
> +}
> +
> +static int mem2mem_enum_fmt(struct file *file, void *fh,
> +			    struct v4l2_fmtdesc *f)
> +{
> +	u32 fourcc;
> +	int ret;
> +
> +	ret = imx_media_enum_format(&fourcc, f->index, CS_SEL_ANY);
> +	if (ret)
> +		return ret;
> +
> +	f->pixelformat = fourcc;
> +
> +	return 0;
> +}
> +
> +static int mem2mem_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data;
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	f->fmt.pix = q_data->cur_fmt;
> +
> +	return 0;
> +}
> +
> +static int mem2mem_try_fmt(struct file *file, void *priv,
> +			   struct v4l2_format *f)
> +{
> +	const struct imx_media_pixfmt *cc;
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data = get_q_data(ctx, f->type);
> +	unsigned int walign, halign;
> +	u32 stride;
> +
> +	cc = imx_media_find_format(f->fmt.pix.pixelformat, CS_SEL_ANY, false);
> +	if (!cc) {
> +		f->fmt.pix.pixelformat = V4L2_PIX_FMT_RGB32;
> +		cc = imx_media_find_format(V4L2_PIX_FMT_RGB32, CS_SEL_RGB,
> +					   false);
> +	}
> +
> +	/*
> +	 * Horizontally/vertically chroma subsampled formats must have even
> +	 * width/height.
> +	 */
> +	switch (f->fmt.pix.pixelformat) {
> +	case V4L2_PIX_FMT_YUV420:
> +	case V4L2_PIX_FMT_YVU420:
> +	case V4L2_PIX_FMT_NV12:
> +		walign = 1;
> +		halign = 1;
> +		break;
> +	case V4L2_PIX_FMT_YUV422P:
> +	case V4L2_PIX_FMT_NV16:
> +		walign = 1;
> +		halign = 0;
> +		break;
> +	default:

The default case should init walign, otherwise for OUTPUT direction,
walign may not get initialized at all, see below...

> +		halign = 0;
> +		break;
> +	}
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> +		/*
> +		 * The IC burst reads 8 pixels at a time. Reading beyond the
> +		 * end of the line is usually acceptable. Those pixels are
> +		 * ignored, unless the IC has to write the scaled line in
> +		 * reverse.
> +		 */
> +		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
> +		    ctx->rot_mode && IPU_ROT_BIT_HFLIP)
> +			walign = 3;

This looks wrong. Do you mean:

if (ipu_rot_mode_is_irt(ctx->rot_mode) || (ctx->rot_mode & IPU_ROT_BIT_HFLIP))
     walign = 3;
else
     walign = 1;


That is, require 8 byte width alignment for IRT or if HFLIP is enabled.


Also, why not simply call ipu_image_convert_adjust() in
mem2mem_try_fmt()? If there is something missing in the former
function, then it should be added there, instead of adding the
missing checks in mem2mem_try_fmt().

  
Steve

> +	} else {
> +		if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +			switch (f->fmt.pix.pixelformat) {
> +			case V4L2_PIX_FMT_YUV420:
> +			case V4L2_PIX_FMT_YVU420:
> +			case V4L2_PIX_FMT_YUV422P:
> +				/*
> +				 * Align to 16x16 pixel blocks for planar 4:2:0
> +				 * chroma subsampled formats to guarantee
> +				 * 8-byte aligned line start addresses in the
> +				 * chroma planes.
> +				 */
> +				walign = 4;
> +				halign = 4;
> +				break;
> +			default:
> +				/*
> +				 * Align to 8x8 pixel IRT block size for all
> +				 * other formats.
> +				 */
> +				walign = 3;
> +				halign = 3;
> +				break;
> +			}
> +		} else {
> +			/*
> +			 * The IC burst writes 8 pixels at a time.
> +			 *
> +			 * TODO: support unaligned width with via
> +			 * V4L2_SEL_TGT_COMPOSE_PADDED.
> +			 */
> +			walign = 3;
> +		}
> +	}
> +	v4l_bound_align_image(&f->fmt.pix.width, MIN_W, MAX_W, walign,
> +			      &f->fmt.pix.height, MIN_H, MAX_H, halign, 0);
> +
> +	stride = cc->planar ? f->fmt.pix.width
> +			    : (f->fmt.pix.width * cc->bpp) >> 3;
> +	switch (f->fmt.pix.pixelformat) {
> +	case V4L2_PIX_FMT_YUV420:
> +	case V4L2_PIX_FMT_YVU420:
> +	case V4L2_PIX_FMT_YUV422P:
> +		stride = round_up(stride, 16);
> +		break;
> +	default:
> +		stride = round_up(stride, 8);
> +		break;
> +	}
> +
> +	f->fmt.pix.field = V4L2_FIELD_NONE;
> +	f->fmt.pix.bytesperline = stride;
> +	f->fmt.pix.sizeimage = cc->planar ?
> +			       (stride * f->fmt.pix.height * cc->bpp) >> 3 :
> +			       stride * f->fmt.pix.height;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
> +		f->fmt.pix.colorspace = q_data->cur_fmt.colorspace;
> +		f->fmt.pix.ycbcr_enc = q_data->cur_fmt.ycbcr_enc;
> +		f->fmt.pix.xfer_func = q_data->cur_fmt.xfer_func;
> +		f->fmt.pix.quantization = q_data->cur_fmt.quantization;
> +	} else if (f->fmt.pix.colorspace == V4L2_COLORSPACE_DEFAULT) {
> +		f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
> +		f->fmt.pix.ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
> +		f->fmt.pix.xfer_func = V4L2_XFER_FUNC_DEFAULT;
> +		f->fmt.pix.quantization = V4L2_QUANTIZATION_DEFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mem2mem_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct mem2mem_q_data *q_data;
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct vb2_queue *vq;
> +	int ret;
> +
> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	if (vb2_is_busy(vq)) {
> +		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s queue busy\n",
> +			 __func__);
> +		return -EBUSY;
> +	}
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	ret = mem2mem_try_fmt(file, priv, f);
> +	if (ret < 0)
> +		return ret;
> +
> +	q_data->cur_fmt.width = f->fmt.pix.width;
> +	q_data->cur_fmt.height = f->fmt.pix.height;
> +	q_data->cur_fmt.pixelformat = f->fmt.pix.pixelformat;
> +	q_data->cur_fmt.field = f->fmt.pix.field;
> +	q_data->cur_fmt.bytesperline = f->fmt.pix.bytesperline;
> +	q_data->cur_fmt.sizeimage = f->fmt.pix.sizeimage;
> +
> +	/* Reset cropping/composing rectangle */
> +	q_data->rect.left = 0;
> +	q_data->rect.top = 0;
> +	q_data->rect.width = q_data->cur_fmt.width;
> +	q_data->rect.height = q_data->cur_fmt.height;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> +		/* Set colorimetry on the output queue */
> +		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
> +		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
> +		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
> +		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
> +		/* Propagate colorimetry to the capture queue */
> +		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +		q_data->cur_fmt.colorspace = f->fmt.pix.colorspace;
> +		q_data->cur_fmt.ycbcr_enc = f->fmt.pix.ycbcr_enc;
> +		q_data->cur_fmt.xfer_func = f->fmt.pix.xfer_func;
> +		q_data->cur_fmt.quantization = f->fmt.pix.quantization;
> +	}
> +
> +	/*
> +	 * TODO: Setting colorimetry on the capture queue is currently not
> +	 * supported by the V4L2 API
> +	 */
> +
> +	return 0;
> +}
> +
> +static int mem2mem_g_selection(struct file *file, void *priv,
> +			       struct v4l2_selection *s)
> +{
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data;
> +
> +	switch (s->target) {
> +	case V4L2_SEL_TGT_CROP:
> +	case V4L2_SEL_TGT_CROP_DEFAULT:
> +	case V4L2_SEL_TGT_CROP_BOUNDS:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +			return -EINVAL;
> +		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +		break;
> +	case V4L2_SEL_TGT_COMPOSE:
> +	case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> +	case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> +	case V4L2_SEL_TGT_COMPOSE_PADDED:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
> +			return -EINVAL;
> +		q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (s->target == V4L2_SEL_TGT_CROP ||
> +	    s->target == V4L2_SEL_TGT_COMPOSE) {
> +		s->r = q_data->rect;
> +	} else {
> +		s->r.left = 0;
> +		s->r.top = 0;
> +		s->r.width = q_data->cur_fmt.width;
> +		s->r.height = q_data->cur_fmt.height;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mem2mem_s_selection(struct file *file, void *priv,
> +			       struct v4l2_selection *s)
> +{
> +	struct mem2mem_ctx *ctx = fh_to_ctx(priv);
> +	struct mem2mem_q_data *q_data;
> +
> +	switch (s->target) {
> +	case V4L2_SEL_TGT_CROP:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +			return -EINVAL;
> +		break;
> +	case V4L2_SEL_TGT_COMPOSE:
> +		if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
> +			return -EINVAL;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE ||
> +	    s->type != V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +		return -EINVAL;
> +
> +	q_data = get_q_data(ctx, s->type);
> +
> +	/* The input's frame width to the IC must be a multiple of 8 pixels
> +	 * When performing resizing the frame width must be multiple of burst
> +	 * size - 8 or 16 pixels as defined by CB#_BURST_16 parameter.
> +	 */
> +	if (s->flags & V4L2_SEL_FLAG_GE)
> +		s->r.width = round_up(s->r.width, 8);
> +	if (s->flags & V4L2_SEL_FLAG_LE)
> +		s->r.width = round_down(s->r.width, 8);
> +	s->r.width = clamp_t(unsigned int, s->r.width, 8,
> +			     round_down(q_data->cur_fmt.width, 8));
> +	s->r.height = clamp_t(unsigned int, s->r.height, 1,
> +			      q_data->cur_fmt.height);
> +	s->r.left = clamp_t(unsigned int, s->r.left, 0,
> +			    q_data->cur_fmt.width - s->r.width);
> +	s->r.top = clamp_t(unsigned int, s->r.top, 0,
> +			   q_data->cur_fmt.height - s->r.height);
> +
> +	/* V4L2_SEL_FLAG_KEEP_CONFIG is only valid for subdevices */
> +	q_data->rect = s->r;
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
> +	.vidioc_querycap	= vidioc_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap = mem2mem_enum_fmt,
> +	.vidioc_g_fmt_vid_cap	= mem2mem_g_fmt,
> +	.vidioc_try_fmt_vid_cap	= mem2mem_try_fmt,
> +	.vidioc_s_fmt_vid_cap	= mem2mem_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out = mem2mem_enum_fmt,
> +	.vidioc_g_fmt_vid_out	= mem2mem_g_fmt,
> +	.vidioc_try_fmt_vid_out	= mem2mem_try_fmt,
> +	.vidioc_s_fmt_vid_out	= mem2mem_s_fmt,
> +
> +	.vidioc_g_selection	= mem2mem_g_selection,
> +	.vidioc_s_selection	= mem2mem_s_selection,
> +
> +	.vidioc_reqbufs		= v4l2_m2m_ioctl_reqbufs,
> +	.vidioc_querybuf	= v4l2_m2m_ioctl_querybuf,
> +
> +	.vidioc_qbuf		= v4l2_m2m_ioctl_qbuf,
> +	.vidioc_expbuf		= v4l2_m2m_ioctl_expbuf,
> +	.vidioc_dqbuf		= v4l2_m2m_ioctl_dqbuf,
> +	.vidioc_create_bufs	= v4l2_m2m_ioctl_create_bufs,
> +
> +	.vidioc_streamon	= v4l2_m2m_ioctl_streamon,
> +	.vidioc_streamoff	= v4l2_m2m_ioctl_streamoff,
> +};
> +
> +/*
> + * Queue operations
> + */
> +
> +static int mem2mem_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> +			       unsigned int *nplanes, unsigned int sizes[],
> +			       struct device *alloc_devs[])
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct mem2mem_q_data *q_data;
> +	unsigned int count = *nbuffers;
> +	struct v4l2_pix_format *pix;
> +
> +	q_data = get_q_data(ctx, vq->type);
> +	pix = &q_data->cur_fmt;
> +
> +	*nplanes = 1;
> +	*nbuffers = count;
> +	sizes[0] = pix->sizeimage;
> +
> +	dev_dbg(ctx->priv->dev, "get %d buffer(s) of size %d each.\n",
> +		count, pix->sizeimage);
> +
> +	return 0;
> +}
> +
> +static int mem2mem_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct mem2mem_q_data *q_data;
> +	struct v4l2_pix_format *pix;
> +	unsigned int plane_size, payload;
> +
> +	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	q_data = get_q_data(ctx, vb->vb2_queue->type);
> +	pix = &q_data->cur_fmt;
> +	plane_size = pix->sizeimage;
> +
> +	if (vb2_plane_size(vb, 0) < plane_size) {
> +		dev_dbg(ctx->priv->dev,
> +			"%s data will not fit into plane (%lu < %lu)\n",
> +			__func__, vb2_plane_size(vb, 0), (long)plane_size);
> +		return -EINVAL;
> +	}
> +
> +	payload = pix->bytesperline * pix->height;
> +	if (pix->pixelformat == V4L2_PIX_FMT_YUV420 ||
> +	    pix->pixelformat == V4L2_PIX_FMT_YVU420 ||
> +	    pix->pixelformat == V4L2_PIX_FMT_NV12)
> +		payload = payload * 3 / 2;
> +	else if (pix->pixelformat == V4L2_PIX_FMT_YUV422P ||
> +		 pix->pixelformat == V4L2_PIX_FMT_NV16)
> +		payload *= 2;
> +
> +	vb2_set_plane_payload(vb, 0, payload);
> +
> +	return 0;
> +}
> +
> +static void mem2mem_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +
> +	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
> +}
> +
> +static void ipu_image_from_q_data(struct ipu_image *im,
> +				  struct mem2mem_q_data *q_data)
> +{
> +	im->pix.width = q_data->cur_fmt.width;
> +	im->pix.height = q_data->cur_fmt.height;
> +	im->pix.bytesperline = q_data->cur_fmt.bytesperline;
> +	im->pix.pixelformat = q_data->cur_fmt.pixelformat;
> +	im->rect = q_data->rect;
> +}
> +
> +static int mem2mem_start_streaming(struct vb2_queue *q, unsigned int count)
> +{
> +	const enum ipu_ic_task ic_task = IC_TASK_POST_PROCESSOR;
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
> +	struct mem2mem_priv *priv = ctx->priv;
> +	struct ipu_soc *ipu = priv->md->ipu[0];
> +	struct mem2mem_q_data *q_data;
> +	struct vb2_queue *other_q;
> +	struct ipu_image in, out;
> +
> +	other_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
> +				  (q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) ?
> +				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
> +				  V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	if (!vb2_is_streaming(other_q))
> +		return 0;
> +
> +	if (ctx->icc) {
> +		v4l2_warn(ctx->priv->vdev.vfd->v4l2_dev, "removing old ICC\n");
> +		ipu_image_convert_unprepare(ctx->icc);
> +	}
> +
> +	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	ipu_image_from_q_data(&in, q_data);
> +
> +	q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	ipu_image_from_q_data(&out, q_data);
> +
> +	ctx->icc = ipu_image_convert_prepare(ipu, ic_task, &in, &out,
> +					     ctx->rot_mode,
> +					     mem2mem_ic_complete, ctx);
> +	if (IS_ERR(ctx->icc)) {
> +		struct vb2_v4l2_buffer *buf;
> +		int ret = PTR_ERR(ctx->icc);
> +
> +		ctx->icc = NULL;
> +		v4l2_err(ctx->priv->vdev.vfd->v4l2_dev, "%s: error %d\n",
> +			 __func__, ret);
> +		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
> +		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void mem2mem_stop_streaming(struct vb2_queue *q)
> +{
> +	struct mem2mem_ctx *ctx = vb2_get_drv_priv(q);
> +	struct vb2_v4l2_buffer *buf;
> +
> +	if (ctx->icc) {
> +		ipu_image_convert_unprepare(ctx->icc);
> +		ctx->icc = NULL;
> +	}
> +
> +	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> +		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
> +	} else {
> +		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_ERROR);
> +	}
> +}
> +
> +static const struct vb2_ops mem2mem_qops = {
> +	.queue_setup	= mem2mem_queue_setup,
> +	.buf_prepare	= mem2mem_buf_prepare,
> +	.buf_queue	= mem2mem_buf_queue,
> +	.wait_prepare	= vb2_ops_wait_prepare,
> +	.wait_finish	= vb2_ops_wait_finish,
> +	.start_streaming = mem2mem_start_streaming,
> +	.stop_streaming = mem2mem_stop_streaming,
> +};
> +
> +static int queue_init(void *priv, struct vb2_queue *src_vq,
> +		      struct vb2_queue *dst_vq)
> +{
> +	struct mem2mem_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
> +	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &mem2mem_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	src_vq->lock = &ctx->priv->mutex;
> +	src_vq->dev = ctx->priv->dev;
> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &mem2mem_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	dst_vq->lock = &ctx->priv->mutex;
> +	dst_vq->dev = ctx->priv->dev;
> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +static int mem2mem_s_ctrl(struct v4l2_ctrl *ctrl)
> +{
> +	struct mem2mem_ctx *ctx = container_of(ctrl->handler,
> +					       struct mem2mem_ctx, ctrl_hdlr);
> +	enum ipu_rotate_mode rot_mode;
> +	int rotate;
> +	bool hflip, vflip;
> +	int ret = 0;
> +
> +	rotate = ctx->rotate;
> +	hflip = ctx->hflip;
> +	vflip = ctx->vflip;
> +
> +	switch (ctrl->id) {
> +	case V4L2_CID_HFLIP:
> +		hflip = ctrl->val;
> +		break;
> +	case V4L2_CID_VFLIP:
> +		vflip = ctrl->val;
> +		break;
> +	case V4L2_CID_ROTATE:
> +		rotate = ctrl->val;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	ret = ipu_degrees_to_rot_mode(&rot_mode, rotate, hflip, vflip);
> +	if (ret)
> +		return ret;
> +
> +	if (rot_mode != ctx->rot_mode) {
> +		struct vb2_queue *cap_q;
> +
> +		cap_q = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
> +					V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +		if (vb2_is_streaming(cap_q))
> +			return -EBUSY;
> +
> +		ctx->rot_mode = rot_mode;
> +		ctx->rotate = rotate;
> +		ctx->hflip = hflip;
> +		ctx->vflip = vflip;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ctrl_ops mem2mem_ctrl_ops = {
> +	.s_ctrl = mem2mem_s_ctrl,
> +};
> +
> +static int mem2mem_init_controls(struct mem2mem_ctx *ctx)
> +{
> +	struct v4l2_ctrl_handler *hdlr = &ctx->ctrl_hdlr;
> +	int ret;
> +
> +	v4l2_ctrl_handler_init(hdlr, 3);
> +
> +	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_HFLIP,
> +			  0, 1, 1, 0);
> +	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_VFLIP,
> +			  0, 1, 1, 0);
> +	v4l2_ctrl_new_std(hdlr, &mem2mem_ctrl_ops, V4L2_CID_ROTATE,
> +			  0, 270, 90, 0);
> +
> +	if (hdlr->error) {
> +		ret = hdlr->error;
> +		goto out_free;
> +	}
> +
> +	v4l2_ctrl_handler_setup(hdlr);
> +	return 0;
> +
> +out_free:
> +	v4l2_ctrl_handler_free(hdlr);
> +	return ret;
> +}
> +
> +#define DEFAULT_WIDTH	720
> +#define DEFAULT_HEIGHT	576
> +static const struct mem2mem_q_data mem2mem_q_data_default = {
> +	.cur_fmt = {
> +		.width = DEFAULT_WIDTH,
> +		.height = DEFAULT_HEIGHT,
> +		.pixelformat = V4L2_PIX_FMT_YUV420,
> +		.field = V4L2_FIELD_NONE,
> +		.bytesperline = DEFAULT_WIDTH,
> +		.sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
> +		.colorspace = V4L2_COLORSPACE_SRGB,
> +	},
> +	.rect = {
> +		.width = DEFAULT_WIDTH,
> +		.height = DEFAULT_HEIGHT,
> +	},
> +};
> +
> +/*
> + * File operations
> + */
> +static int mem2mem_open(struct file *file)
> +{
> +	struct mem2mem_priv *priv = video_drvdata(file);
> +	struct mem2mem_ctx *ctx = NULL;
> +	int ret;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	ctx->rot_mode = IPU_ROTATE_NONE;
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +	v4l2_fh_add(&ctx->fh);
> +	ctx->priv = priv;
> +
> +	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
> +					    &queue_init);
> +	if (IS_ERR(ctx->fh.m2m_ctx)) {
> +		ret = PTR_ERR(ctx->fh.m2m_ctx);
> +		goto err_ctx;
> +	}
> +
> +	ret = mem2mem_init_controls(ctx);
> +	if (ret)
> +		goto err_ctrls;
> +
> +	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;
> +
> +	ctx->q_data[V4L2_M2M_SRC] = mem2mem_q_data_default;
> +	ctx->q_data[V4L2_M2M_DST] = mem2mem_q_data_default;
> +
> +	atomic_inc(&priv->num_inst);
> +
> +	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n", ctx,
> +		ctx->fh.m2m_ctx);
> +
> +	return 0;
> +
> +err_ctrls:
> +	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> +err_ctx:
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int mem2mem_release(struct file *file)
> +{
> +	struct mem2mem_priv *priv = video_drvdata(file);
> +	struct mem2mem_ctx *ctx = fh_to_ctx(file->private_data);
> +
> +	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
> +
> +	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +
> +	atomic_dec(&priv->num_inst);
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_file_operations mem2mem_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= mem2mem_open,
> +	.release	= mem2mem_release,
> +	.poll		= v4l2_m2m_fop_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= v4l2_m2m_fop_mmap,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= device_run,
> +	.job_abort	= job_abort,
> +};
> +
> +static const struct video_device mem2mem_videodev_template = {
> +	.name		= "ipu0_ic_pp mem2mem",
> +	.fops		= &mem2mem_fops,
> +	.ioctl_ops	= &mem2mem_ioctl_ops,
> +	.minor		= -1,
> +	.release	= video_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
> +	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
> +};
> +
> +int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev)
> +{
> +	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = vdev->vfd;
> +	int ret;
> +
> +	vfd->v4l2_dev = &priv->md->v4l2_dev;
> +
> +	ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
> +	if (ret) {
> +		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
> +		return ret;
> +	}
> +
> +	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
> +		  video_device_node_name(vfd));
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_register);
> +
> +void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev)
> +{
> +	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = priv->vdev.vfd;
> +
> +	mutex_lock(&priv->mutex);
> +
> +	if (video_is_registered(vfd)) {
> +		video_unregister_device(vfd);
> +		media_entity_cleanup(&vfd->entity);
> +	}
> +
> +	mutex_unlock(&priv->mutex);
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_unregister);
> +
> +struct imx_media_video_dev *
> +imx_media_mem2mem_device_init(struct imx_media_dev *md)
> +{
> +	struct mem2mem_priv *priv;
> +	struct video_device *vfd;
> +	int ret;
> +
> +	priv = devm_kzalloc(md->md.dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return ERR_PTR(-ENOMEM);
> +
> +	priv->md = md;
> +	priv->dev = md->md.dev;
> +
> +	mutex_init(&priv->mutex);
> +	atomic_set(&priv->num_inst, 0);
> +
> +	vfd = video_device_alloc();
> +	if (!vfd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	*vfd = mem2mem_videodev_template;
> +	snprintf(vfd->name, sizeof(vfd->name), "ipu_ic_pp mem2mem");
> +	vfd->lock = &priv->mutex;
> +	priv->vdev.vfd = vfd;
> +
> +	INIT_LIST_HEAD(&priv->vdev.list);
> +
> +	video_set_drvdata(vfd, priv);
> +
> +	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(priv->m2m_dev)) {
> +		ret = PTR_ERR(priv->m2m_dev);
> +		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
> +			 ret);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return &priv->vdev;
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_init);
> +
> +void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev)
> +{
> +	struct mem2mem_priv *priv = to_mem2mem_priv(vdev);
> +
> +	v4l2_m2m_release(priv->m2m_dev);
> +}
> +EXPORT_SYMBOL_GPL(imx_media_mem2mem_device_remove);
> +
> +MODULE_DESCRIPTION("i.MX IPUv3 mem2mem scaler/CSC driver");
> +MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
> index e945e0ed6dd6..dc24ed37f050 100644
> --- a/drivers/staging/media/imx/imx-media.h
> +++ b/drivers/staging/media/imx/imx-media.h
> @@ -149,6 +149,9 @@ struct imx_media_dev {
>   	/* for async subdev registration */
>   	struct list_head asd_list;
>   	struct v4l2_async_notifier subdev_notifier;
> +
> +	/* IC scaler/CSC mem2mem video device */
> +	struct imx_media_video_dev *m2m_vdev;
>   };
>   
>   enum codespace_sel {
> @@ -262,6 +265,13 @@ void imx_media_capture_device_set_format(struct imx_media_video_dev *vdev,
>   					 struct v4l2_pix_format *pix);
>   void imx_media_capture_device_error(struct imx_media_video_dev *vdev);
>   
> +/* imx-media-mem2mem.c */
> +struct imx_media_video_dev *
> +imx_media_mem2mem_device_init(struct imx_media_dev *dev);
> +void imx_media_mem2mem_device_remove(struct imx_media_video_dev *vdev);
> +int imx_media_mem2mem_device_register(struct imx_media_video_dev *vdev);
> +void imx_media_mem2mem_device_unregister(struct imx_media_video_dev *vdev);
> +
>   /* subdev group ids */
>   #define IMX_MEDIA_GRP_ID_CSI2      BIT(8)
>   #define IMX_MEDIA_GRP_ID_CSI_BIT   9

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
                     ` (2 preceding siblings ...)
  2018-07-05 22:09   ` Steve Longerbeam
@ 2018-07-10 12:07   ` Pavel Machek
  2018-07-16 14:10     ` Philipp Zabel
  3 siblings, 1 reply; 28+ messages in thread
From: Pavel Machek @ 2018-07-10 12:07 UTC (permalink / raw)
  To: Philipp Zabel; +Cc: linux-media, kernel, Steve Longerbeam

[-- Attachment #1: Type: text/plain, Size: 1405 bytes --]

Hi!

> Add a single imx-media mem2mem video device that uses the IPU IC PP
> (image converter post processing) task for scaling and colorspace
> conversion.
> On i.MX6Q/DL SoCs with two IPUs currently only the first IPU is used.
> 
> The hardware only supports writing to destination buffers up to
> 1024x1024 pixels in a single pass, so the mem2mem video device is
> limited to this resolution. After fixing the tiling code it should
> be possible to extend this to arbitrary sizes by rendering multiple
> tiles per frame.
> 
> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>

> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * i.MX IPUv3 mem2mem Scaler/CSC driver
> + *
> + * Copyright (C) 2011 Pengutronix, Sascha Hauer
> + * Copyright (C) 2018 Pengutronix, Philipp Zabel
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */

Point of SPDX is that the last 4 lines can be removed...and if you
want GPL-2.0+ as you state (and I like that), you should also say so
in SPDX.

Thanks,

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/16] i.MX media mem2mem scaler
  2018-07-05 21:55 ` [PATCH 00/16] i.MX media mem2mem scaler Steve Longerbeam
@ 2018-07-16 14:10   ` Philipp Zabel
  0 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-07-16 14:10 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media; +Cc: kernel, Steve Longerbeam

Hi Steve,

On Thu, 2018-07-05 at 14:55 -0700, Steve Longerbeam wrote:
> Hi Philipp,
> 
> Thanks for this great patchset! Finally we have improved seams
> with tiled conversions, and relaxed width alignment requirements.
> 
> Unfortunately this patchset isn't working correctly yet. It breaks tiled
> conversions with rotation.
>
> Trying the following conversion:
> 
> input: 720x480, UYVY
> output: 1280x768, UYVY, rotation=90 degrees
>
> causes non-8-byte aligned tile buffers at the output:
> 
> [  129.578210] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: Input@[0,0]: 
> phys 00000000
> [  129.585980] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: Input@[1,0]: 
> phys 00051360
> [  129.593736] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: 
> Output@[0,0]: phys 00000000
> [  129.601556] imx-ipuv3 2400000.ipu: task 2: ctx 8955dec9: 
> Output@[0,1]: phys 0000052e
> 
> resulting in hung conversion and abort timeout:
> 
> [  147.689220] imx-ipuv3 2400000.ipu: ipu_image_convert_abort: timeout
> 
> Note that when converting to a planar format, the final (rotated) chroma 
> tile
> buffers are also mis-aligned, in addition to the Y buffers.

Ouch, thanks. Calculation of the allow_overshoot parameter to
find_best_seam() is inverted and the seam alignment does not
properly switch output width/height in the IRT case.

I'll fix this in v2:

----------8<----------
diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 4eb76714505a..8e125def4f5c 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -683,6 +683,11 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
 		resized_width = out->base.rect.height;
 		resized_height = out->base.rect.width;
+		out_left_align = tile_top_align(out->fmt);
+		out_top_align = tile_left_align(out->fmt);
+		out_width_align = tile_height_align(out->type,
+						    ctx->rot_mode);
+		out_height_align = tile_width_align(out->fmt);
 		out_right = out->base.rect.height;
 		out_bottom = out->base.rect.width;
 	}
@@ -732,7 +737,7 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 	}
 
 	for (col = in->num_cols - 1; col > 0; col--) {
-		bool allow_overshoot = (col == in->num_cols - 1) &&
+		bool allow_overshoot = (col < in->num_cols - 1) &&
 				       !(ctx->rot_mode & IPU_ROT_BIT_HFLIP);
 		unsigned int out_start;
 		unsigned int out_end;
@@ -775,6 +780,7 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 		in_right, flipped_out_left, out_right);
 
 	for (row = in->num_rows - 1; row > 0; row--) {
+		bool allow_overshoot = row < in->num_rows - 1;
 		unsigned int out_start;
 		unsigned int out_end;
 		unsigned int in_top;
@@ -788,8 +794,7 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
 		find_best_seam(ctx, out_start, out_end,
 			       in_top_align, out_top_align, out_height_align,
 			       ctx->downsize_coeff_v, ctx->image_resize_coeff_v,
-			       row == in->num_rows - 1,
-			       &in_top, &out_top);
+			       allow_overshoot, &in_top, &out_top);
 
 		if ((ctx->rot_mode & IPU_ROT_BIT_VFLIP) ^
 		    ipu_rot_mode_is_irt(ctx->rot_mode))
---------->8----------

regards
Philipp

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-07-10 12:07   ` Pavel Machek
@ 2018-07-16 14:10     ` Philipp Zabel
  0 siblings, 0 replies; 28+ messages in thread
From: Philipp Zabel @ 2018-07-16 14:10 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-media, kernel, Steve Longerbeam

Hi Pavel,

On Tue, 2018-07-10 at 14:07 +0200, Pavel Machek wrote:
[...]
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * i.MX IPUv3 mem2mem Scaler/CSC driver
> > + *
> > + * Copyright (C) 2011 Pengutronix, Sascha Hauer
> > + * Copyright (C) 2018 Pengutronix, Philipp Zabel
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> 
> Point of SPDX is that the last 4 lines can be removed...and if you
> want GPL-2.0+ as you state (and I like that), you should also say so
> in SPDX.

Thank you, I'll fix this in v2.

regards
Philipp

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-07-05 22:09   ` Steve Longerbeam
@ 2018-07-16 14:12     ` Philipp Zabel
  2018-07-22 18:02       ` Steve Longerbeam
  0 siblings, 1 reply; 28+ messages in thread
From: Philipp Zabel @ 2018-07-16 14:12 UTC (permalink / raw)
  To: Steve Longerbeam, linux-media; +Cc: kernel, Steve Longerbeam

Hi Steve,

On Thu, 2018-07-05 at 15:09 -0700, Steve Longerbeam wrote:
[...]
> > +static int mem2mem_try_fmt(struct file *file, void *priv,
> > +			   struct v4l2_format *f)
> > +{
[...]
> > +	/*
> > +	 * Horizontally/vertically chroma subsampled formats must have even
> > +	 * width/height.
> > +	 */
> > +	switch (f->fmt.pix.pixelformat) {
> > +	case V4L2_PIX_FMT_YUV420:
> > +	case V4L2_PIX_FMT_YVU420:
> > +	case V4L2_PIX_FMT_NV12:
> > +		walign = 1;
> > +		halign = 1;
> > +		break;
> > +	case V4L2_PIX_FMT_YUV422P:
> > +	case V4L2_PIX_FMT_NV16:
> > +		walign = 1;
> > +		halign = 0;
> > +		break;
> > +	default:
> 
> The default case should init walign, otherwise for OUTPUT direction,
> walign may not get initialized at all, see below...

Yes, thank you.

> > +		halign = 0;
> > +		break;
> > +	}
> > +	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
> > +		/*
> > +		 * The IC burst reads 8 pixels at a time. Reading beyond the
> > +		 * end of the line is usually acceptable. Those pixels are
> > +		 * ignored, unless the IC has to write the scaled line in
> > +		 * reverse.
> > +		 */
> > +		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
> > +		    ctx->rot_mode && IPU_ROT_BIT_HFLIP)
> > +			walign = 3;
> 
> This looks wrong. Do you mean:
> 
> if (ipu_rot_mode_is_irt(ctx->rot_mode) || (ctx->rot_mode & IPU_ROT_BIT_HFLIP))
>      walign = 3;
> else
>      walign = 1;

The input DMA burst width alignment is only necessary if the lines are
scanned from right to left (that is, if HF is enabled) in the scaling
step.
If the rotator is used, the flipping is done in the rotation step
instead, so the alignment restriction would be on the width of the
intermediate tile (and thus on the output height). This is already
covered by the rotator 8x8 pixel block alignment.

> That is, require 8 byte width alignment for IRT or if HFLIP is enabled.

No, I specifically meant (!IRT && HFLIP).

The rotator itself doesn't cause any input alignment restrictions, we
just have to make sure that the intermediate tiles after scaling are 8x8
aligned.

> Also, why not simply call ipu_image_convert_adjust() in
> mem2mem_try_fmt()? If there is something missing in the former
> function, then it should be added there, instead of adding the
> missing checks in mem2mem_try_fmt().

ipu_image_convert_adjust tries to adjust both input and output image at
the same time, here we just have the format of either input or output
image. Do you suggest to split this function into an input and an output
version?

regards
Philipp

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-07-16 14:12     ` Philipp Zabel
@ 2018-07-22 18:02       ` Steve Longerbeam
  2018-07-23  7:31         ` Philipp Zabel
  0 siblings, 1 reply; 28+ messages in thread
From: Steve Longerbeam @ 2018-07-22 18:02 UTC (permalink / raw)
  To: Philipp Zabel, Steve Longerbeam, linux-media; +Cc: kernel



On 07/16/2018 07:12 AM, Philipp Zabel wrote:
> Hi Steve,
>
> On Thu, 2018-07-05 at 15:09 -0700, Steve Longerbeam wrote:
> [...]
> [...]
>>> +		halign = 0;
>>> +		break;
>>> +	}
>>> +	if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) {
>>> +		/*
>>> +		 * The IC burst reads 8 pixels at a time. Reading beyond the
>>> +		 * end of the line is usually acceptable. Those pixels are
>>> +		 * ignored, unless the IC has to write the scaled line in
>>> +		 * reverse.
>>> +		 */
>>> +		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
>>> +		    ctx->rot_mode && IPU_ROT_BIT_HFLIP)
>>> +			walign = 3;
>> This looks wrong. Do you mean:
>>
>> if (ipu_rot_mode_is_irt(ctx->rot_mode) || (ctx->rot_mode & IPU_ROT_BIT_HFLIP))
>>       walign = 3;
>> else
>>       walign = 1;
> The input DMA burst width alignment is only necessary if the lines are
> scanned from right to left (that is, if HF is enabled) in the scaling
> step.

Ok, thanks for the explanation, that makes sense.

> If the rotator is used, the flipping is done in the rotation step
> instead,

Ah, I missed or forgot about that detail in the ref manual,
I reviewed it again and you are right...

>   so the alignment restriction would be on the width of the
> intermediate tile (and thus on the output height). This is already
> covered by the rotator 8x8 pixel block alignment.

so this makes sense too.

>
>> That is, require 8 byte width alignment for IRT or if HFLIP is enabled.
> No, I specifically meant (!IRT && HFLIP).

Right, but there is still a typo:

if (!ipu_rot_mode_is_irt(ctx->rot_mode) && ctx->rot_mode && IPU_ROT_BIT_HFLIP)

should be:

if (!ipu_rot_mode_is_irt(ctx->rot_mode) && (ctx->rot_mode & IPU_ROT_BIT_HFLIP))


>
> The rotator itself doesn't cause any input alignment restrictions, we
> just have to make sure that the intermediate tiles after scaling are 8x8
> aligned.
>
>> Also, why not simply call ipu_image_convert_adjust() in
>> mem2mem_try_fmt()? If there is something missing in the former
>> function, then it should be added there, instead of adding the
>> missing checks in mem2mem_try_fmt().
> ipu_image_convert_adjust tries to adjust both input and output image at
> the same time, here we just have the format of either input or output
> image. Do you suggest to split this function into an input and an output
> version?

See b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust
in try format")

in my mediatree fork at git@github.com:slongerbeam/mediatree.git.

Let's discuss this further in the v2 patches.

Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-07-22 18:02       ` Steve Longerbeam
@ 2018-07-23  7:31         ` Philipp Zabel
  2018-07-23 16:54           ` Steve Longerbeam
  0 siblings, 1 reply; 28+ messages in thread
From: Philipp Zabel @ 2018-07-23  7:31 UTC (permalink / raw)
  To: Steve Longerbeam, Steve Longerbeam, linux-media; +Cc: kernel

On Sun, 2018-07-22 at 11:02 -0700, Steve Longerbeam wrote:
> On 07/16/2018 07:12 AM, Philipp Zabel wrote:
[...]
> > > > +		/*
> > > > +		 * The IC burst reads 8 pixels at a time. Reading beyond the
> > > > +		 * end of the line is usually acceptable. Those pixels are
> > > > +		 * ignored, unless the IC has to write the scaled line in
> > > > +		 * reverse.
> > > > +		 */
> > > > +		if (!ipu_rot_mode_is_irt(ctx->rot_mode) &&
> > > > +		    ctx->rot_mode && IPU_ROT_BIT_HFLIP)
> > > > +			walign = 3;
> > > 
> > > This looks wrong. Do you mean:
> > > 
> > > if (ipu_rot_mode_is_irt(ctx->rot_mode) || (ctx->rot_mode & IPU_ROT_BIT_HFLIP))
> > >       walign = 3;
> > > else
> > >       walign = 1;
[...]
> > No, I specifically meant (!IRT && HFLIP).
> 
> Right, but there is still a typo:
> 
> if (!ipu_rot_mode_is_irt(ctx->rot_mode) && ctx->rot_mode && IPU_ROT_BIT_HFLIP)
>
> should be:
> 
> if (!ipu_rot_mode_is_irt(ctx->rot_mode) && (ctx->rot_mode & IPU_ROT_BIT_HFLIP))

Ow, yes, thank you.

> > The rotator itself doesn't cause any input alignment restrictions, we
> > just have to make sure that the intermediate tiles after scaling are 8x8
> > aligned.
> > 
> > > Also, why not simply call ipu_image_convert_adjust() in
> > > mem2mem_try_fmt()? If there is something missing in the former
> > > function, then it should be added there, instead of adding the
> > > missing checks in mem2mem_try_fmt().
> > 
> > ipu_image_convert_adjust tries to adjust both input and output image at
> > the same time, here we just have the format of either input or output
> > image. Do you suggest to split this function into an input and an output
> > version?
> 
> See b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust
> in try format")

Alright, this looks fine to me. I was worried about inter-format
limitations, but the only one seems to be the output size lower bound to
1/4 of the input size. Should S_FMT(OUT) also update the capture format
if adjustments were made to keep a consistent state?

regards
Philipp

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 16/16] media: imx: add mem2mem device
  2018-07-23  7:31         ` Philipp Zabel
@ 2018-07-23 16:54           ` Steve Longerbeam
  0 siblings, 0 replies; 28+ messages in thread
From: Steve Longerbeam @ 2018-07-23 16:54 UTC (permalink / raw)
  To: Philipp Zabel, Steve Longerbeam, linux-media; +Cc: kernel



On 07/23/2018 12:31 AM, Philipp Zabel wrote:
>>>
>>> ipu_image_convert_adjust tries to adjust both input and output image at
>>> the same time, here we just have the format of either input or output
>>> image. Do you suggest to split this function into an input and an output
>>> version?
>> See b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust
>> in try format")
> Alright, this looks fine to me. I was worried about inter-format
> limitations, but the only one seems to be the output size lower bound to
> 1/4 of the input size. Should S_FMT(OUT) also update the capture format
> if adjustments were made to keep a consistent state?

That's a good question, I don't know if the mem2mem API allows for
that, but if it does we should do that for consistent state as you said.

In b4362162c0, the current capture format is used to adjust output
format during S_FMT(OUT) but any capture format changes are
dropped, and vice-versa.

Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-07-23 17:57 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-22 15:52 [PATCH 00/16] i.MX media mem2mem scaler Philipp Zabel
2018-06-22 15:52 ` [PATCH 01/16] gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients Philipp Zabel
2018-06-22 15:52 ` [PATCH 02/16] gpu: ipu-v3: image-convert: prepare for per-tile configuration Philipp Zabel
2018-06-22 15:52 ` [PATCH 03/16] gpu: ipu-v3: image-convert: calculate per-tile resize coefficients Philipp Zabel
2018-06-22 15:52 ` [PATCH 04/16] gpu: ipu-v3: image-convert: reconfigure IC per tile Philipp Zabel
2018-06-22 15:52 ` [PATCH 05/16] gpu: ipu-v3: image-convert: store tile top/left position Philipp Zabel
2018-06-22 15:52 ` [PATCH 06/16] gpu: ipu-v3: image-convert: calculate tile dimensions and offsets outside fill_image Philipp Zabel
2018-06-22 15:52 ` [PATCH 07/16] gpu: ipu-v3: image-convert: move tile alignment helpers Philipp Zabel
2018-06-22 15:52 ` [PATCH 08/16] gpu: ipu-v3: image-convert: select optimal seam positions Philipp Zabel
2018-06-22 15:52 ` [PATCH 09/16] gpu: ipu-v3: image-convert: fix debug output for varying tile sizes Philipp Zabel
2018-06-22 15:52 ` [PATCH 10/16] gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and NV16 Philipp Zabel
2018-06-22 15:52 ` [PATCH 11/16] gpu: ipu-v3: image-convert: relax input alignment restrictions Philipp Zabel
2018-06-22 15:52 ` [PATCH 12/16] gpu: ipu-v3: image-convert: relax output " Philipp Zabel
2018-06-22 15:52 ` [PATCH 13/16] gpu: ipu-v3: image-convert: fix bytesperline adjustment Philipp Zabel
2018-06-22 15:52 ` [PATCH 14/16] gpu: ipu-v3: image-convert: add some ASCII art to the exposition Philipp Zabel
2018-06-22 15:52 ` [PATCH 15/16] gpu: ipu-v3: image-convert: disable double buffering if necessary Philipp Zabel
2018-06-22 15:52 ` [PATCH 16/16] media: imx: add mem2mem device Philipp Zabel
2018-06-22 19:37   ` Nicolas Dufresne
2018-06-22 21:03   ` kbuild test robot
2018-07-05 22:09   ` Steve Longerbeam
2018-07-16 14:12     ` Philipp Zabel
2018-07-22 18:02       ` Steve Longerbeam
2018-07-23  7:31         ` Philipp Zabel
2018-07-23 16:54           ` Steve Longerbeam
2018-07-10 12:07   ` Pavel Machek
2018-07-16 14:10     ` Philipp Zabel
2018-07-05 21:55 ` [PATCH 00/16] i.MX media mem2mem scaler Steve Longerbeam
2018-07-16 14:10   ` Philipp Zabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.