All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-08-18  0:50 ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

In this version:

- rebased against latest drm-next.
- cleaned up header includes in ipu-vdi.c.
- do away with struct ipu_ic_tile_off in ipu-ic.c, and move tile offsets
  into struct ipu_ic_tile. This paves the way for possibly allowing for
  each tile to have different dimensions in the future.


Steve Longerbeam (4):
  gpu: ipu-v3: Add Video Deinterlacer unit
  gpu: ipu-v3: Add FSU channel linking support
  gpu: ipu-ic: Add complete image conversion support with tiling
  gpu: ipu-ic: allow multiple handles to ic

 drivers/gpu/ipu-v3/Makefile     |    2 +-
 drivers/gpu/ipu-v3/ipu-common.c |  142 ++++
 drivers/gpu/ipu-v3/ipu-ic.c     | 1719 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/ipu-v3/ipu-prv.h    |   33 +
 drivers/gpu/ipu-v3/ipu-vdi.c    |  243 ++++++
 include/video/imx-ipu-v3.h      |   93 ++-
 6 files changed, 2195 insertions(+), 37 deletions(-)
 create mode 100644 drivers/gpu/ipu-v3/ipu-vdi.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-08-18  0:50 ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

In this version:

- rebased against latest drm-next.
- cleaned up header includes in ipu-vdi.c.
- do away with struct ipu_ic_tile_off in ipu-ic.c, and move tile offsets
  into struct ipu_ic_tile. This paves the way for possibly allowing for
  each tile to have different dimensions in the future.


Steve Longerbeam (4):
  gpu: ipu-v3: Add Video Deinterlacer unit
  gpu: ipu-v3: Add FSU channel linking support
  gpu: ipu-ic: Add complete image conversion support with tiling
  gpu: ipu-ic: allow multiple handles to ic

 drivers/gpu/ipu-v3/Makefile     |    2 +-
 drivers/gpu/ipu-v3/ipu-common.c |  142 ++++
 drivers/gpu/ipu-v3/ipu-ic.c     | 1719 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/ipu-v3/ipu-prv.h    |   33 +
 drivers/gpu/ipu-v3/ipu-vdi.c    |  243 ++++++
 include/video/imx-ipu-v3.h      |   93 ++-
 6 files changed, 2195 insertions(+), 37 deletions(-)
 create mode 100644 drivers/gpu/ipu-v3/ipu-vdi.c

-- 
1.9.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 1/4] gpu: ipu-v3: Add Video Deinterlacer unit
  2016-08-18  0:50 ` Steve Longerbeam
@ 2016-08-18  0:50   ` Steve Longerbeam
  -1 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

Adds the Video Deinterlacer (VDIC) unit.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>

---

v4:
- pruned included headers.

v3:
- renamed and exported ipu_vdi_set_top_field_man() to
  ipu_vdi_set_field_order(). Args include std and field to determine
  correct field order.
- exported ipu_vdi_set_motion().
- ipu_vdi_setup() does not need to call ipu_vdi_set_top_field_man() or
  ipu_vdi_set_motion(), since latter are exported. This simplifies args.
- removed ipu_vdi_toggle_top_field_man().
- removed ipu_vdi_set_src().

v2:
- removed include of module.h
- corrected V4L2 field type checks
- cleaned up use_count decrement in ipu_vdi_disable()
---
 drivers/gpu/ipu-v3/Makefile     |   2 +-
 drivers/gpu/ipu-v3/ipu-common.c |  11 ++
 drivers/gpu/ipu-v3/ipu-prv.h    |   6 +
 drivers/gpu/ipu-v3/ipu-vdi.c    | 243 ++++++++++++++++++++++++++++++++++++++++
 include/video/imx-ipu-v3.h      |  23 ++++
 5 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/ipu-v3/ipu-vdi.c

diff --git a/drivers/gpu/ipu-v3/Makefile b/drivers/gpu/ipu-v3/Makefile
index 107ec23..aeba9dc 100644
--- a/drivers/gpu/ipu-v3/Makefile
+++ b/drivers/gpu/ipu-v3/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_IMX_IPUV3_CORE) += imx-ipu-v3.o
 
 imx-ipu-v3-objs := ipu-common.o ipu-cpmem.o ipu-csi.o ipu-dc.o ipu-di.o \
-		ipu-dp.o ipu-dmfc.o ipu-ic.o ipu-smfc.o
+		ipu-dp.o ipu-dmfc.o ipu-ic.o ipu-smfc.o ipu-vdi.o
diff --git a/drivers/gpu/ipu-v3/ipu-common.c b/drivers/gpu/ipu-v3/ipu-common.c
index d230988..9d3584b 100644
--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -839,6 +839,14 @@ static int ipu_submodules_init(struct ipu_soc *ipu,
 		goto err_ic;
 	}
 
+	ret = ipu_vdi_init(ipu, dev, ipu_base + devtype->vdi_ofs,
+			   IPU_CONF_VDI_EN | IPU_CONF_ISP_EN |
+			   IPU_CONF_IC_INPUT);
+	if (ret) {
+		unit = "vdi";
+		goto err_vdi;
+	}
+
 	ret = ipu_di_init(ipu, dev, 0, ipu_base + devtype->disp0_ofs,
 			  IPU_CONF_DI0_EN, ipu_clk);
 	if (ret) {
@@ -893,6 +901,8 @@ err_dc:
 err_di_1:
 	ipu_di_exit(ipu, 0);
 err_di_0:
+	ipu_vdi_exit(ipu);
+err_vdi:
 	ipu_ic_exit(ipu);
 err_ic:
 	ipu_csi_exit(ipu, 1);
@@ -977,6 +987,7 @@ static void ipu_submodules_exit(struct ipu_soc *ipu)
 	ipu_dc_exit(ipu);
 	ipu_di_exit(ipu, 1);
 	ipu_di_exit(ipu, 0);
+	ipu_vdi_exit(ipu);
 	ipu_ic_exit(ipu);
 	ipu_csi_exit(ipu, 1);
 	ipu_csi_exit(ipu, 0);
diff --git a/drivers/gpu/ipu-v3/ipu-prv.h b/drivers/gpu/ipu-v3/ipu-prv.h
index fd47f8f..02057d8 100644
--- a/drivers/gpu/ipu-v3/ipu-prv.h
+++ b/drivers/gpu/ipu-v3/ipu-prv.h
@@ -138,6 +138,7 @@ struct ipu_dc_priv;
 struct ipu_dmfc_priv;
 struct ipu_di;
 struct ipu_ic_priv;
+struct ipu_vdi;
 struct ipu_smfc_priv;
 
 struct ipu_devtype;
@@ -170,6 +171,7 @@ struct ipu_soc {
 	struct ipu_di		*di_priv[2];
 	struct ipu_csi		*csi_priv[2];
 	struct ipu_ic_priv	*ic_priv;
+	struct ipu_vdi          *vdi_priv;
 	struct ipu_smfc_priv	*smfc_priv;
 };
 
@@ -200,6 +202,10 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
 		unsigned long base, unsigned long tpmem_base);
 void ipu_ic_exit(struct ipu_soc *ipu);
 
+int ipu_vdi_init(struct ipu_soc *ipu, struct device *dev,
+		 unsigned long base, u32 module);
+void ipu_vdi_exit(struct ipu_soc *ipu);
+
 int ipu_di_init(struct ipu_soc *ipu, struct device *dev, int id,
 		unsigned long base, u32 module, struct clk *ipu_clk);
 void ipu_di_exit(struct ipu_soc *ipu, int id);
diff --git a/drivers/gpu/ipu-v3/ipu-vdi.c b/drivers/gpu/ipu-v3/ipu-vdi.c
new file mode 100644
index 0000000..f27bf5a
--- /dev/null
+++ b/drivers/gpu/ipu-v3/ipu-vdi.c
@@ -0,0 +1,243 @@
+/*
+ * Copyright (C) 2012-2016 Mentor Graphics Inc.
+ * Copyright (C) 2005-2009 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+#include <linux/io.h>
+#include "ipu-prv.h"
+
+struct ipu_vdi {
+	void __iomem *base;
+	u32 module;
+	spinlock_t lock;
+	int use_count;
+	struct ipu_soc *ipu;
+};
+
+
+/* VDI Register Offsets */
+#define VDI_FSIZE 0x0000
+#define VDI_C     0x0004
+
+/* VDI Register Fields */
+#define VDI_C_CH_420             (0 << 1)
+#define VDI_C_CH_422             (1 << 1)
+#define VDI_C_MOT_SEL_MASK       (0x3 << 2)
+#define VDI_C_MOT_SEL_FULL       (2 << 2)
+#define VDI_C_MOT_SEL_LOW        (1 << 2)
+#define VDI_C_MOT_SEL_MED        (0 << 2)
+#define VDI_C_BURST_SIZE1_4      (3 << 4)
+#define VDI_C_BURST_SIZE2_4      (3 << 8)
+#define VDI_C_BURST_SIZE3_4      (3 << 12)
+#define VDI_C_BURST_SIZE_MASK    0xF
+#define VDI_C_BURST_SIZE1_OFFSET 4
+#define VDI_C_BURST_SIZE2_OFFSET 8
+#define VDI_C_BURST_SIZE3_OFFSET 12
+#define VDI_C_VWM1_SET_1         (0 << 16)
+#define VDI_C_VWM1_SET_2         (1 << 16)
+#define VDI_C_VWM1_CLR_2         (1 << 19)
+#define VDI_C_VWM3_SET_1         (0 << 22)
+#define VDI_C_VWM3_SET_2         (1 << 22)
+#define VDI_C_VWM3_CLR_2         (1 << 25)
+#define VDI_C_TOP_FIELD_MAN_1    (1 << 30)
+#define VDI_C_TOP_FIELD_AUTO_1   (1 << 31)
+
+static inline u32 ipu_vdi_read(struct ipu_vdi *vdi, unsigned int offset)
+{
+	return readl(vdi->base + offset);
+}
+
+static inline void ipu_vdi_write(struct ipu_vdi *vdi, u32 value,
+				 unsigned int offset)
+{
+	writel(value, vdi->base + offset);
+}
+
+void ipu_vdi_set_field_order(struct ipu_vdi *vdi, v4l2_std_id std, u32 field)
+{
+	bool top_field_0 = false;
+	unsigned long flags;
+	u32 reg;
+
+	switch (field) {
+	case V4L2_FIELD_INTERLACED_TB:
+	case V4L2_FIELD_SEQ_TB:
+	case V4L2_FIELD_TOP:
+		top_field_0 = true;
+		break;
+	case V4L2_FIELD_INTERLACED_BT:
+	case V4L2_FIELD_SEQ_BT:
+	case V4L2_FIELD_BOTTOM:
+		top_field_0 = false;
+		break;
+	default:
+		top_field_0 = (std & V4L2_STD_525_60) ? true : false;
+		break;
+	}
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	reg = ipu_vdi_read(vdi, VDI_C);
+	if (top_field_0)
+		reg &= ~VDI_C_TOP_FIELD_MAN_1;
+	else
+		reg |= VDI_C_TOP_FIELD_MAN_1;
+	ipu_vdi_write(vdi, reg, VDI_C);
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_set_field_order);
+
+void ipu_vdi_set_motion(struct ipu_vdi *vdi, enum ipu_motion_sel motion_sel)
+{
+	unsigned long flags;
+	u32 reg;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	reg = ipu_vdi_read(vdi, VDI_C);
+
+	reg &= ~VDI_C_MOT_SEL_MASK;
+
+	switch (motion_sel) {
+	case MED_MOTION:
+		reg |= VDI_C_MOT_SEL_MED;
+		break;
+	case HIGH_MOTION:
+		reg |= VDI_C_MOT_SEL_FULL;
+		break;
+	default:
+		reg |= VDI_C_MOT_SEL_LOW;
+		break;
+	}
+
+	ipu_vdi_write(vdi, reg, VDI_C);
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_set_motion);
+
+void ipu_vdi_setup(struct ipu_vdi *vdi, u32 code, int xres, int yres)
+{
+	unsigned long flags;
+	u32 pixel_fmt, reg;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	reg = ((yres - 1) << 16) | (xres - 1);
+	ipu_vdi_write(vdi, reg, VDI_FSIZE);
+
+	/*
+	 * Full motion, only vertical filter is used.
+	 * Burst size is 4 accesses
+	 */
+	if (code == MEDIA_BUS_FMT_UYVY8_2X8 ||
+	    code == MEDIA_BUS_FMT_UYVY8_1X16 ||
+	    code == MEDIA_BUS_FMT_YUYV8_2X8 ||
+	    code == MEDIA_BUS_FMT_YUYV8_1X16)
+		pixel_fmt = VDI_C_CH_422;
+	else
+		pixel_fmt = VDI_C_CH_420;
+
+	reg = ipu_vdi_read(vdi, VDI_C);
+	reg |= pixel_fmt;
+	reg |= VDI_C_BURST_SIZE2_4;
+	reg |= VDI_C_BURST_SIZE1_4 | VDI_C_VWM1_CLR_2;
+	reg |= VDI_C_BURST_SIZE3_4 | VDI_C_VWM3_CLR_2;
+	ipu_vdi_write(vdi, reg, VDI_C);
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_setup);
+
+void ipu_vdi_unsetup(struct ipu_vdi *vdi)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+	ipu_vdi_write(vdi, 0, VDI_FSIZE);
+	ipu_vdi_write(vdi, 0, VDI_C);
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_unsetup);
+
+int ipu_vdi_enable(struct ipu_vdi *vdi)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	if (!vdi->use_count)
+		ipu_module_enable(vdi->ipu, vdi->module);
+
+	vdi->use_count++;
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_enable);
+
+int ipu_vdi_disable(struct ipu_vdi *vdi)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	if (vdi->use_count) {
+		if (!--vdi->use_count)
+			ipu_module_disable(vdi->ipu, vdi->module);
+	}
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_disable);
+
+struct ipu_vdi *ipu_vdi_get(struct ipu_soc *ipu)
+{
+	return ipu->vdi_priv;
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_get);
+
+void ipu_vdi_put(struct ipu_vdi *vdi)
+{
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_put);
+
+int ipu_vdi_init(struct ipu_soc *ipu, struct device *dev,
+		 unsigned long base, u32 module)
+{
+	struct ipu_vdi *vdi;
+
+	vdi = devm_kzalloc(dev, sizeof(*vdi), GFP_KERNEL);
+	if (!vdi)
+		return -ENOMEM;
+
+	ipu->vdi_priv = vdi;
+
+	spin_lock_init(&vdi->lock);
+	vdi->module = module;
+	vdi->base = devm_ioremap(dev, base, PAGE_SIZE);
+	if (!vdi->base)
+		return -ENOMEM;
+
+	dev_dbg(dev, "VDI base: 0x%08lx remapped to %p\n", base, vdi->base);
+	vdi->ipu = ipu;
+
+	return 0;
+}
+
+void ipu_vdi_exit(struct ipu_soc *ipu)
+{
+}
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index c3de740..335d42e 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -80,6 +80,16 @@ enum ipu_color_space {
 	IPUV3_COLORSPACE_UNKNOWN,
 };
 
+/*
+ * Enumeration of VDI MOTION select
+ */
+enum ipu_motion_sel {
+	MOTION_NONE = 0,
+	LOW_MOTION,
+	MED_MOTION,
+	HIGH_MOTION,
+};
+
 struct ipuv3_channel;
 
 enum ipu_channel_irq {
@@ -335,6 +345,19 @@ void ipu_ic_put(struct ipu_ic *ic);
 void ipu_ic_dump(struct ipu_ic *ic);
 
 /*
+ * IPU Video De-Interlacer (vdi) functions
+ */
+struct ipu_vdi;
+void ipu_vdi_set_field_order(struct ipu_vdi *vdi, v4l2_std_id std, u32 field);
+void ipu_vdi_set_motion(struct ipu_vdi *vdi, enum ipu_motion_sel motion_sel);
+void ipu_vdi_setup(struct ipu_vdi *vdi, u32 code, int xres, int yres);
+void ipu_vdi_unsetup(struct ipu_vdi *vdi);
+int ipu_vdi_enable(struct ipu_vdi *vdi);
+int ipu_vdi_disable(struct ipu_vdi *vdi);
+struct ipu_vdi *ipu_vdi_get(struct ipu_soc *ipu);
+void ipu_vdi_put(struct ipu_vdi *vdi);
+
+/*
  * IPU Sensor Multiple FIFO Controller (SMFC) functions
  */
 struct ipu_smfc *ipu_smfc_get(struct ipu_soc *ipu, unsigned int chno);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 1/4] gpu: ipu-v3: Add Video Deinterlacer unit
@ 2016-08-18  0:50   ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

Adds the Video Deinterlacer (VDIC) unit.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>

---

v4:
- pruned included headers.

v3:
- renamed and exported ipu_vdi_set_top_field_man() to
  ipu_vdi_set_field_order(). Args include std and field to determine
  correct field order.
- exported ipu_vdi_set_motion().
- ipu_vdi_setup() does not need to call ipu_vdi_set_top_field_man() or
  ipu_vdi_set_motion(), since latter are exported. This simplifies args.
- removed ipu_vdi_toggle_top_field_man().
- removed ipu_vdi_set_src().

v2:
- removed include of module.h
- corrected V4L2 field type checks
- cleaned up use_count decrement in ipu_vdi_disable()
---
 drivers/gpu/ipu-v3/Makefile     |   2 +-
 drivers/gpu/ipu-v3/ipu-common.c |  11 ++
 drivers/gpu/ipu-v3/ipu-prv.h    |   6 +
 drivers/gpu/ipu-v3/ipu-vdi.c    | 243 ++++++++++++++++++++++++++++++++++++++++
 include/video/imx-ipu-v3.h      |  23 ++++
 5 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/ipu-v3/ipu-vdi.c

diff --git a/drivers/gpu/ipu-v3/Makefile b/drivers/gpu/ipu-v3/Makefile
index 107ec23..aeba9dc 100644
--- a/drivers/gpu/ipu-v3/Makefile
+++ b/drivers/gpu/ipu-v3/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_IMX_IPUV3_CORE) += imx-ipu-v3.o
 
 imx-ipu-v3-objs := ipu-common.o ipu-cpmem.o ipu-csi.o ipu-dc.o ipu-di.o \
-		ipu-dp.o ipu-dmfc.o ipu-ic.o ipu-smfc.o
+		ipu-dp.o ipu-dmfc.o ipu-ic.o ipu-smfc.o ipu-vdi.o
diff --git a/drivers/gpu/ipu-v3/ipu-common.c b/drivers/gpu/ipu-v3/ipu-common.c
index d230988..9d3584b 100644
--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -839,6 +839,14 @@ static int ipu_submodules_init(struct ipu_soc *ipu,
 		goto err_ic;
 	}
 
+	ret = ipu_vdi_init(ipu, dev, ipu_base + devtype->vdi_ofs,
+			   IPU_CONF_VDI_EN | IPU_CONF_ISP_EN |
+			   IPU_CONF_IC_INPUT);
+	if (ret) {
+		unit = "vdi";
+		goto err_vdi;
+	}
+
 	ret = ipu_di_init(ipu, dev, 0, ipu_base + devtype->disp0_ofs,
 			  IPU_CONF_DI0_EN, ipu_clk);
 	if (ret) {
@@ -893,6 +901,8 @@ err_dc:
 err_di_1:
 	ipu_di_exit(ipu, 0);
 err_di_0:
+	ipu_vdi_exit(ipu);
+err_vdi:
 	ipu_ic_exit(ipu);
 err_ic:
 	ipu_csi_exit(ipu, 1);
@@ -977,6 +987,7 @@ static void ipu_submodules_exit(struct ipu_soc *ipu)
 	ipu_dc_exit(ipu);
 	ipu_di_exit(ipu, 1);
 	ipu_di_exit(ipu, 0);
+	ipu_vdi_exit(ipu);
 	ipu_ic_exit(ipu);
 	ipu_csi_exit(ipu, 1);
 	ipu_csi_exit(ipu, 0);
diff --git a/drivers/gpu/ipu-v3/ipu-prv.h b/drivers/gpu/ipu-v3/ipu-prv.h
index fd47f8f..02057d8 100644
--- a/drivers/gpu/ipu-v3/ipu-prv.h
+++ b/drivers/gpu/ipu-v3/ipu-prv.h
@@ -138,6 +138,7 @@ struct ipu_dc_priv;
 struct ipu_dmfc_priv;
 struct ipu_di;
 struct ipu_ic_priv;
+struct ipu_vdi;
 struct ipu_smfc_priv;
 
 struct ipu_devtype;
@@ -170,6 +171,7 @@ struct ipu_soc {
 	struct ipu_di		*di_priv[2];
 	struct ipu_csi		*csi_priv[2];
 	struct ipu_ic_priv	*ic_priv;
+	struct ipu_vdi          *vdi_priv;
 	struct ipu_smfc_priv	*smfc_priv;
 };
 
@@ -200,6 +202,10 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
 		unsigned long base, unsigned long tpmem_base);
 void ipu_ic_exit(struct ipu_soc *ipu);
 
+int ipu_vdi_init(struct ipu_soc *ipu, struct device *dev,
+		 unsigned long base, u32 module);
+void ipu_vdi_exit(struct ipu_soc *ipu);
+
 int ipu_di_init(struct ipu_soc *ipu, struct device *dev, int id,
 		unsigned long base, u32 module, struct clk *ipu_clk);
 void ipu_di_exit(struct ipu_soc *ipu, int id);
diff --git a/drivers/gpu/ipu-v3/ipu-vdi.c b/drivers/gpu/ipu-v3/ipu-vdi.c
new file mode 100644
index 0000000..f27bf5a
--- /dev/null
+++ b/drivers/gpu/ipu-v3/ipu-vdi.c
@@ -0,0 +1,243 @@
+/*
+ * Copyright (C) 2012-2016 Mentor Graphics Inc.
+ * Copyright (C) 2005-2009 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+#include <linux/io.h>
+#include "ipu-prv.h"
+
+struct ipu_vdi {
+	void __iomem *base;
+	u32 module;
+	spinlock_t lock;
+	int use_count;
+	struct ipu_soc *ipu;
+};
+
+
+/* VDI Register Offsets */
+#define VDI_FSIZE 0x0000
+#define VDI_C     0x0004
+
+/* VDI Register Fields */
+#define VDI_C_CH_420             (0 << 1)
+#define VDI_C_CH_422             (1 << 1)
+#define VDI_C_MOT_SEL_MASK       (0x3 << 2)
+#define VDI_C_MOT_SEL_FULL       (2 << 2)
+#define VDI_C_MOT_SEL_LOW        (1 << 2)
+#define VDI_C_MOT_SEL_MED        (0 << 2)
+#define VDI_C_BURST_SIZE1_4      (3 << 4)
+#define VDI_C_BURST_SIZE2_4      (3 << 8)
+#define VDI_C_BURST_SIZE3_4      (3 << 12)
+#define VDI_C_BURST_SIZE_MASK    0xF
+#define VDI_C_BURST_SIZE1_OFFSET 4
+#define VDI_C_BURST_SIZE2_OFFSET 8
+#define VDI_C_BURST_SIZE3_OFFSET 12
+#define VDI_C_VWM1_SET_1         (0 << 16)
+#define VDI_C_VWM1_SET_2         (1 << 16)
+#define VDI_C_VWM1_CLR_2         (1 << 19)
+#define VDI_C_VWM3_SET_1         (0 << 22)
+#define VDI_C_VWM3_SET_2         (1 << 22)
+#define VDI_C_VWM3_CLR_2         (1 << 25)
+#define VDI_C_TOP_FIELD_MAN_1    (1 << 30)
+#define VDI_C_TOP_FIELD_AUTO_1   (1 << 31)
+
+static inline u32 ipu_vdi_read(struct ipu_vdi *vdi, unsigned int offset)
+{
+	return readl(vdi->base + offset);
+}
+
+static inline void ipu_vdi_write(struct ipu_vdi *vdi, u32 value,
+				 unsigned int offset)
+{
+	writel(value, vdi->base + offset);
+}
+
+void ipu_vdi_set_field_order(struct ipu_vdi *vdi, v4l2_std_id std, u32 field)
+{
+	bool top_field_0 = false;
+	unsigned long flags;
+	u32 reg;
+
+	switch (field) {
+	case V4L2_FIELD_INTERLACED_TB:
+	case V4L2_FIELD_SEQ_TB:
+	case V4L2_FIELD_TOP:
+		top_field_0 = true;
+		break;
+	case V4L2_FIELD_INTERLACED_BT:
+	case V4L2_FIELD_SEQ_BT:
+	case V4L2_FIELD_BOTTOM:
+		top_field_0 = false;
+		break;
+	default:
+		top_field_0 = (std & V4L2_STD_525_60) ? true : false;
+		break;
+	}
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	reg = ipu_vdi_read(vdi, VDI_C);
+	if (top_field_0)
+		reg &= ~VDI_C_TOP_FIELD_MAN_1;
+	else
+		reg |= VDI_C_TOP_FIELD_MAN_1;
+	ipu_vdi_write(vdi, reg, VDI_C);
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_set_field_order);
+
+void ipu_vdi_set_motion(struct ipu_vdi *vdi, enum ipu_motion_sel motion_sel)
+{
+	unsigned long flags;
+	u32 reg;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	reg = ipu_vdi_read(vdi, VDI_C);
+
+	reg &= ~VDI_C_MOT_SEL_MASK;
+
+	switch (motion_sel) {
+	case MED_MOTION:
+		reg |= VDI_C_MOT_SEL_MED;
+		break;
+	case HIGH_MOTION:
+		reg |= VDI_C_MOT_SEL_FULL;
+		break;
+	default:
+		reg |= VDI_C_MOT_SEL_LOW;
+		break;
+	}
+
+	ipu_vdi_write(vdi, reg, VDI_C);
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_set_motion);
+
+void ipu_vdi_setup(struct ipu_vdi *vdi, u32 code, int xres, int yres)
+{
+	unsigned long flags;
+	u32 pixel_fmt, reg;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	reg = ((yres - 1) << 16) | (xres - 1);
+	ipu_vdi_write(vdi, reg, VDI_FSIZE);
+
+	/*
+	 * Full motion, only vertical filter is used.
+	 * Burst size is 4 accesses
+	 */
+	if (code = MEDIA_BUS_FMT_UYVY8_2X8 ||
+	    code = MEDIA_BUS_FMT_UYVY8_1X16 ||
+	    code = MEDIA_BUS_FMT_YUYV8_2X8 ||
+	    code = MEDIA_BUS_FMT_YUYV8_1X16)
+		pixel_fmt = VDI_C_CH_422;
+	else
+		pixel_fmt = VDI_C_CH_420;
+
+	reg = ipu_vdi_read(vdi, VDI_C);
+	reg |= pixel_fmt;
+	reg |= VDI_C_BURST_SIZE2_4;
+	reg |= VDI_C_BURST_SIZE1_4 | VDI_C_VWM1_CLR_2;
+	reg |= VDI_C_BURST_SIZE3_4 | VDI_C_VWM3_CLR_2;
+	ipu_vdi_write(vdi, reg, VDI_C);
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_setup);
+
+void ipu_vdi_unsetup(struct ipu_vdi *vdi)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+	ipu_vdi_write(vdi, 0, VDI_FSIZE);
+	ipu_vdi_write(vdi, 0, VDI_C);
+	spin_unlock_irqrestore(&vdi->lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_unsetup);
+
+int ipu_vdi_enable(struct ipu_vdi *vdi)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	if (!vdi->use_count)
+		ipu_module_enable(vdi->ipu, vdi->module);
+
+	vdi->use_count++;
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_enable);
+
+int ipu_vdi_disable(struct ipu_vdi *vdi)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vdi->lock, flags);
+
+	if (vdi->use_count) {
+		if (!--vdi->use_count)
+			ipu_module_disable(vdi->ipu, vdi->module);
+	}
+
+	spin_unlock_irqrestore(&vdi->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_disable);
+
+struct ipu_vdi *ipu_vdi_get(struct ipu_soc *ipu)
+{
+	return ipu->vdi_priv;
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_get);
+
+void ipu_vdi_put(struct ipu_vdi *vdi)
+{
+}
+EXPORT_SYMBOL_GPL(ipu_vdi_put);
+
+int ipu_vdi_init(struct ipu_soc *ipu, struct device *dev,
+		 unsigned long base, u32 module)
+{
+	struct ipu_vdi *vdi;
+
+	vdi = devm_kzalloc(dev, sizeof(*vdi), GFP_KERNEL);
+	if (!vdi)
+		return -ENOMEM;
+
+	ipu->vdi_priv = vdi;
+
+	spin_lock_init(&vdi->lock);
+	vdi->module = module;
+	vdi->base = devm_ioremap(dev, base, PAGE_SIZE);
+	if (!vdi->base)
+		return -ENOMEM;
+
+	dev_dbg(dev, "VDI base: 0x%08lx remapped to %p\n", base, vdi->base);
+	vdi->ipu = ipu;
+
+	return 0;
+}
+
+void ipu_vdi_exit(struct ipu_soc *ipu)
+{
+}
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index c3de740..335d42e 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -80,6 +80,16 @@ enum ipu_color_space {
 	IPUV3_COLORSPACE_UNKNOWN,
 };
 
+/*
+ * Enumeration of VDI MOTION select
+ */
+enum ipu_motion_sel {
+	MOTION_NONE = 0,
+	LOW_MOTION,
+	MED_MOTION,
+	HIGH_MOTION,
+};
+
 struct ipuv3_channel;
 
 enum ipu_channel_irq {
@@ -335,6 +345,19 @@ void ipu_ic_put(struct ipu_ic *ic);
 void ipu_ic_dump(struct ipu_ic *ic);
 
 /*
+ * IPU Video De-Interlacer (vdi) functions
+ */
+struct ipu_vdi;
+void ipu_vdi_set_field_order(struct ipu_vdi *vdi, v4l2_std_id std, u32 field);
+void ipu_vdi_set_motion(struct ipu_vdi *vdi, enum ipu_motion_sel motion_sel);
+void ipu_vdi_setup(struct ipu_vdi *vdi, u32 code, int xres, int yres);
+void ipu_vdi_unsetup(struct ipu_vdi *vdi);
+int ipu_vdi_enable(struct ipu_vdi *vdi);
+int ipu_vdi_disable(struct ipu_vdi *vdi);
+struct ipu_vdi *ipu_vdi_get(struct ipu_soc *ipu);
+void ipu_vdi_put(struct ipu_vdi *vdi);
+
+/*
  * IPU Sensor Multiple FIFO Controller (SMFC) functions
  */
 struct ipu_smfc *ipu_smfc_get(struct ipu_soc *ipu, unsigned int chno);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 2/4] gpu: ipu-v3: Add FSU channel linking support
  2016-08-18  0:50 ` Steve Longerbeam
@ 2016-08-18  0:50   ` Steve Longerbeam
  -1 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

Adds functions to link and unlink source channels to sink
channels in the FSU:

int ipu_fsu_link(struct ipu_soc *ipu, int src_ch, int sink_ch);
int ipu_fsu_unlink(struct ipu_soc *ipu, int src_ch, int sink_ch);

The channels numbers are usually IDMAC channels, but they can also be
channels that do not transfer data to or from memory. The following
convenience functions can be used in place of ipu_fsu_link/unlink()
when both source and sink channels are IDMAC channels:

int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink);
int ipu_idmac_unlink(struct ipuv3_channel *src, struct ipuv3_channel *sink);

So far the following links are supported:

IPUV3_CHANNEL_IC_PRP_ENC_MEM -> IPUV3_CHANNEL_MEM_ROT_ENC
PUV3_CHANNEL_IC_PRP_VF_MEM   -> IPUV3_CHANNEL_MEM_ROT_VF
IPUV3_CHANNEL_IC_PP_MEM      -> IPUV3_CHANNEL_MEM_ROT_PP
IPUV3_CHANNEL_CSI_DIRECT     -> IPUV3_CHANNEL_CSI_VDI_PREV

More links can be added to the fsu_link_info[] array.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
 drivers/gpu/ipu-v3/ipu-common.c | 131 ++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/ipu-v3/ipu-prv.h    |  27 +++++++++
 include/video/imx-ipu-v3.h      |  13 ++++
 3 files changed, 171 insertions(+)

diff --git a/drivers/gpu/ipu-v3/ipu-common.c b/drivers/gpu/ipu-v3/ipu-common.c
index 9d3584b..891cbef 100644
--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -730,6 +730,137 @@ void ipu_set_ic_src_mux(struct ipu_soc *ipu, int csi_id, bool vdi)
 }
 EXPORT_SYMBOL_GPL(ipu_set_ic_src_mux);
 
+
+/* Frame Synchronization Unit Channel Linking */
+
+struct fsu_link_reg_info {
+	int chno;
+	u32 reg;
+	u32 mask;
+	u32 val;
+};
+
+struct fsu_link_info {
+	struct fsu_link_reg_info src;
+	struct fsu_link_reg_info sink;
+};
+
+static const struct fsu_link_info fsu_link_info[] = {
+	{
+		.src  = { IPUV3_CHANNEL_IC_PRP_ENC_MEM, IPU_FS_PROC_FLOW2,
+			  FS_PRP_ENC_DEST_SEL_MASK, FS_PRP_ENC_DEST_SEL_IRT_ENC },
+		.sink = { IPUV3_CHANNEL_MEM_ROT_ENC, IPU_FS_PROC_FLOW1,
+			  FS_PRPENC_ROT_SRC_SEL_MASK, FS_PRPENC_ROT_SRC_SEL_ENC },
+	}, {
+		.src =  { IPUV3_CHANNEL_IC_PRP_VF_MEM, IPU_FS_PROC_FLOW2,
+			  FS_PRPVF_DEST_SEL_MASK, FS_PRPVF_DEST_SEL_IRT_VF },
+		.sink = { IPUV3_CHANNEL_MEM_ROT_VF, IPU_FS_PROC_FLOW1,
+			  FS_PRPVF_ROT_SRC_SEL_MASK, FS_PRPVF_ROT_SRC_SEL_VF },
+	}, {
+		.src =  { IPUV3_CHANNEL_IC_PP_MEM, IPU_FS_PROC_FLOW2,
+			  FS_PP_DEST_SEL_MASK, FS_PP_DEST_SEL_IRT_PP },
+		.sink = { IPUV3_CHANNEL_MEM_ROT_PP, IPU_FS_PROC_FLOW1,
+			  FS_PP_ROT_SRC_SEL_MASK, FS_PP_ROT_SRC_SEL_PP },
+	}, {
+		.src =  { IPUV3_CHANNEL_CSI_DIRECT, 0 },
+		.sink = { IPUV3_CHANNEL_CSI_VDI_PREV, IPU_FS_PROC_FLOW1,
+			  FS_VDI_SRC_SEL_MASK, FS_VDI_SRC_SEL_CSI_DIRECT },
+	},
+};
+
+static const struct fsu_link_info *find_fsu_link_info(int src, int sink)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(fsu_link_info); i++) {
+		if (src == fsu_link_info[i].src.chno &&
+		    sink == fsu_link_info[i].sink.chno)
+			return &fsu_link_info[i];
+	}
+
+	return NULL;
+}
+
+/*
+ * Links a source channel to a sink channel in the FSU.
+ */
+int ipu_fsu_link(struct ipu_soc *ipu, int src_ch, int sink_ch)
+{
+	const struct fsu_link_info *link;
+	u32 src_reg, sink_reg;
+	unsigned long flags;
+
+	link = find_fsu_link_info(src_ch, sink_ch);
+	if (!link)
+		return -EINVAL;
+
+	spin_lock_irqsave(&ipu->lock, flags);
+
+	if (link->src.mask) {
+		src_reg = ipu_cm_read(ipu, link->src.reg);
+		src_reg &= ~link->src.mask;
+		src_reg |= link->src.val;
+		ipu_cm_write(ipu, src_reg, link->src.reg);
+	}
+
+	if (link->sink.mask) {
+		sink_reg = ipu_cm_read(ipu, link->sink.reg);
+		sink_reg &= ~link->sink.mask;
+		sink_reg |= link->sink.val;
+		ipu_cm_write(ipu, sink_reg, link->sink.reg);
+	}
+
+	spin_unlock_irqrestore(&ipu->lock, flags);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_fsu_link);
+
+/*
+ * Unlinks source and sink channels in the FSU.
+ */
+int ipu_fsu_unlink(struct ipu_soc *ipu, int src_ch, int sink_ch)
+{
+	const struct fsu_link_info *link;
+	u32 src_reg, sink_reg;
+	unsigned long flags;
+
+	link = find_fsu_link_info(src_ch, sink_ch);
+	if (!link)
+		return -EINVAL;
+
+	spin_lock_irqsave(&ipu->lock, flags);
+
+	if (link->src.mask) {
+		src_reg = ipu_cm_read(ipu, link->src.reg);
+		src_reg &= ~link->src.mask;
+		ipu_cm_write(ipu, src_reg, link->src.reg);
+	}
+
+	if (link->sink.mask) {
+		sink_reg = ipu_cm_read(ipu, link->sink.reg);
+		sink_reg &= ~link->sink.mask;
+		ipu_cm_write(ipu, sink_reg, link->sink.reg);
+	}
+
+	spin_unlock_irqrestore(&ipu->lock, flags);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_fsu_unlink);
+
+/* Link IDMAC channels in the FSU */
+int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink)
+{
+	return ipu_fsu_link(src->ipu, src->num, sink->num);
+}
+EXPORT_SYMBOL_GPL(ipu_idmac_link);
+
+/* Unlink IDMAC channels in the FSU */
+int ipu_idmac_unlink(struct ipuv3_channel *src, struct ipuv3_channel *sink)
+{
+	return ipu_fsu_unlink(src->ipu, src->num, sink->num);
+}
+EXPORT_SYMBOL_GPL(ipu_idmac_unlink);
+
 struct ipu_devtype {
 	const char *name;
 	unsigned long cm_ofs;
diff --git a/drivers/gpu/ipu-v3/ipu-prv.h b/drivers/gpu/ipu-v3/ipu-prv.h
index 02057d8..dca2c3a 100644
--- a/drivers/gpu/ipu-v3/ipu-prv.h
+++ b/drivers/gpu/ipu-v3/ipu-prv.h
@@ -75,6 +75,33 @@ struct ipu_soc;
 #define IPU_INT_CTRL(n)		IPU_CM_REG(0x003C + 4 * (n))
 #define IPU_INT_STAT(n)		IPU_CM_REG(0x0200 + 4 * (n))
 
+/* FS_PROC_FLOW1 */
+#define FS_PRPENC_ROT_SRC_SEL_MASK	(0xf << 0)
+#define FS_PRPENC_ROT_SRC_SEL_ENC		(0x7 << 0)
+#define FS_PRPVF_ROT_SRC_SEL_MASK	(0xf << 8)
+#define FS_PRPVF_ROT_SRC_SEL_VF			(0x8 << 8)
+#define FS_PP_SRC_SEL_MASK		(0xf << 12)
+#define FS_PP_ROT_SRC_SEL_MASK		(0xf << 16)
+#define FS_PP_ROT_SRC_SEL_PP			(0x5 << 16)
+#define FS_VDI1_SRC_SEL_MASK		(0x3 << 20)
+#define FS_VDI3_SRC_SEL_MASK		(0x3 << 20)
+#define FS_PRP_SRC_SEL_MASK		(0xf << 24)
+#define FS_VDI_SRC_SEL_MASK		(0x3 << 28)
+#define FS_VDI_SRC_SEL_CSI_DIRECT		(0x1 << 28)
+#define FS_VDI_SRC_SEL_VDOA			(0x2 << 28)
+
+/* FS_PROC_FLOW2 */
+#define FS_PRP_ENC_DEST_SEL_MASK	(0xf << 0)
+#define FS_PRP_ENC_DEST_SEL_IRT_ENC		(0x1 << 0)
+#define FS_PRPVF_DEST_SEL_MASK		(0xf << 4)
+#define FS_PRPVF_DEST_SEL_IRT_VF		(0x1 << 4)
+#define FS_PRPVF_ROT_DEST_SEL_MASK	(0xf << 8)
+#define FS_PP_DEST_SEL_MASK		(0xf << 12)
+#define FS_PP_DEST_SEL_IRT_PP			(0x3 << 12)
+#define FS_PP_ROT_DEST_SEL_MASK		(0xf << 16)
+#define FS_PRPENC_ROT_DEST_SEL_MASK	(0xf << 20)
+#define FS_PRP_DEST_SEL_MASK		(0xf << 24)
+
 #define IPU_DI0_COUNTER_RELEASE			(1 << 24)
 #define IPU_DI1_COUNTER_RELEASE			(1 << 25)
 
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index 335d42e..1a3f7d4 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -107,6 +107,14 @@ enum ipu_channel_irq {
 #define IPUV3_CHANNEL_CSI2			 2
 #define IPUV3_CHANNEL_CSI3			 3
 #define IPUV3_CHANNEL_VDI_MEM_IC_VF		 5
+/*
+ * NOTE: channels 6,7 are unused in the IPU and are not IDMAC channels,
+ * but the direct CSI->VDI linking is handled the same way as IDMAC
+ * channel linking in the FSU via the IPU_FS_PROC_FLOW registers, so
+ * these channel names are used to support the direct CSI->VDI link.
+ */
+#define IPUV3_CHANNEL_CSI_DIRECT		 6
+#define IPUV3_CHANNEL_CSI_VDI_PREV		 7
 #define IPUV3_CHANNEL_MEM_VDI_PREV		 8
 #define IPUV3_CHANNEL_MEM_VDI_CUR		 9
 #define IPUV3_CHANNEL_MEM_VDI_NEXT		10
@@ -143,6 +151,7 @@ enum ipu_channel_irq {
 #define IPUV3_CHANNEL_ROT_PP_MEM		50
 #define IPUV3_CHANNEL_MEM_BG_SYNC_ALPHA		51
 #define IPUV3_CHANNEL_MEM_BG_ASYNC_ALPHA	52
+#define IPUV3_NUM_CHANNELS			64
 
 int ipu_map_irq(struct ipu_soc *ipu, int irq);
 int ipu_idmac_channel_irq(struct ipu_soc *ipu, struct ipuv3_channel *channel,
@@ -186,6 +195,10 @@ int ipu_idmac_get_current_buffer(struct ipuv3_channel *channel);
 bool ipu_idmac_buffer_is_ready(struct ipuv3_channel *channel, u32 buf_num);
 void ipu_idmac_select_buffer(struct ipuv3_channel *channel, u32 buf_num);
 void ipu_idmac_clear_buffer(struct ipuv3_channel *channel, u32 buf_num);
+int ipu_fsu_link(struct ipu_soc *ipu, int src_ch, int sink_ch);
+int ipu_fsu_unlink(struct ipu_soc *ipu, int src_ch, int sink_ch);
+int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink);
+int ipu_idmac_unlink(struct ipuv3_channel *src, struct ipuv3_channel *sink);
 
 /*
  * IPU Channel Parameter Memory (cpmem) functions
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 2/4] gpu: ipu-v3: Add FSU channel linking support
@ 2016-08-18  0:50   ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

Adds functions to link and unlink source channels to sink
channels in the FSU:

int ipu_fsu_link(struct ipu_soc *ipu, int src_ch, int sink_ch);
int ipu_fsu_unlink(struct ipu_soc *ipu, int src_ch, int sink_ch);

The channels numbers are usually IDMAC channels, but they can also be
channels that do not transfer data to or from memory. The following
convenience functions can be used in place of ipu_fsu_link/unlink()
when both source and sink channels are IDMAC channels:

int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink);
int ipu_idmac_unlink(struct ipuv3_channel *src, struct ipuv3_channel *sink);

So far the following links are supported:

IPUV3_CHANNEL_IC_PRP_ENC_MEM -> IPUV3_CHANNEL_MEM_ROT_ENC
PUV3_CHANNEL_IC_PRP_VF_MEM   -> IPUV3_CHANNEL_MEM_ROT_VF
IPUV3_CHANNEL_IC_PP_MEM      -> IPUV3_CHANNEL_MEM_ROT_PP
IPUV3_CHANNEL_CSI_DIRECT     -> IPUV3_CHANNEL_CSI_VDI_PREV

More links can be added to the fsu_link_info[] array.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
 drivers/gpu/ipu-v3/ipu-common.c | 131 ++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/ipu-v3/ipu-prv.h    |  27 +++++++++
 include/video/imx-ipu-v3.h      |  13 ++++
 3 files changed, 171 insertions(+)

diff --git a/drivers/gpu/ipu-v3/ipu-common.c b/drivers/gpu/ipu-v3/ipu-common.c
index 9d3584b..891cbef 100644
--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -730,6 +730,137 @@ void ipu_set_ic_src_mux(struct ipu_soc *ipu, int csi_id, bool vdi)
 }
 EXPORT_SYMBOL_GPL(ipu_set_ic_src_mux);
 
+
+/* Frame Synchronization Unit Channel Linking */
+
+struct fsu_link_reg_info {
+	int chno;
+	u32 reg;
+	u32 mask;
+	u32 val;
+};
+
+struct fsu_link_info {
+	struct fsu_link_reg_info src;
+	struct fsu_link_reg_info sink;
+};
+
+static const struct fsu_link_info fsu_link_info[] = {
+	{
+		.src  = { IPUV3_CHANNEL_IC_PRP_ENC_MEM, IPU_FS_PROC_FLOW2,
+			  FS_PRP_ENC_DEST_SEL_MASK, FS_PRP_ENC_DEST_SEL_IRT_ENC },
+		.sink = { IPUV3_CHANNEL_MEM_ROT_ENC, IPU_FS_PROC_FLOW1,
+			  FS_PRPENC_ROT_SRC_SEL_MASK, FS_PRPENC_ROT_SRC_SEL_ENC },
+	}, {
+		.src =  { IPUV3_CHANNEL_IC_PRP_VF_MEM, IPU_FS_PROC_FLOW2,
+			  FS_PRPVF_DEST_SEL_MASK, FS_PRPVF_DEST_SEL_IRT_VF },
+		.sink = { IPUV3_CHANNEL_MEM_ROT_VF, IPU_FS_PROC_FLOW1,
+			  FS_PRPVF_ROT_SRC_SEL_MASK, FS_PRPVF_ROT_SRC_SEL_VF },
+	}, {
+		.src =  { IPUV3_CHANNEL_IC_PP_MEM, IPU_FS_PROC_FLOW2,
+			  FS_PP_DEST_SEL_MASK, FS_PP_DEST_SEL_IRT_PP },
+		.sink = { IPUV3_CHANNEL_MEM_ROT_PP, IPU_FS_PROC_FLOW1,
+			  FS_PP_ROT_SRC_SEL_MASK, FS_PP_ROT_SRC_SEL_PP },
+	}, {
+		.src =  { IPUV3_CHANNEL_CSI_DIRECT, 0 },
+		.sink = { IPUV3_CHANNEL_CSI_VDI_PREV, IPU_FS_PROC_FLOW1,
+			  FS_VDI_SRC_SEL_MASK, FS_VDI_SRC_SEL_CSI_DIRECT },
+	},
+};
+
+static const struct fsu_link_info *find_fsu_link_info(int src, int sink)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(fsu_link_info); i++) {
+		if (src = fsu_link_info[i].src.chno &&
+		    sink = fsu_link_info[i].sink.chno)
+			return &fsu_link_info[i];
+	}
+
+	return NULL;
+}
+
+/*
+ * Links a source channel to a sink channel in the FSU.
+ */
+int ipu_fsu_link(struct ipu_soc *ipu, int src_ch, int sink_ch)
+{
+	const struct fsu_link_info *link;
+	u32 src_reg, sink_reg;
+	unsigned long flags;
+
+	link = find_fsu_link_info(src_ch, sink_ch);
+	if (!link)
+		return -EINVAL;
+
+	spin_lock_irqsave(&ipu->lock, flags);
+
+	if (link->src.mask) {
+		src_reg = ipu_cm_read(ipu, link->src.reg);
+		src_reg &= ~link->src.mask;
+		src_reg |= link->src.val;
+		ipu_cm_write(ipu, src_reg, link->src.reg);
+	}
+
+	if (link->sink.mask) {
+		sink_reg = ipu_cm_read(ipu, link->sink.reg);
+		sink_reg &= ~link->sink.mask;
+		sink_reg |= link->sink.val;
+		ipu_cm_write(ipu, sink_reg, link->sink.reg);
+	}
+
+	spin_unlock_irqrestore(&ipu->lock, flags);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_fsu_link);
+
+/*
+ * Unlinks source and sink channels in the FSU.
+ */
+int ipu_fsu_unlink(struct ipu_soc *ipu, int src_ch, int sink_ch)
+{
+	const struct fsu_link_info *link;
+	u32 src_reg, sink_reg;
+	unsigned long flags;
+
+	link = find_fsu_link_info(src_ch, sink_ch);
+	if (!link)
+		return -EINVAL;
+
+	spin_lock_irqsave(&ipu->lock, flags);
+
+	if (link->src.mask) {
+		src_reg = ipu_cm_read(ipu, link->src.reg);
+		src_reg &= ~link->src.mask;
+		ipu_cm_write(ipu, src_reg, link->src.reg);
+	}
+
+	if (link->sink.mask) {
+		sink_reg = ipu_cm_read(ipu, link->sink.reg);
+		sink_reg &= ~link->sink.mask;
+		ipu_cm_write(ipu, sink_reg, link->sink.reg);
+	}
+
+	spin_unlock_irqrestore(&ipu->lock, flags);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_fsu_unlink);
+
+/* Link IDMAC channels in the FSU */
+int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink)
+{
+	return ipu_fsu_link(src->ipu, src->num, sink->num);
+}
+EXPORT_SYMBOL_GPL(ipu_idmac_link);
+
+/* Unlink IDMAC channels in the FSU */
+int ipu_idmac_unlink(struct ipuv3_channel *src, struct ipuv3_channel *sink)
+{
+	return ipu_fsu_unlink(src->ipu, src->num, sink->num);
+}
+EXPORT_SYMBOL_GPL(ipu_idmac_unlink);
+
 struct ipu_devtype {
 	const char *name;
 	unsigned long cm_ofs;
diff --git a/drivers/gpu/ipu-v3/ipu-prv.h b/drivers/gpu/ipu-v3/ipu-prv.h
index 02057d8..dca2c3a 100644
--- a/drivers/gpu/ipu-v3/ipu-prv.h
+++ b/drivers/gpu/ipu-v3/ipu-prv.h
@@ -75,6 +75,33 @@ struct ipu_soc;
 #define IPU_INT_CTRL(n)		IPU_CM_REG(0x003C + 4 * (n))
 #define IPU_INT_STAT(n)		IPU_CM_REG(0x0200 + 4 * (n))
 
+/* FS_PROC_FLOW1 */
+#define FS_PRPENC_ROT_SRC_SEL_MASK	(0xf << 0)
+#define FS_PRPENC_ROT_SRC_SEL_ENC		(0x7 << 0)
+#define FS_PRPVF_ROT_SRC_SEL_MASK	(0xf << 8)
+#define FS_PRPVF_ROT_SRC_SEL_VF			(0x8 << 8)
+#define FS_PP_SRC_SEL_MASK		(0xf << 12)
+#define FS_PP_ROT_SRC_SEL_MASK		(0xf << 16)
+#define FS_PP_ROT_SRC_SEL_PP			(0x5 << 16)
+#define FS_VDI1_SRC_SEL_MASK		(0x3 << 20)
+#define FS_VDI3_SRC_SEL_MASK		(0x3 << 20)
+#define FS_PRP_SRC_SEL_MASK		(0xf << 24)
+#define FS_VDI_SRC_SEL_MASK		(0x3 << 28)
+#define FS_VDI_SRC_SEL_CSI_DIRECT		(0x1 << 28)
+#define FS_VDI_SRC_SEL_VDOA			(0x2 << 28)
+
+/* FS_PROC_FLOW2 */
+#define FS_PRP_ENC_DEST_SEL_MASK	(0xf << 0)
+#define FS_PRP_ENC_DEST_SEL_IRT_ENC		(0x1 << 0)
+#define FS_PRPVF_DEST_SEL_MASK		(0xf << 4)
+#define FS_PRPVF_DEST_SEL_IRT_VF		(0x1 << 4)
+#define FS_PRPVF_ROT_DEST_SEL_MASK	(0xf << 8)
+#define FS_PP_DEST_SEL_MASK		(0xf << 12)
+#define FS_PP_DEST_SEL_IRT_PP			(0x3 << 12)
+#define FS_PP_ROT_DEST_SEL_MASK		(0xf << 16)
+#define FS_PRPENC_ROT_DEST_SEL_MASK	(0xf << 20)
+#define FS_PRP_DEST_SEL_MASK		(0xf << 24)
+
 #define IPU_DI0_COUNTER_RELEASE			(1 << 24)
 #define IPU_DI1_COUNTER_RELEASE			(1 << 25)
 
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index 335d42e..1a3f7d4 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -107,6 +107,14 @@ enum ipu_channel_irq {
 #define IPUV3_CHANNEL_CSI2			 2
 #define IPUV3_CHANNEL_CSI3			 3
 #define IPUV3_CHANNEL_VDI_MEM_IC_VF		 5
+/*
+ * NOTE: channels 6,7 are unused in the IPU and are not IDMAC channels,
+ * but the direct CSI->VDI linking is handled the same way as IDMAC
+ * channel linking in the FSU via the IPU_FS_PROC_FLOW registers, so
+ * these channel names are used to support the direct CSI->VDI link.
+ */
+#define IPUV3_CHANNEL_CSI_DIRECT		 6
+#define IPUV3_CHANNEL_CSI_VDI_PREV		 7
 #define IPUV3_CHANNEL_MEM_VDI_PREV		 8
 #define IPUV3_CHANNEL_MEM_VDI_CUR		 9
 #define IPUV3_CHANNEL_MEM_VDI_NEXT		10
@@ -143,6 +151,7 @@ enum ipu_channel_irq {
 #define IPUV3_CHANNEL_ROT_PP_MEM		50
 #define IPUV3_CHANNEL_MEM_BG_SYNC_ALPHA		51
 #define IPUV3_CHANNEL_MEM_BG_ASYNC_ALPHA	52
+#define IPUV3_NUM_CHANNELS			64
 
 int ipu_map_irq(struct ipu_soc *ipu, int irq);
 int ipu_idmac_channel_irq(struct ipu_soc *ipu, struct ipuv3_channel *channel,
@@ -186,6 +195,10 @@ int ipu_idmac_get_current_buffer(struct ipuv3_channel *channel);
 bool ipu_idmac_buffer_is_ready(struct ipuv3_channel *channel, u32 buf_num);
 void ipu_idmac_select_buffer(struct ipuv3_channel *channel, u32 buf_num);
 void ipu_idmac_clear_buffer(struct ipuv3_channel *channel, u32 buf_num);
+int ipu_fsu_link(struct ipu_soc *ipu, int src_ch, int sink_ch);
+int ipu_fsu_unlink(struct ipu_soc *ipu, int src_ch, int sink_ch);
+int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink);
+int ipu_idmac_unlink(struct ipuv3_channel *src, struct ipuv3_channel *sink);
 
 /*
  * IPU Channel Parameter Memory (cpmem) functions
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
  2016-08-18  0:50 ` Steve Longerbeam
@ 2016-08-18  0:50   ` Steve Longerbeam
  -1 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

This patch implements complete image conversion support to ipu-ic,
with tiling to support scaling to and from images up to 4096x4096.
Image rotation is also supported.

The internal API is subsystem agnostic (no V4L2 dependency except
for the use of V4L2 fourcc pixel formats).

Callers prepare for image conversion by calling
ipu_image_convert_prepare(), which initializes the parameters of
the conversion. The caller passes in the ipu_ic task to use for
the conversion, the input and output image formats, a rotation mode,
and a completion callback and completion context pointer:

struct image_converter_ctx *
ipu_image_convert_prepare(struct ipu_ic *ic,
                          struct ipu_image *in, struct ipu_image *out,
                          enum ipu_rotate_mode rot_mode,
                          image_converter_cb_t complete,
                          void *complete_context);

The caller is given a new conversion context that must be passed to
the further APIs:

struct image_converter_run *
ipu_image_convert_run(struct image_converter_ctx *ctx,
                      dma_addr_t in_phys, dma_addr_t out_phys);

This queues a new image conversion request to a run queue, and
starts the conversion immediately if the run queue is empty. Only
the physaddr's of the input and output image buffers are needed,
since the conversion context was created previously with
ipu_image_convert_prepare(). Returns a new run object pointer. When
the conversion completes, the run pointer is returned to the
completion callback.

void image_convert_abort(struct image_converter_ctx *ctx);

This will abort any active or pending conversions for this context.
Any currently active or pending runs belonging to this context are
returned via the completion callback with an error status.

void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);

Unprepares the conversion context. Any active or pending runs will
be aborted by calling image_convert_abort().

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>

---

v4:
- do away with struct ipu_ic_tile_off, and move tile offsets into
  struct ipu_ic_tile. This paves the way for possibly allowing for
  each tile to have different dimensions in the future.

v3: no changes
v2: no changes
---
 drivers/gpu/ipu-v3/ipu-ic.c | 1694 ++++++++++++++++++++++++++++++++++++++++++-
 include/video/imx-ipu-v3.h  |   57 +-
 2 files changed, 1739 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 1a37afc..01b1b56 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -17,6 +17,8 @@
 #include <linux/bitrev.h>
 #include <linux/io.h>
 #include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/dma-mapping.h>
 #include "ipu-prv.h"
 
 /* IC Register Offsets */
@@ -82,6 +84,40 @@
 #define IC_IDMAC_3_PP_WIDTH_MASK        (0x3ff << 20)
 #define IC_IDMAC_3_PP_WIDTH_OFFSET      20
 
+/*
+ * The IC Resizer has a restriction that the output frame from the
+ * resizer must be 1024 or less in both width (pixels) and height
+ * (lines).
+ *
+ * The image conversion support attempts to split up a conversion when
+ * the desired output (converted) frame resolution exceeds the IC resizer
+ * limit of 1024 in either dimension.
+ *
+ * If either dimension of the output frame exceeds the limit, the
+ * dimension is split into 1, 2, or 4 equal stripes, for a maximum
+ * of 4*4 or 16 tiles. A conversion is then carried out for each
+ * tile (but taking care to pass the full frame stride length to
+ * the DMA channel's parameter memory!). IDMA double-buffering is used
+ * to convert each tile back-to-back when possible (see note below
+ * when double_buffering boolean is set).
+ *
+ * Note that the input frame must be split up into the same number
+ * of tiles as the output frame.
+ */
+#define MAX_STRIPES_W    4
+#define MAX_STRIPES_H    4
+#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
+
+#define MIN_W     128
+#define MIN_H     128
+#define MAX_W     4096
+#define MAX_H     4096
+
+enum image_convert_type {
+	IMAGE_CONVERT_IN = 0,
+	IMAGE_CONVERT_OUT,
+};
+
 struct ic_task_regoffs {
 	u32 rsc;
 	u32 tpmem_csc[2];
@@ -96,6 +132,16 @@ struct ic_task_bitfields {
 	u32 ic_cmb_galpha_bit;
 };
 
+struct ic_task_channels {
+	int in;
+	int out;
+	int rot_in;
+	int rot_out;
+	int vdi_in_p;
+	int vdi_in;
+	int vdi_in_n;
+};
+
 static const struct ic_task_regoffs ic_task_reg[IC_NUM_TASKS] = {
 	[IC_TASK_ENCODER] = {
 		.rsc = IC_PRP_ENC_RSC,
@@ -138,12 +184,155 @@ static const struct ic_task_bitfields ic_task_bit[IC_NUM_TASKS] = {
 	},
 };
 
+static const struct ic_task_channels ic_task_ch[IC_NUM_TASKS] = {
+	[IC_TASK_ENCODER] = {
+		.out = IPUV3_CHANNEL_IC_PRP_ENC_MEM,
+		.rot_in = IPUV3_CHANNEL_MEM_ROT_ENC,
+		.rot_out = IPUV3_CHANNEL_ROT_ENC_MEM,
+	},
+	[IC_TASK_VIEWFINDER] = {
+		.in = IPUV3_CHANNEL_MEM_IC_PRP_VF,
+		.out = IPUV3_CHANNEL_IC_PRP_VF_MEM,
+		.rot_in = IPUV3_CHANNEL_MEM_ROT_VF,
+		.rot_out = IPUV3_CHANNEL_ROT_VF_MEM,
+		.vdi_in_p = IPUV3_CHANNEL_MEM_VDI_PREV,
+		.vdi_in = IPUV3_CHANNEL_MEM_VDI_CUR,
+		.vdi_in_n = IPUV3_CHANNEL_MEM_VDI_NEXT,
+	},
+	[IC_TASK_POST_PROCESSOR] = {
+		.in = IPUV3_CHANNEL_MEM_IC_PP,
+		.out = IPUV3_CHANNEL_IC_PP_MEM,
+		.rot_in = IPUV3_CHANNEL_MEM_ROT_PP,
+		.rot_out = IPUV3_CHANNEL_ROT_PP_MEM,
+	},
+};
+
+struct ipu_ic_dma_buf {
+	void          *virt;
+	dma_addr_t    phys;
+	unsigned long len;
+};
+
+/* dimensions of one tile */
+struct ipu_ic_tile {
+	u32 width;
+	u32 height;
+	/* size and strides are in bytes */
+	u32 size;
+	u32 stride;
+	u32 rot_stride;
+	/* start Y or packed offset of this tile */
+	u32 offset;
+	/* offset from start to tile in U plane, for planar formats */
+	u32 u_off;
+	/* offset from start to tile in V plane, for planar formats */
+	u32 v_off;
+};
+
+struct ipu_ic_pixfmt {
+	char	*name;
+	u32	fourcc;        /* V4L2 fourcc */
+	int     bpp;           /* total bpp */
+	int     y_depth;       /* depth of Y plane for planar formats */
+	int     uv_width_dec;  /* decimation in width for U/V planes */
+	int     uv_height_dec; /* decimation in height for U/V planes */
+	bool    uv_swapped;    /* U and V planes are swapped */
+	bool    uv_packed;     /* partial planar (U and V in same plane) */
+};
+
+struct ipu_ic_image {
+	struct ipu_image base;
+	enum image_convert_type type;
+
+	const struct ipu_ic_pixfmt *fmt;
+	unsigned int stride;
+
+	/* # of rows (horizontal stripes) if dest height is > 1024 */
+	unsigned int num_rows;
+	/* # of columns (vertical stripes) if dest width is > 1024 */
+	unsigned int num_cols;
+
+	struct ipu_ic_tile tile[MAX_TILES];
+};
+
+struct image_converter_ctx;
+struct image_converter;
 struct ipu_ic_priv;
+struct ipu_ic;
+
+struct image_converter_run {
+	struct image_converter_ctx *ctx;
+
+	dma_addr_t in_phys;
+	dma_addr_t out_phys;
+
+	int status;
+
+	struct list_head list;
+};
+
+struct image_converter_ctx {
+	struct image_converter *cvt;
+
+	image_converter_cb_t complete;
+	void *complete_context;
+
+	/* Source/destination image data and rotation mode */
+	struct ipu_ic_image in;
+	struct ipu_ic_image out;
+	enum ipu_rotate_mode rot_mode;
+
+	/* intermediate buffer for rotation */
+	struct ipu_ic_dma_buf rot_intermediate[2];
+
+	/* current buffer number for double buffering */
+	int cur_buf_num;
+
+	bool aborting;
+	struct completion aborted;
+
+	/* can we use double-buffering for this conversion operation? */
+	bool double_buffering;
+	/* num_rows * num_cols */
+	unsigned int num_tiles;
+	/* next tile to process */
+	unsigned int next_tile;
+	/* where to place converted tile in dest image */
+	unsigned int out_tile_map[MAX_TILES];
+
+	struct list_head list;
+};
+
+struct image_converter {
+	struct ipu_ic *ic;
+
+	struct ipuv3_channel *in_chan;
+	struct ipuv3_channel *out_chan;
+	struct ipuv3_channel *rotation_in_chan;
+	struct ipuv3_channel *rotation_out_chan;
+
+	/* the IPU end-of-frame irqs */
+	int out_eof_irq;
+	int rot_out_eof_irq;
+
+	spinlock_t irqlock;
+
+	/* list of convert contexts */
+	struct list_head ctx_list;
+	/* queue of conversion runs */
+	struct list_head pending_q;
+	/* queue of completed runs */
+	struct list_head done_q;
+
+	/* the current conversion run */
+	struct image_converter_run *current_run;
+};
 
 struct ipu_ic {
 	enum ipu_ic_task task;
 	const struct ic_task_regoffs *reg;
 	const struct ic_task_bitfields *bit;
+	const struct ic_task_channels *ch;
 
 	enum ipu_color_space in_cs, g_in_cs;
 	enum ipu_color_space out_cs;
@@ -151,6 +340,8 @@ struct ipu_ic {
 	bool rotation;
 	bool in_use;
 
+	struct image_converter cvt;
+
 	struct ipu_ic_priv *priv;
 };
 
@@ -619,7 +810,7 @@ int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
 	ipu_ic_write(ic, ic_idmac_2, IC_IDMAC_2);
 	ipu_ic_write(ic, ic_idmac_3, IC_IDMAC_3);
 
-	if (rot >= IPU_ROTATE_90_RIGHT)
+	if (ipu_rot_mode_is_irt(rot))
 		ic->rotation = true;
 
 unlock:
@@ -648,6 +839,1487 @@ static void ipu_irt_disable(struct ipu_ic *ic)
 	}
 }
 
+/*
+ * Complete image conversion support follows
+ */
+
+static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
+	{
+		.name	= "RGB565",
+		.fourcc	= V4L2_PIX_FMT_RGB565,
+		.bpp    = 16,
+	}, {
+		.name	= "RGB24",
+		.fourcc	= V4L2_PIX_FMT_RGB24,
+		.bpp    = 24,
+	}, {
+		.name	= "BGR24",
+		.fourcc	= V4L2_PIX_FMT_BGR24,
+		.bpp    = 24,
+	}, {
+		.name	= "RGB32",
+		.fourcc	= V4L2_PIX_FMT_RGB32,
+		.bpp    = 32,
+	}, {
+		.name	= "BGR32",
+		.fourcc	= V4L2_PIX_FMT_BGR32,
+		.bpp    = 32,
+	}, {
+		.name	= "4:2:2 packed, YUYV",
+		.fourcc	= V4L2_PIX_FMT_YUYV,
+		.bpp    = 16,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+	}, {
+		.name	= "4:2:2 packed, UYVY",
+		.fourcc	= V4L2_PIX_FMT_UYVY,
+		.bpp    = 16,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+	}, {
+		.name	= "4:2:0 planar, YUV",
+		.fourcc	= V4L2_PIX_FMT_YUV420,
+		.bpp    = 12,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 2,
+	}, {
+		.name	= "4:2:0 planar, YVU",
+		.fourcc	= V4L2_PIX_FMT_YVU420,
+		.bpp    = 12,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 2,
+		.uv_swapped = true,
+	}, {
+		.name   = "4:2:0 partial planar, NV12",
+		.fourcc = V4L2_PIX_FMT_NV12,
+		.bpp    = 12,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 2,
+		.uv_packed = true,
+	}, {
+		.name   = "4:2:2 planar, YUV",
+		.fourcc = V4L2_PIX_FMT_YUV422P,
+		.bpp    = 16,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+	}, {
+		.name   = "4:2:2 partial planar, NV16",
+		.fourcc = V4L2_PIX_FMT_NV16,
+		.bpp    = 16,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+		.uv_packed = true,
+	},
+};
+
+static const struct ipu_ic_pixfmt *ipu_ic_get_format(u32 fourcc)
+{
+	const struct ipu_ic_pixfmt *ret = NULL;
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(ipu_ic_formats); i++) {
+		if (ipu_ic_formats[i].fourcc == fourcc) {
+			ret = &ipu_ic_formats[i];
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void ipu_ic_dump_format(struct image_converter_ctx *ctx,
+			       struct ipu_ic_image *ic_image)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+
+	dev_dbg(priv->ipu->dev,
+		"ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
+		ctx,
+		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
+		ic_image->base.pix.width, ic_image->base.pix.height,
+		ic_image->num_cols, ic_image->num_rows,
+		ic_image->tile[0].width, ic_image->tile[0].height,
+		ic_image->fmt->fourcc & 0xff,
+		(ic_image->fmt->fourcc >> 8) & 0xff,
+		(ic_image->fmt->fourcc >> 16) & 0xff,
+		(ic_image->fmt->fourcc >> 24) & 0xff);
+}
+
+int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc)
+{
+	const struct ipu_ic_pixfmt *fmt;
+
+	if (index >= (int)ARRAY_SIZE(ipu_ic_formats))
+		return -EINVAL;
+
+	/* Format found */
+	fmt = &ipu_ic_formats[index];
+	*desc = fmt->name;
+	*fourcc = fmt->fourcc;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_enum_format);
+
+static void ipu_ic_free_dma_buf(struct ipu_ic_priv *priv,
+				struct ipu_ic_dma_buf *buf)
+{
+	if (buf->virt)
+		dma_free_coherent(priv->ipu->dev,
+				  buf->len, buf->virt, buf->phys);
+	buf->virt = NULL;
+	buf->phys = 0;
+}
+
+static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
+				struct ipu_ic_dma_buf *buf,
+				int size)
+{
+	unsigned long newlen = PAGE_ALIGN(size);
+
+	if (buf->virt) {
+		if (buf->len == newlen)
+			return 0;
+		ipu_ic_free_dma_buf(priv, buf);
+	}
+
+	buf->len = newlen;
+	buf->virt = dma_alloc_coherent(priv->ipu->dev, buf->len, &buf->phys,
+				       GFP_DMA | GFP_KERNEL);
+	if (!buf->virt) {
+		dev_err(priv->ipu->dev, "failed to alloc dma buffer\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static inline int ipu_ic_num_stripes(int dim)
+{
+	if (dim <= 1024)
+		return 1;
+	else if (dim <= 2048)
+		return 2;
+	else
+		return 4;
+}
+
+static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
+					struct ipu_ic_image *image)
+{
+	int i;
+
+	for (i = 0; i < ctx->num_tiles; i++) {
+		struct ipu_ic_tile *tile = &image->tile[i];
+
+		tile->height = image->base.pix.height / image->num_rows;
+		tile->width = image->base.pix.width / image->num_cols;
+		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
+			tile->width;
+
+		if (image->fmt->y_depth) {
+			tile->stride =
+				(image->fmt->y_depth * tile->width) >> 3;
+			tile->rot_stride =
+				(image->fmt->y_depth * tile->height) >> 3;
+		} else {
+			tile->stride =
+				(image->fmt->bpp * tile->width) >> 3;
+			tile->rot_stride =
+				(image->fmt->bpp * tile->height) >> 3;
+		}
+	}
+}
+
+/*
+ * Use the rotation transformation to find the tile coordinates
+ * (row, col) of a tile in the destination frame that corresponds
+ * to the given tile coordinates of a source frame. The destination
+ * coordinate is then converted to a tile index.
+ */
+static int ipu_ic_transform_tile_index(struct image_converter_ctx *ctx,
+				       int src_row, int src_col)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+	struct ipu_ic_image *s_image = &ctx->in;
+	struct ipu_ic_image *d_image = &ctx->out;
+	int cos, sin, dst_row, dst_col;
+
+	/* with no rotation it's a 1:1 mapping */
+	if (ctx->rot_mode == IPU_ROTATE_NONE)
+		return src_row * s_image->num_cols + src_col;
+
+	if (ctx->rot_mode & IPU_ROT_BIT_90) {
+		cos = 0;
+		sin = 1;
+	} else {
+		cos = 1;
+		sin = 0;
+	}
+
+	/*
+	 * before doing the transform, first we have to translate
+	 * source row,col for an origin in the center of s_image
+	 */
+	src_row *= 2;
+	src_col *= 2;
+	src_row -= s_image->num_rows - 1;
+	src_col -= s_image->num_cols - 1;
+
+	/* do the rotation transform */
+	dst_col = src_col * cos - src_row * sin;
+	dst_row = src_col * sin + src_row * cos;
+
+	/* apply flip */
+	if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+		dst_col = -dst_col;
+	if (ctx->rot_mode & IPU_ROT_BIT_VFLIP)
+		dst_row = -dst_row;
+
+	dev_dbg(priv->ipu->dev, "ctx %p: [%d,%d] --> [%d,%d]\n",
+		ctx, src_col, src_row, dst_col, dst_row);
+
+	/*
+	 * finally translate dest row,col using an origin in upper
+	 * left of d_image
+	 */
+	dst_row += d_image->num_rows - 1;
+	dst_col += d_image->num_cols - 1;
+	dst_row /= 2;
+	dst_col /= 2;
+
+	return dst_row * d_image->num_cols + dst_col;
+}
+
+/*
+ * Fill the out_tile_map[] with transformed destination tile indeces.
+ */
+static void ipu_ic_calc_out_tile_map(struct image_converter_ctx *ctx)
+{
+	struct ipu_ic_image *s_image = &ctx->in;
+	unsigned int row, col, tile = 0;
+
+	for (row = 0; row < s_image->num_rows; row++) {
+		for (col = 0; col < s_image->num_cols; col++) {
+			ctx->out_tile_map[tile] =
+				ipu_ic_transform_tile_index(ctx, row, col);
+			tile++;
+		}
+	}
+}
+
+static void ipu_ic_calc_tile_offsets_planar(struct image_converter_ctx *ctx,
+					    struct ipu_ic_image *image)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+	const struct ipu_ic_pixfmt *fmt = image->fmt;
+	unsigned int row, col, tile = 0;
+	u32 H, w, h, y_depth, y_stride, uv_stride;
+	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
+	u32 y_row_off, y_col_off, y_off;
+	u32 y_size, uv_size;
+
+	/* setup some convenience vars */
+	H = image->base.pix.height;
+
+	y_depth = fmt->y_depth;
+	y_stride = image->stride;
+	uv_stride = y_stride / fmt->uv_width_dec;
+	if (fmt->uv_packed)
+		uv_stride *= 2;
+
+	y_size = H * y_stride;
+	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
+
+	for (row = 0; row < image->num_rows; row++) {
+		w = image->tile[tile].width;
+		h = image->tile[tile].height;
+		y_row_off = row * h * y_stride;
+		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
+
+		for (col = 0; col < image->num_cols; col++) {
+			y_col_off = (col * w * y_depth) >> 3;
+			uv_col_off = y_col_off / fmt->uv_width_dec;
+			if (fmt->uv_packed)
+				uv_col_off *= 2;
+
+			y_off = y_row_off + y_col_off;
+			uv_off = uv_row_off + uv_col_off;
+
+			u_off = y_size - y_off + uv_off;
+			v_off = (fmt->uv_packed) ? 0 : u_off + uv_size;
+			if (fmt->uv_swapped) {
+				tmp = u_off;
+				u_off = v_off;
+				v_off = tmp;
+			}
+
+			image->tile[tile].offset = y_off;
+			image->tile[tile].u_off = u_off;
+			image->tile[tile++].v_off = v_off;
+
+			dev_dbg(priv->ipu->dev,
+				"ctx %p: %s@[%d,%d]: y_off %08x, u_off %08x, v_off %08x\n",
+				ctx, image->type == IMAGE_CONVERT_IN ?
+				"Input" : "Output", row, col,
+				y_off, u_off, v_off);
+		}
+	}
+}
+
+static void ipu_ic_calc_tile_offsets_packed(struct image_converter_ctx *ctx,
+					    struct ipu_ic_image *image)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+	const struct ipu_ic_pixfmt *fmt = image->fmt;
+	unsigned int row, col, tile = 0;
+	u32 w, h, bpp, stride;
+	u32 row_off, col_off;
+
+	/* setup some convenience vars */
+	stride = image->stride;
+	bpp = fmt->bpp;
+
+	for (row = 0; row < image->num_rows; row++) {
+		w = image->tile[tile].width;
+		h = image->tile[tile].height;
+		row_off = row * h * stride;
+
+		for (col = 0; col < image->num_cols; col++) {
+			col_off = (col * w * bpp) >> 3;
+
+			image->tile[tile].offset = row_off + col_off;
+			image->tile[tile].u_off = 0;
+			image->tile[tile++].v_off = 0;
+
+			dev_dbg(priv->ipu->dev,
+				"ctx %p: %s@[%d,%d]: phys %08x\n", ctx,
+				image->type == IMAGE_CONVERT_IN ?
+				"Input" : "Output", row, col,
+				row_off + col_off);
+		}
+	}
+}
+
+static void ipu_ic_calc_tile_offsets(struct image_converter_ctx *ctx,
+				     struct ipu_ic_image *image)
+{
+	if (image->fmt->y_depth)
+		ipu_ic_calc_tile_offsets_planar(ctx, image);
+	else
+		ipu_ic_calc_tile_offsets_packed(ctx, image);
+}
+
+/*
+ * return the number of runs in given queue (pending_q or done_q)
+ * for this context. hold irqlock when calling.
+ */
+static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
+				struct list_head *q)
+{
+	struct image_converter_run *run;
+	int count = 0;
+
+	list_for_each_entry(run, q, list) {
+		if (run->ctx == ctx)
+			count++;
+	}
+
+	return count;
+}
+
+/* hold irqlock when calling */
+static void ipu_ic_convert_stop(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+
+	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
+		__func__, ctx, run);
+
+	/* disable IC tasks and the channels */
+	ipu_ic_task_disable(cvt->ic);
+	ipu_idmac_disable_channel(cvt->in_chan);
+	ipu_idmac_disable_channel(cvt->out_chan);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ipu_idmac_disable_channel(cvt->rotation_in_chan);
+		ipu_idmac_disable_channel(cvt->rotation_out_chan);
+		ipu_idmac_unlink(cvt->out_chan, cvt->rotation_in_chan);
+	}
+
+	ipu_ic_disable(cvt->ic);
+}
+
+/* hold irqlock when calling */
+static void init_idmac_channel(struct image_converter_ctx *ctx,
+			       struct ipuv3_channel *channel,
+			       struct ipu_ic_image *image,
+			       enum ipu_rotate_mode rot_mode,
+			       bool rot_swap_width_height)
+{
+	struct image_converter *cvt = ctx->cvt;
+	unsigned int burst_size;
+	u32 width, height, stride;
+	dma_addr_t addr0, addr1 = 0;
+	struct ipu_image tile_image;
+	unsigned int tile_idx[2];
+
+	if (image->type == IMAGE_CONVERT_OUT) {
+		tile_idx[0] = ctx->out_tile_map[0];
+		tile_idx[1] = ctx->out_tile_map[1];
+	} else {
+		tile_idx[0] = 0;
+		tile_idx[1] = 1;
+	}
+
+	if (rot_swap_width_height) {
+		width = image->tile[0].height;
+		height = image->tile[0].width;
+		stride = image->tile[0].rot_stride;
+		addr0 = ctx->rot_intermediate[0].phys;
+		if (ctx->double_buffering)
+			addr1 = ctx->rot_intermediate[1].phys;
+	} else {
+		width = image->tile[0].width;
+		height = image->tile[0].height;
+		stride = image->stride;
+		addr0 = image->base.phys0 +
+			image->tile[tile_idx[0]].offset;
+		if (ctx->double_buffering)
+			addr1 = image->base.phys0 +
+				image->tile[tile_idx[1]].offset;
+	}
+
+	ipu_cpmem_zero(channel);
+
+	memset(&tile_image, 0, sizeof(tile_image));
+	tile_image.pix.width = tile_image.rect.width = width;
+	tile_image.pix.height = tile_image.rect.height = height;
+	tile_image.pix.bytesperline = stride;
+	tile_image.pix.pixelformat =  image->fmt->fourcc;
+	tile_image.phys0 = addr0;
+	tile_image.phys1 = addr1;
+	ipu_cpmem_set_image(channel, &tile_image);
+
+	if (image->fmt->y_depth && !rot_swap_width_height)
+		ipu_cpmem_set_uv_offset(channel,
+					image->tile[tile_idx[0]].u_off,
+					image->tile[tile_idx[0]].v_off);
+
+	if (rot_mode)
+		ipu_cpmem_set_rotation(channel, rot_mode);
+
+	if (channel == cvt->rotation_in_chan ||
+	    channel == cvt->rotation_out_chan) {
+		burst_size = 8;
+		ipu_cpmem_set_block_mode(channel);
+	} else
+		burst_size = (width % 16) ? 8 : 16;
+
+	ipu_cpmem_set_burstsize(channel, burst_size);
+
+	ipu_ic_task_idma_init(cvt->ic, channel, width, height,
+			      burst_size, rot_mode);
+
+	ipu_cpmem_set_axi_id(channel, 1);
+
+	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
+}
+
+/* hold irqlock when calling */
+static int ipu_ic_convert_start(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct ipu_ic_image *s_image = &ctx->in;
+	struct ipu_ic_image *d_image = &ctx->out;
+	enum ipu_color_space src_cs, dest_cs;
+	unsigned int dest_width, dest_height;
+	int ret;
+
+	dev_dbg(priv->ipu->dev, "%s: starting ctx %p run %p\n",
+		__func__, ctx, run);
+
+	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
+	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* swap width/height for resizer */
+		dest_width = d_image->tile[0].height;
+		dest_height = d_image->tile[0].width;
+	} else {
+		dest_width = d_image->tile[0].width;
+		dest_height = d_image->tile[0].height;
+	}
+
+	/* setup the IC resizer and CSC */
+	ret = ipu_ic_task_init(cvt->ic,
+			       s_image->tile[0].width,
+			       s_image->tile[0].height,
+			       dest_width,
+			       dest_height,
+			       src_cs, dest_cs);
+	if (ret) {
+		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
+		return ret;
+	}
+
+	/* init the source MEM-->IC PP IDMAC channel */
+	init_idmac_channel(ctx, cvt->in_chan, s_image,
+			   IPU_ROTATE_NONE, false);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* init the IC PP-->MEM IDMAC channel */
+		init_idmac_channel(ctx, cvt->out_chan, d_image,
+				   IPU_ROTATE_NONE, true);
+
+		/* init the MEM-->IC PP ROT IDMAC channel */
+		init_idmac_channel(ctx, cvt->rotation_in_chan, d_image,
+				   ctx->rot_mode, true);
+
+		/* init the destination IC PP ROT-->MEM IDMAC channel */
+		init_idmac_channel(ctx, cvt->rotation_out_chan, d_image,
+				   IPU_ROTATE_NONE, false);
+
+		/* now link IC PP-->MEM to MEM-->IC PP ROT */
+		ipu_idmac_link(cvt->out_chan, cvt->rotation_in_chan);
+	} else {
+		/* init the destination IC PP-->MEM IDMAC channel */
+		init_idmac_channel(ctx, cvt->out_chan, d_image,
+				   ctx->rot_mode, false);
+	}
+
+	/* enable the IC */
+	ipu_ic_enable(cvt->ic);
+
+	/* set buffers ready */
+	ipu_idmac_select_buffer(cvt->in_chan, 0);
+	ipu_idmac_select_buffer(cvt->out_chan, 0);
+	if (ipu_rot_mode_is_irt(ctx->rot_mode))
+		ipu_idmac_select_buffer(cvt->rotation_out_chan, 0);
+	if (ctx->double_buffering) {
+		ipu_idmac_select_buffer(cvt->in_chan, 1);
+		ipu_idmac_select_buffer(cvt->out_chan, 1);
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			ipu_idmac_select_buffer(cvt->rotation_out_chan, 1);
+	}
+
+	/* enable the channels! */
+	ipu_idmac_enable_channel(cvt->in_chan);
+	ipu_idmac_enable_channel(cvt->out_chan);
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ipu_idmac_enable_channel(cvt->rotation_in_chan);
+		ipu_idmac_enable_channel(cvt->rotation_out_chan);
+	}
+
+	ipu_ic_task_enable(cvt->ic);
+
+	ipu_cpmem_dump(cvt->in_chan);
+	ipu_cpmem_dump(cvt->out_chan);
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ipu_cpmem_dump(cvt->rotation_in_chan);
+		ipu_cpmem_dump(cvt->rotation_out_chan);
+	}
+
+	ipu_dump(priv->ipu);
+
+	return 0;
+}
+
+/* hold irqlock when calling */
+static int ipu_ic_run(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+
+	ctx->in.base.phys0 = run->in_phys;
+	ctx->out.base.phys0 = run->out_phys;
+
+	ctx->cur_buf_num = 0;
+	ctx->next_tile = 1;
+
+	/* remove run from pending_q and set as current */
+	list_del(&run->list);
+	cvt->current_run = run;
+
+	return ipu_ic_convert_start(run);
+}
+
+/* hold irqlock when calling */
+static void ipu_ic_run_next(struct image_converter *cvt)
+{
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run, *tmp;
+	int ret;
+
+	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
+		/* skip contexts that are aborting */
+		if (run->ctx->aborting) {
+			dev_dbg(priv->ipu->dev,
+				 "%s: skipping aborting ctx %p run %p\n",
+				 __func__, run->ctx, run);
+			continue;
+		}
+
+		ret = ipu_ic_run(run);
+		if (!ret)
+			break;
+
+		/*
+		 * something went wrong with start, add the run
+		 * to done q and continue to the next run in the
+		 * pending q.
+		 */
+		run->status = ret;
+		list_add_tail(&run->list, &cvt->done_q);
+		cvt->current_run = NULL;
+	}
+}
+
+static void ipu_ic_empty_done_q(struct image_converter *cvt)
+{
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	while (!list_empty(&cvt->done_q)) {
+		run = list_entry(cvt->done_q.next,
+				 struct image_converter_run,
+				 list);
+
+		list_del(&run->list);
+
+		dev_dbg(priv->ipu->dev,
+			"%s: completing ctx %p run %p with %d\n",
+			__func__, run->ctx, run, run->status);
+
+		/* call the completion callback and free the run */
+		spin_unlock_irqrestore(&cvt->irqlock, flags);
+		run->ctx->complete(run->ctx->complete_context, run,
+				   run->status);
+		kfree(run);
+		spin_lock_irqsave(&cvt->irqlock, flags);
+	}
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+}
+
+/*
+ * the bottom half thread clears out the done_q, calling the
+ * completion handler for each.
+ */
+static irqreturn_t ipu_ic_bh(int irq, void *dev_id)
+{
+	struct image_converter *cvt = dev_id;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_ctx *ctx;
+	unsigned long flags;
+
+	dev_dbg(priv->ipu->dev, "%s: enter\n", __func__);
+
+	ipu_ic_empty_done_q(cvt);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/*
+	 * the done_q is cleared out, signal any contexts
+	 * that are aborting that abort can complete.
+	 */
+	list_for_each_entry(ctx, &cvt->ctx_list, list) {
+		if (ctx->aborting) {
+			dev_dbg(priv->ipu->dev,
+				 "%s: signaling abort for ctx %p\n",
+				 __func__, ctx);
+			complete(&ctx->aborted);
+		}
+	}
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	dev_dbg(priv->ipu->dev, "%s: exit\n", __func__);
+	return IRQ_HANDLED;
+}
+
+/* hold irqlock when calling */
+static irqreturn_t ipu_ic_doirq(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_tile *src_tile, *dst_tile;
+	struct ipu_ic_image *s_image = &ctx->in;
+	struct ipu_ic_image *d_image = &ctx->out;
+	struct ipuv3_channel *outch;
+	unsigned int dst_idx;
+
+	outch = ipu_rot_mode_is_irt(ctx->rot_mode) ?
+		cvt->rotation_out_chan : cvt->out_chan;
+
+	/*
+	 * It is difficult to stop the channel DMA before the channels
+	 * enter the paused state. Without double-buffering the channels
+	 * are always in a paused state when the EOF irq occurs, so it
+	 * is safe to stop the channels now. For double-buffering we
+	 * just ignore the abort until the operation completes, when it
+	 * is safe to shut down.
+	 */
+	if (ctx->aborting && !ctx->double_buffering) {
+		ipu_ic_convert_stop(run);
+		run->status = -EIO;
+		goto done;
+	}
+
+	if (ctx->next_tile == ctx->num_tiles) {
+		/*
+		 * the conversion is complete
+		 */
+		ipu_ic_convert_stop(run);
+		run->status = 0;
+		goto done;
+	}
+
+	/*
+	 * not done, place the next tile buffers.
+	 */
+	if (!ctx->double_buffering) {
+
+		src_tile = &s_image->tile[ctx->next_tile];
+		dst_idx = ctx->out_tile_map[ctx->next_tile];
+		dst_tile = &d_image->tile[dst_idx];
+
+		ipu_cpmem_set_buffer(cvt->in_chan, 0,
+				     s_image->base.phys0 + src_tile->offset);
+		ipu_cpmem_set_buffer(outch, 0,
+				     d_image->base.phys0 + dst_tile->offset);
+		if (s_image->fmt->y_depth)
+			ipu_cpmem_set_uv_offset(cvt->in_chan,
+						src_tile->u_off,
+						src_tile->v_off);
+		if (d_image->fmt->y_depth)
+			ipu_cpmem_set_uv_offset(outch,
+						dst_tile->u_off,
+						dst_tile->v_off);
+
+		ipu_idmac_select_buffer(cvt->in_chan, 0);
+		ipu_idmac_select_buffer(outch, 0);
+
+	} else if (ctx->next_tile < ctx->num_tiles - 1) {
+
+		src_tile = &s_image->tile[ctx->next_tile + 1];
+		dst_idx = ctx->out_tile_map[ctx->next_tile + 1];
+		dst_tile = &d_image->tile[dst_idx];
+
+		ipu_cpmem_set_buffer(cvt->in_chan, ctx->cur_buf_num,
+				     s_image->base.phys0 + src_tile->offset);
+		ipu_cpmem_set_buffer(outch, ctx->cur_buf_num,
+				     d_image->base.phys0 + dst_tile->offset);
+
+		ipu_idmac_select_buffer(cvt->in_chan, ctx->cur_buf_num);
+		ipu_idmac_select_buffer(outch, ctx->cur_buf_num);
+
+		ctx->cur_buf_num ^= 1;
+	}
+
+	ctx->next_tile++;
+	return IRQ_HANDLED;
+done:
+	list_add_tail(&run->list, &cvt->done_q);
+	cvt->current_run = NULL;
+	ipu_ic_run_next(cvt);
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
+{
+	struct image_converter *cvt = data;
+	struct image_converter_ctx *ctx;
+	struct image_converter_run *run;
+	unsigned long flags;
+	irqreturn_t ret;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/* get current run and its context */
+	run = cvt->current_run;
+	if (!run) {
+		ret = IRQ_NONE;
+		goto out;
+	}
+
+	ctx = run->ctx;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* this is a rotation operation, just ignore */
+		spin_unlock_irqrestore(&cvt->irqlock, flags);
+		return IRQ_HANDLED;
+	}
+
+	ret = ipu_ic_doirq(run);
+out:
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+	return ret;
+}
+
+static irqreturn_t ipu_ic_rotate_irq(int irq, void *data)
+{
+	struct image_converter *cvt = data;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_ctx *ctx;
+	struct image_converter_run *run;
+	unsigned long flags;
+	irqreturn_t ret;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/* get current run and its context */
+	run = cvt->current_run;
+	if (!run) {
+		ret = IRQ_NONE;
+		goto out;
+	}
+
+	ctx = run->ctx;
+
+	if (!ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* this was NOT a rotation operation, shouldn't happen */
+		dev_err(priv->ipu->dev, "Unexpected rotation interrupt\n");
+		spin_unlock_irqrestore(&cvt->irqlock, flags);
+		return IRQ_HANDLED;
+	}
+
+	ret = ipu_ic_doirq(run);
+out:
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+	return ret;
+}
+
+/*
+ * try to force the completion of runs for this ctx. Called when
+ * abort wait times out in ipu_image_convert_abort().
+ */
+static void ipu_ic_force_abort(struct image_converter_ctx *ctx)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct image_converter_run *run;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	run = cvt->current_run;
+	if (run && run->ctx == ctx) {
+		ipu_ic_convert_stop(run);
+		run->status = -EIO;
+		list_add_tail(&run->list, &cvt->done_q);
+		cvt->current_run = NULL;
+		ipu_ic_run_next(cvt);
+	}
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	ipu_ic_empty_done_q(cvt);
+}
+
+static void ipu_ic_release_ipu_resources(struct image_converter *cvt)
+{
+	if (cvt->out_eof_irq >= 0)
+		free_irq(cvt->out_eof_irq, cvt);
+	if (cvt->rot_out_eof_irq >= 0)
+		free_irq(cvt->rot_out_eof_irq, cvt);
+
+	if (!IS_ERR_OR_NULL(cvt->in_chan))
+		ipu_idmac_put(cvt->in_chan);
+	if (!IS_ERR_OR_NULL(cvt->out_chan))
+		ipu_idmac_put(cvt->out_chan);
+	if (!IS_ERR_OR_NULL(cvt->rotation_in_chan))
+		ipu_idmac_put(cvt->rotation_in_chan);
+	if (!IS_ERR_OR_NULL(cvt->rotation_out_chan))
+		ipu_idmac_put(cvt->rotation_out_chan);
+
+	cvt->in_chan = cvt->out_chan = cvt->rotation_in_chan =
+		cvt->rotation_out_chan = NULL;
+	cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
+}
+
+static int ipu_ic_get_ipu_resources(struct image_converter *cvt)
+{
+	const struct ic_task_channels *chan = cvt->ic->ch;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	int ret;
+
+	/* get IDMAC channels */
+	cvt->in_chan = ipu_idmac_get(priv->ipu, chan->in);
+	cvt->out_chan = ipu_idmac_get(priv->ipu, chan->out);
+	if (IS_ERR(cvt->in_chan) || IS_ERR(cvt->out_chan)) {
+		dev_err(priv->ipu->dev, "could not acquire idmac channels\n");
+		ret = -EBUSY;
+		goto err;
+	}
+
+	cvt->rotation_in_chan = ipu_idmac_get(priv->ipu, chan->rot_in);
+	cvt->rotation_out_chan = ipu_idmac_get(priv->ipu, chan->rot_out);
+	if (IS_ERR(cvt->rotation_in_chan) || IS_ERR(cvt->rotation_out_chan)) {
+		dev_err(priv->ipu->dev,
+			"could not acquire idmac rotation channels\n");
+		ret = -EBUSY;
+		goto err;
+	}
+
+	/* acquire the EOF interrupts */
+	cvt->out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
+						cvt->out_chan,
+						IPU_IRQ_EOF);
+
+	ret = request_threaded_irq(cvt->out_eof_irq,
+				   ipu_ic_norotate_irq, ipu_ic_bh,
+				   0, "ipu-ic", cvt);
+	if (ret < 0) {
+		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
+			 cvt->out_eof_irq);
+		cvt->out_eof_irq = -1;
+		goto err;
+	}
+
+	cvt->rot_out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
+						     cvt->rotation_out_chan,
+						     IPU_IRQ_EOF);
+
+	ret = request_threaded_irq(cvt->rot_out_eof_irq,
+				   ipu_ic_rotate_irq, ipu_ic_bh,
+				   0, "ipu-ic", cvt);
+	if (ret < 0) {
+		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
+			cvt->rot_out_eof_irq);
+		cvt->rot_out_eof_irq = -1;
+		goto err;
+	}
+
+	return 0;
+err:
+	ipu_ic_release_ipu_resources(cvt);
+	return ret;
+}
+
+static int ipu_ic_fill_image(struct image_converter_ctx *ctx,
+			     struct ipu_ic_image *ic_image,
+			     struct ipu_image *image,
+			     enum image_convert_type type)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+
+	ic_image->base = *image;
+	ic_image->type = type;
+
+	ic_image->fmt = ipu_ic_get_format(image->pix.pixelformat);
+	if (!ic_image->fmt) {
+		dev_err(priv->ipu->dev, "pixelformat not supported for %s\n",
+			type == IMAGE_CONVERT_OUT ? "Output" : "Input");
+		return -EINVAL;
+	}
+
+	if (ic_image->fmt->y_depth)
+		ic_image->stride = (ic_image->fmt->y_depth *
+				    ic_image->base.pix.width) >> 3;
+	else
+		ic_image->stride  = ic_image->base.pix.bytesperline;
+
+	ipu_ic_calc_tile_dimensions(ctx, ic_image);
+	ipu_ic_calc_tile_offsets(ctx, ic_image);
+
+	return 0;
+}
+
+/* borrowed from drivers/media/v4l2-core/v4l2-common.c */
+static unsigned int clamp_align(unsigned int x, unsigned int min,
+				unsigned int max, unsigned int align)
+{
+	/* Bits that must be zero to be aligned */
+	unsigned int mask = ~((1 << align) - 1);
+
+	/* Clamp to aligned min and max */
+	x = clamp(x, (min + ~mask) & mask, max & mask);
+
+	/* Round to nearest aligned value */
+	if (align)
+		x = (x + (1 << (align - 1))) & mask;
+
+	return x;
+}
+
+/*
+ * We have to adjust the tile width such that the tile physaddrs and
+ * U and V plane offsets are multiples of 8 bytes as required by
+ * the IPU DMA Controller. For the planar formats, this corresponds
+ * to a pixel alignment of 16 (but use a more formal equation since
+ * the variables are available). For all the packed formats, 8 is
+ * good enough.
+ */
+static inline u32 tile_width_align(const struct ipu_ic_pixfmt *fmt)
+{
+	return fmt->y_depth ? (64 * fmt->uv_width_dec) / fmt->y_depth : 8;
+}
+
+/*
+ * For tile height alignment, we have to ensure that the output tile
+ * heights are multiples of 8 lines if the IRT is required by the
+ * given rotation mode (the IRT performs rotations on 8x8 blocks
+ * at a time). If the IRT is not used, or for input image tiles,
+ * 2 lines are good enough.
+ */
+static inline u32 tile_height_align(enum image_convert_type type,
+				    enum ipu_rotate_mode rot_mode)
+{
+	return (type == IMAGE_CONVERT_OUT &&
+		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+}
+
+/* Adjusts input/output images to IPU restrictions */
+int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode)
+{
+	const struct ipu_ic_pixfmt *infmt, *outfmt;
+	unsigned int num_in_rows, num_in_cols;
+	unsigned int num_out_rows, num_out_cols;
+	u32 w_align, h_align;
+
+	infmt = ipu_ic_get_format(in->pix.pixelformat);
+	outfmt = ipu_ic_get_format(out->pix.pixelformat);
+
+	/* set some defaults if needed */
+	if (!infmt) {
+		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
+		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
+	}
+	if (!outfmt) {
+		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
+		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
+	}
+
+	if (!in->pix.width || !in->pix.height) {
+		in->pix.width = 640;
+		in->pix.height = 480;
+	}
+	if (!out->pix.width || !out->pix.height) {
+		out->pix.width = 640;
+		out->pix.height = 480;
+	}
+
+	/* image converter does not handle fields */
+	in->pix.field = out->pix.field = V4L2_FIELD_NONE;
+
+	/* resizer cannot downsize more than 4:1 */
+	if (ipu_rot_mode_is_irt(rot_mode)) {
+		out->pix.height = max_t(__u32, out->pix.height,
+					in->pix.width / 4);
+		out->pix.width = max_t(__u32, out->pix.width,
+				       in->pix.height / 4);
+	} else {
+		out->pix.width = max_t(__u32, out->pix.width,
+				       in->pix.width / 4);
+		out->pix.height = max_t(__u32, out->pix.height,
+					in->pix.height / 4);
+	}
+
+	/* get tiling rows/cols from output format */
+	num_out_rows = ipu_ic_num_stripes(out->pix.height);
+	num_out_cols = ipu_ic_num_stripes(out->pix.width);
+	if (ipu_rot_mode_is_irt(rot_mode)) {
+		num_in_rows = num_out_cols;
+		num_in_cols = num_out_rows;
+	} else {
+		num_in_rows = num_out_rows;
+		num_in_cols = num_out_cols;
+	}
+
+	/* align input width/height */
+	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
+			num_in_rows);
+	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
+	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
+
+	/* align output width/height */
+	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
+			num_out_rows);
+	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
+	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
+
+	/* set input/output strides and image sizes */
+	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
+	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
+	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
+	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
+
+/*
+ * this is used by ipu_image_convert_prepare() to verify set input and
+ * output images are valid before starting the conversion. Clients can
+ * also call it before calling ipu_image_convert_prepare().
+ */
+int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode)
+{
+	struct ipu_image testin, testout;
+	int ret;
+
+	testin = *in;
+	testout = *out;
+
+	ret = ipu_image_convert_adjust(&testin, &testout, rot_mode);
+	if (ret)
+		return ret;
+
+	if (testin.pix.width != in->pix.width ||
+	    testin.pix.height != in->pix.height ||
+	    testout.pix.width != out->pix.width ||
+	    testout.pix.height != out->pix.height)
+		return -EINVAL;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_verify);
+
+/*
+ * Call ipu_image_convert_prepare() to prepare for the conversion of
+ * given images and rotation mode. Returns a new conversion context.
+ */
+struct image_converter_ctx *
+ipu_image_convert_prepare(struct ipu_ic *ic,
+			  struct ipu_image *in, struct ipu_image *out,
+			  enum ipu_rotate_mode rot_mode,
+			  image_converter_cb_t complete,
+			  void *complete_context)
+{
+	struct ipu_ic_priv *priv = ic->priv;
+	struct image_converter *cvt = &ic->cvt;
+	struct ipu_ic_image *s_image, *d_image;
+	struct image_converter_ctx *ctx;
+	unsigned long flags;
+	bool get_res;
+	int ret;
+
+	if (!ic || !in || !out || !complete)
+		return ERR_PTR(-EINVAL);
+
+	/* verify the in/out images before continuing */
+	ret = ipu_image_convert_verify(in, out, rot_mode);
+	if (ret) {
+		dev_err(priv->ipu->dev, "%s: in/out formats invalid\n",
+			__func__);
+		return ERR_PTR(ret);
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	dev_dbg(priv->ipu->dev, "%s: ctx %p\n", __func__, ctx);
+
+	ctx->cvt = cvt;
+	init_completion(&ctx->aborted);
+
+	s_image = &ctx->in;
+	d_image = &ctx->out;
+
+	/* set tiling and rotation */
+	d_image->num_rows = ipu_ic_num_stripes(out->pix.height);
+	d_image->num_cols = ipu_ic_num_stripes(out->pix.width);
+	if (ipu_rot_mode_is_irt(rot_mode)) {
+		s_image->num_rows = d_image->num_cols;
+		s_image->num_cols = d_image->num_rows;
+	} else {
+		s_image->num_rows = d_image->num_rows;
+		s_image->num_cols = d_image->num_cols;
+	}
+
+	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
+	ctx->rot_mode = rot_mode;
+
+	ret = ipu_ic_fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
+	if (ret)
+		goto out_free;
+	ret = ipu_ic_fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
+	if (ret)
+		goto out_free;
+
+	ipu_ic_calc_out_tile_map(ctx);
+
+	ipu_ic_dump_format(ctx, s_image);
+	ipu_ic_dump_format(ctx, d_image);
+
+	ctx->complete = complete;
+	ctx->complete_context = complete_context;
+
+	/*
+	 * Can we use double-buffering for this operation? If there is
+	 * only one tile (the whole image can be converted in a single
+	 * operation) there's no point in using double-buffering. Also,
+	 * the IPU's IDMAC channels allow only a single U and V plane
+	 * offset shared between both buffers, but these offsets change
+	 * for every tile, and therefore would have to be updated for
+	 * each buffer which is not possible. So double-buffering is
+	 * impossible when either the source or destination images are
+	 * a planar format (YUV420, YUV422P, etc.).
+	 */
+	ctx->double_buffering = (ctx->num_tiles > 1 &&
+				 !s_image->fmt->y_depth &&
+				 !d_image->fmt->y_depth);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ret = ipu_ic_alloc_dma_buf(priv, &ctx->rot_intermediate[0],
+					   d_image->tile[0].size);
+		if (ret)
+			goto out_free;
+		if (ctx->double_buffering) {
+			ret = ipu_ic_alloc_dma_buf(priv,
+						   &ctx->rot_intermediate[1],
+						   d_image->tile[0].size);
+			if (ret)
+				goto out_free_dmabuf0;
+		}
+	}
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	get_res = list_empty(&cvt->ctx_list);
+
+	list_add_tail(&ctx->list, &cvt->ctx_list);
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (get_res) {
+		ret = ipu_ic_get_ipu_resources(cvt);
+		if (ret)
+			goto out_free_dmabuf1;
+	}
+
+	return ctx;
+
+out_free_dmabuf1:
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
+	spin_lock_irqsave(&cvt->irqlock, flags);
+	list_del(&ctx->list);
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+out_free_dmabuf0:
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
+out_free:
+	kfree(ctx);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_prepare);
+
+/*
+ * Carry out a single image conversion. Only the physaddr's of the input
+ * and output image buffers are needed. The conversion context must have
+ * been created previously with ipu_image_convert_prepare(). Returns the
+ * new run object.
+ */
+struct image_converter_run *
+ipu_image_convert_run(struct image_converter_ctx *ctx,
+		      dma_addr_t in_phys, dma_addr_t out_phys)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run;
+	unsigned long flags;
+	int ret = 0;
+
+	run = kzalloc(sizeof(*run), GFP_KERNEL);
+	if (!run)
+		return ERR_PTR(-ENOMEM);
+
+	run->ctx = ctx;
+	run->in_phys = in_phys;
+	run->out_phys = out_phys;
+
+	dev_dbg(priv->ipu->dev, "%s: ctx %p run %p\n", __func__,
+		ctx, run);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	if (ctx->aborting) {
+		ret = -EIO;
+		goto unlock;
+	}
+
+	list_add_tail(&run->list, &cvt->pending_q);
+
+	if (!cvt->current_run) {
+		ret = ipu_ic_run(run);
+		if (ret)
+			cvt->current_run = NULL;
+	}
+unlock:
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (ret) {
+		kfree(run);
+		run = ERR_PTR(ret);
+	}
+
+	return run;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_run);
+
+/* Abort any active or pending conversions for this context */
+void ipu_image_convert_abort(struct image_converter_ctx *ctx)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run, *active_run, *tmp;
+	unsigned long flags;
+	int run_count, ret;
+	bool need_abort;
+
+	reinit_completion(&ctx->aborted);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/* move all remaining pending runs in this context to done_q */
+	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
+		if (run->ctx != ctx)
+			continue;
+		run->status = -EIO;
+		list_move_tail(&run->list, &cvt->done_q);
+	}
+
+	run_count = ipu_ic_get_run_count(ctx, &cvt->done_q);
+	active_run = (cvt->current_run && cvt->current_run->ctx == ctx) ?
+		cvt->current_run : NULL;
+
+	need_abort = (run_count || active_run);
+
+	ctx->aborting = need_abort;
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (!need_abort) {
+		dev_dbg(priv->ipu->dev, "%s: no abort needed for ctx %p\n",
+			__func__, ctx);
+		return;
+	}
+
+	dev_dbg(priv->ipu->dev,
+		 "%s: wait for completion: %d runs, active run %p\n",
+		 __func__, run_count, active_run);
+
+	ret = wait_for_completion_timeout(&ctx->aborted,
+					  msecs_to_jiffies(10000));
+	if (ret == 0) {
+		dev_warn(priv->ipu->dev, "%s: timeout\n", __func__);
+		ipu_ic_force_abort(ctx);
+	}
+
+	ctx->aborting = false;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_abort);
+
+/* Unprepare image conversion context */
+void ipu_image_convert_unprepare(struct image_converter_ctx *ctx)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	unsigned long flags;
+	bool put_res;
+
+	/* make sure no runs are hanging around */
+	ipu_image_convert_abort(ctx);
+
+	dev_dbg(priv->ipu->dev, "%s: removing ctx %p\n", __func__, ctx);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	list_del(&ctx->list);
+
+	put_res = list_empty(&cvt->ctx_list);
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (put_res)
+		ipu_ic_release_ipu_resources(cvt);
+
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
+
+	kfree(ctx);
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_unprepare);
+
+/*
+ * "Canned" asynchronous single image conversion. On successful return
+ * caller must call ipu_image_convert_unprepare() after conversion completes.
+ * Returns the new conversion context.
+ */
+struct image_converter_ctx *
+ipu_image_convert(struct ipu_ic *ic,
+		  struct ipu_image *in, struct ipu_image *out,
+		  enum ipu_rotate_mode rot_mode,
+		  image_converter_cb_t complete,
+		  void *complete_context)
+{
+	struct image_converter_ctx *ctx;
+	struct image_converter_run *run;
+
+	ctx = ipu_image_convert_prepare(ic, in, out, rot_mode,
+					complete, complete_context);
+	if (IS_ERR(ctx))
+		return ctx;
+
+	run = ipu_image_convert_run(ctx, in->phys0, out->phys0);
+	if (IS_ERR(run)) {
+		ipu_image_convert_unprepare(ctx);
+		return ERR_PTR(PTR_ERR(run));
+	}
+
+	return ctx;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert);
+
+/* "Canned" synchronous single image conversion */
+static void image_convert_sync_complete(void *data,
+					struct image_converter_run *run,
+					int err)
+{
+	struct completion *comp = data;
+
+	complete(comp);
+}
+
+int ipu_image_convert_sync(struct ipu_ic *ic,
+			   struct ipu_image *in, struct ipu_image *out,
+			   enum ipu_rotate_mode rot_mode)
+{
+	struct image_converter_ctx *ctx;
+	struct completion comp;
+	int ret;
+
+	init_completion(&comp);
+
+	ctx = ipu_image_convert(ic, in, out, rot_mode,
+				image_convert_sync_complete, &comp);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = wait_for_completion_timeout(&comp, msecs_to_jiffies(10000));
+	ret = (ret == 0) ? -ETIMEDOUT : 0;
+
+	ipu_image_convert_unprepare(ctx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_sync);
+
 int ipu_ic_enable(struct ipu_ic *ic)
 {
 	struct ipu_ic_priv *priv = ic->priv;
@@ -746,6 +2418,7 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
 	ipu->ic_priv = priv;
 
 	spin_lock_init(&priv->lock);
+
 	priv->base = devm_ioremap(dev, base, PAGE_SIZE);
 	if (!priv->base)
 		return -ENOMEM;
@@ -758,10 +2431,21 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
 	priv->ipu = ipu;
 
 	for (i = 0; i < IC_NUM_TASKS; i++) {
-		priv->task[i].task = i;
-		priv->task[i].priv = priv;
-		priv->task[i].reg = &ic_task_reg[i];
-		priv->task[i].bit = &ic_task_bit[i];
+		struct ipu_ic *ic = &priv->task[i];
+		struct image_converter *cvt = &ic->cvt;
+
+		ic->task = i;
+		ic->priv = priv;
+		ic->reg = &ic_task_reg[i];
+		ic->bit = &ic_task_bit[i];
+		ic->ch = &ic_task_ch[i];
+
+		cvt->ic = ic;
+		spin_lock_init(&cvt->irqlock);
+		INIT_LIST_HEAD(&cvt->ctx_list);
+		INIT_LIST_HEAD(&cvt->pending_q);
+		INIT_LIST_HEAD(&cvt->done_q);
+		cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
 	}
 
 	return 0;
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index 1a3f7d4..992addf 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -63,17 +63,25 @@ enum ipu_csi_dest {
 /*
  * Enumeration of IPU rotation modes
  */
+#define IPU_ROT_BIT_VFLIP (1 << 0)
+#define IPU_ROT_BIT_HFLIP (1 << 1)
+#define IPU_ROT_BIT_90    (1 << 2)
+
 enum ipu_rotate_mode {
 	IPU_ROTATE_NONE = 0,
-	IPU_ROTATE_VERT_FLIP,
-	IPU_ROTATE_HORIZ_FLIP,
-	IPU_ROTATE_180,
-	IPU_ROTATE_90_RIGHT,
-	IPU_ROTATE_90_RIGHT_VFLIP,
-	IPU_ROTATE_90_RIGHT_HFLIP,
-	IPU_ROTATE_90_LEFT,
+	IPU_ROTATE_VERT_FLIP = IPU_ROT_BIT_VFLIP,
+	IPU_ROTATE_HORIZ_FLIP = IPU_ROT_BIT_HFLIP,
+	IPU_ROTATE_180 = (IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
+	IPU_ROTATE_90_RIGHT = IPU_ROT_BIT_90,
+	IPU_ROTATE_90_RIGHT_VFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_VFLIP),
+	IPU_ROTATE_90_RIGHT_HFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_HFLIP),
+	IPU_ROTATE_90_LEFT = (IPU_ROT_BIT_90 |
+			      IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
 };
 
+/* 90-degree rotations require the IRT unit */
+#define ipu_rot_mode_is_irt(m) ((m) >= IPU_ROTATE_90_RIGHT)
+
 enum ipu_color_space {
 	IPUV3_COLORSPACE_RGB,
 	IPUV3_COLORSPACE_YUV,
@@ -337,6 +345,7 @@ enum ipu_ic_task {
 };
 
 struct ipu_ic;
+
 int ipu_ic_task_init(struct ipu_ic *ic,
 		     int in_width, int in_height,
 		     int out_width, int out_height,
@@ -351,6 +360,40 @@ void ipu_ic_task_disable(struct ipu_ic *ic);
 int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
 			  u32 width, u32 height, int burst_size,
 			  enum ipu_rotate_mode rot);
+
+struct image_converter_ctx;
+struct image_converter_run;
+
+typedef void (*image_converter_cb_t)(void *ctx,
+				     struct image_converter_run *run,
+				     int err);
+
+int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc);
+int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode);
+int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode);
+struct image_converter_ctx *
+ipu_image_convert_prepare(struct ipu_ic *ic,
+			  struct ipu_image *in, struct ipu_image *out,
+			  enum ipu_rotate_mode rot_mode,
+			  image_converter_cb_t complete,
+			  void *complete_context);
+void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
+struct image_converter_run *
+ipu_image_convert_run(struct image_converter_ctx *ctx,
+		      dma_addr_t in_phys, dma_addr_t out_phys);
+void ipu_image_convert_abort(struct image_converter_ctx *ctx);
+struct image_converter_ctx *
+ipu_image_convert(struct ipu_ic *ic,
+		  struct ipu_image *in, struct ipu_image *out,
+		  enum ipu_rotate_mode rot_mode,
+		  image_converter_cb_t complete,
+		  void *complete_context);
+int ipu_image_convert_sync(struct ipu_ic *ic,
+			   struct ipu_image *in, struct ipu_image *out,
+			   enum ipu_rotate_mode rot_mode);
+
 int ipu_ic_enable(struct ipu_ic *ic);
 int ipu_ic_disable(struct ipu_ic *ic);
 struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-08-18  0:50   ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

This patch implements complete image conversion support to ipu-ic,
with tiling to support scaling to and from images up to 4096x4096.
Image rotation is also supported.

The internal API is subsystem agnostic (no V4L2 dependency except
for the use of V4L2 fourcc pixel formats).

Callers prepare for image conversion by calling
ipu_image_convert_prepare(), which initializes the parameters of
the conversion. The caller passes in the ipu_ic task to use for
the conversion, the input and output image formats, a rotation mode,
and a completion callback and completion context pointer:

struct image_converter_ctx *
ipu_image_convert_prepare(struct ipu_ic *ic,
                          struct ipu_image *in, struct ipu_image *out,
                          enum ipu_rotate_mode rot_mode,
                          image_converter_cb_t complete,
                          void *complete_context);

The caller is given a new conversion context that must be passed to
the further APIs:

struct image_converter_run *
ipu_image_convert_run(struct image_converter_ctx *ctx,
                      dma_addr_t in_phys, dma_addr_t out_phys);

This queues a new image conversion request to a run queue, and
starts the conversion immediately if the run queue is empty. Only
the physaddr's of the input and output image buffers are needed,
since the conversion context was created previously with
ipu_image_convert_prepare(). Returns a new run object pointer. When
the conversion completes, the run pointer is returned to the
completion callback.

void image_convert_abort(struct image_converter_ctx *ctx);

This will abort any active or pending conversions for this context.
Any currently active or pending runs belonging to this context are
returned via the completion callback with an error status.

void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);

Unprepares the conversion context. Any active or pending runs will
be aborted by calling image_convert_abort().

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>

---

v4:
- do away with struct ipu_ic_tile_off, and move tile offsets into
  struct ipu_ic_tile. This paves the way for possibly allowing for
  each tile to have different dimensions in the future.

v3: no changes
v2: no changes
---
 drivers/gpu/ipu-v3/ipu-ic.c | 1694 ++++++++++++++++++++++++++++++++++++++++++-
 include/video/imx-ipu-v3.h  |   57 +-
 2 files changed, 1739 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 1a37afc..01b1b56 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -17,6 +17,8 @@
 #include <linux/bitrev.h>
 #include <linux/io.h>
 #include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/dma-mapping.h>
 #include "ipu-prv.h"
 
 /* IC Register Offsets */
@@ -82,6 +84,40 @@
 #define IC_IDMAC_3_PP_WIDTH_MASK        (0x3ff << 20)
 #define IC_IDMAC_3_PP_WIDTH_OFFSET      20
 
+/*
+ * The IC Resizer has a restriction that the output frame from the
+ * resizer must be 1024 or less in both width (pixels) and height
+ * (lines).
+ *
+ * The image conversion support attempts to split up a conversion when
+ * the desired output (converted) frame resolution exceeds the IC resizer
+ * limit of 1024 in either dimension.
+ *
+ * If either dimension of the output frame exceeds the limit, the
+ * dimension is split into 1, 2, or 4 equal stripes, for a maximum
+ * of 4*4 or 16 tiles. A conversion is then carried out for each
+ * tile (but taking care to pass the full frame stride length to
+ * the DMA channel's parameter memory!). IDMA double-buffering is used
+ * to convert each tile back-to-back when possible (see note below
+ * when double_buffering boolean is set).
+ *
+ * Note that the input frame must be split up into the same number
+ * of tiles as the output frame.
+ */
+#define MAX_STRIPES_W    4
+#define MAX_STRIPES_H    4
+#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
+
+#define MIN_W     128
+#define MIN_H     128
+#define MAX_W     4096
+#define MAX_H     4096
+
+enum image_convert_type {
+	IMAGE_CONVERT_IN = 0,
+	IMAGE_CONVERT_OUT,
+};
+
 struct ic_task_regoffs {
 	u32 rsc;
 	u32 tpmem_csc[2];
@@ -96,6 +132,16 @@ struct ic_task_bitfields {
 	u32 ic_cmb_galpha_bit;
 };
 
+struct ic_task_channels {
+	int in;
+	int out;
+	int rot_in;
+	int rot_out;
+	int vdi_in_p;
+	int vdi_in;
+	int vdi_in_n;
+};
+
 static const struct ic_task_regoffs ic_task_reg[IC_NUM_TASKS] = {
 	[IC_TASK_ENCODER] = {
 		.rsc = IC_PRP_ENC_RSC,
@@ -138,12 +184,155 @@ static const struct ic_task_bitfields ic_task_bit[IC_NUM_TASKS] = {
 	},
 };
 
+static const struct ic_task_channels ic_task_ch[IC_NUM_TASKS] = {
+	[IC_TASK_ENCODER] = {
+		.out = IPUV3_CHANNEL_IC_PRP_ENC_MEM,
+		.rot_in = IPUV3_CHANNEL_MEM_ROT_ENC,
+		.rot_out = IPUV3_CHANNEL_ROT_ENC_MEM,
+	},
+	[IC_TASK_VIEWFINDER] = {
+		.in = IPUV3_CHANNEL_MEM_IC_PRP_VF,
+		.out = IPUV3_CHANNEL_IC_PRP_VF_MEM,
+		.rot_in = IPUV3_CHANNEL_MEM_ROT_VF,
+		.rot_out = IPUV3_CHANNEL_ROT_VF_MEM,
+		.vdi_in_p = IPUV3_CHANNEL_MEM_VDI_PREV,
+		.vdi_in = IPUV3_CHANNEL_MEM_VDI_CUR,
+		.vdi_in_n = IPUV3_CHANNEL_MEM_VDI_NEXT,
+	},
+	[IC_TASK_POST_PROCESSOR] = {
+		.in = IPUV3_CHANNEL_MEM_IC_PP,
+		.out = IPUV3_CHANNEL_IC_PP_MEM,
+		.rot_in = IPUV3_CHANNEL_MEM_ROT_PP,
+		.rot_out = IPUV3_CHANNEL_ROT_PP_MEM,
+	},
+};
+
+struct ipu_ic_dma_buf {
+	void          *virt;
+	dma_addr_t    phys;
+	unsigned long len;
+};
+
+/* dimensions of one tile */
+struct ipu_ic_tile {
+	u32 width;
+	u32 height;
+	/* size and strides are in bytes */
+	u32 size;
+	u32 stride;
+	u32 rot_stride;
+	/* start Y or packed offset of this tile */
+	u32 offset;
+	/* offset from start to tile in U plane, for planar formats */
+	u32 u_off;
+	/* offset from start to tile in V plane, for planar formats */
+	u32 v_off;
+};
+
+struct ipu_ic_pixfmt {
+	char	*name;
+	u32	fourcc;        /* V4L2 fourcc */
+	int     bpp;           /* total bpp */
+	int     y_depth;       /* depth of Y plane for planar formats */
+	int     uv_width_dec;  /* decimation in width for U/V planes */
+	int     uv_height_dec; /* decimation in height for U/V planes */
+	bool    uv_swapped;    /* U and V planes are swapped */
+	bool    uv_packed;     /* partial planar (U and V in same plane) */
+};
+
+struct ipu_ic_image {
+	struct ipu_image base;
+	enum image_convert_type type;
+
+	const struct ipu_ic_pixfmt *fmt;
+	unsigned int stride;
+
+	/* # of rows (horizontal stripes) if dest height is > 1024 */
+	unsigned int num_rows;
+	/* # of columns (vertical stripes) if dest width is > 1024 */
+	unsigned int num_cols;
+
+	struct ipu_ic_tile tile[MAX_TILES];
+};
+
+struct image_converter_ctx;
+struct image_converter;
 struct ipu_ic_priv;
+struct ipu_ic;
+
+struct image_converter_run {
+	struct image_converter_ctx *ctx;
+
+	dma_addr_t in_phys;
+	dma_addr_t out_phys;
+
+	int status;
+
+	struct list_head list;
+};
+
+struct image_converter_ctx {
+	struct image_converter *cvt;
+
+	image_converter_cb_t complete;
+	void *complete_context;
+
+	/* Source/destination image data and rotation mode */
+	struct ipu_ic_image in;
+	struct ipu_ic_image out;
+	enum ipu_rotate_mode rot_mode;
+
+	/* intermediate buffer for rotation */
+	struct ipu_ic_dma_buf rot_intermediate[2];
+
+	/* current buffer number for double buffering */
+	int cur_buf_num;
+
+	bool aborting;
+	struct completion aborted;
+
+	/* can we use double-buffering for this conversion operation? */
+	bool double_buffering;
+	/* num_rows * num_cols */
+	unsigned int num_tiles;
+	/* next tile to process */
+	unsigned int next_tile;
+	/* where to place converted tile in dest image */
+	unsigned int out_tile_map[MAX_TILES];
+
+	struct list_head list;
+};
+
+struct image_converter {
+	struct ipu_ic *ic;
+
+	struct ipuv3_channel *in_chan;
+	struct ipuv3_channel *out_chan;
+	struct ipuv3_channel *rotation_in_chan;
+	struct ipuv3_channel *rotation_out_chan;
+
+	/* the IPU end-of-frame irqs */
+	int out_eof_irq;
+	int rot_out_eof_irq;
+
+	spinlock_t irqlock;
+
+	/* list of convert contexts */
+	struct list_head ctx_list;
+	/* queue of conversion runs */
+	struct list_head pending_q;
+	/* queue of completed runs */
+	struct list_head done_q;
+
+	/* the current conversion run */
+	struct image_converter_run *current_run;
+};
 
 struct ipu_ic {
 	enum ipu_ic_task task;
 	const struct ic_task_regoffs *reg;
 	const struct ic_task_bitfields *bit;
+	const struct ic_task_channels *ch;
 
 	enum ipu_color_space in_cs, g_in_cs;
 	enum ipu_color_space out_cs;
@@ -151,6 +340,8 @@ struct ipu_ic {
 	bool rotation;
 	bool in_use;
 
+	struct image_converter cvt;
+
 	struct ipu_ic_priv *priv;
 };
 
@@ -619,7 +810,7 @@ int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
 	ipu_ic_write(ic, ic_idmac_2, IC_IDMAC_2);
 	ipu_ic_write(ic, ic_idmac_3, IC_IDMAC_3);
 
-	if (rot >= IPU_ROTATE_90_RIGHT)
+	if (ipu_rot_mode_is_irt(rot))
 		ic->rotation = true;
 
 unlock:
@@ -648,6 +839,1487 @@ static void ipu_irt_disable(struct ipu_ic *ic)
 	}
 }
 
+/*
+ * Complete image conversion support follows
+ */
+
+static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
+	{
+		.name	= "RGB565",
+		.fourcc	= V4L2_PIX_FMT_RGB565,
+		.bpp    = 16,
+	}, {
+		.name	= "RGB24",
+		.fourcc	= V4L2_PIX_FMT_RGB24,
+		.bpp    = 24,
+	}, {
+		.name	= "BGR24",
+		.fourcc	= V4L2_PIX_FMT_BGR24,
+		.bpp    = 24,
+	}, {
+		.name	= "RGB32",
+		.fourcc	= V4L2_PIX_FMT_RGB32,
+		.bpp    = 32,
+	}, {
+		.name	= "BGR32",
+		.fourcc	= V4L2_PIX_FMT_BGR32,
+		.bpp    = 32,
+	}, {
+		.name	= "4:2:2 packed, YUYV",
+		.fourcc	= V4L2_PIX_FMT_YUYV,
+		.bpp    = 16,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+	}, {
+		.name	= "4:2:2 packed, UYVY",
+		.fourcc	= V4L2_PIX_FMT_UYVY,
+		.bpp    = 16,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+	}, {
+		.name	= "4:2:0 planar, YUV",
+		.fourcc	= V4L2_PIX_FMT_YUV420,
+		.bpp    = 12,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 2,
+	}, {
+		.name	= "4:2:0 planar, YVU",
+		.fourcc	= V4L2_PIX_FMT_YVU420,
+		.bpp    = 12,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 2,
+		.uv_swapped = true,
+	}, {
+		.name   = "4:2:0 partial planar, NV12",
+		.fourcc = V4L2_PIX_FMT_NV12,
+		.bpp    = 12,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 2,
+		.uv_packed = true,
+	}, {
+		.name   = "4:2:2 planar, YUV",
+		.fourcc = V4L2_PIX_FMT_YUV422P,
+		.bpp    = 16,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+	}, {
+		.name   = "4:2:2 partial planar, NV16",
+		.fourcc = V4L2_PIX_FMT_NV16,
+		.bpp    = 16,
+		.y_depth = 8,
+		.uv_width_dec = 2,
+		.uv_height_dec = 1,
+		.uv_packed = true,
+	},
+};
+
+static const struct ipu_ic_pixfmt *ipu_ic_get_format(u32 fourcc)
+{
+	const struct ipu_ic_pixfmt *ret = NULL;
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(ipu_ic_formats); i++) {
+		if (ipu_ic_formats[i].fourcc = fourcc) {
+			ret = &ipu_ic_formats[i];
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void ipu_ic_dump_format(struct image_converter_ctx *ctx,
+			       struct ipu_ic_image *ic_image)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+
+	dev_dbg(priv->ipu->dev,
+		"ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
+		ctx,
+		ic_image->type = IMAGE_CONVERT_OUT ? "Output" : "Input",
+		ic_image->base.pix.width, ic_image->base.pix.height,
+		ic_image->num_cols, ic_image->num_rows,
+		ic_image->tile[0].width, ic_image->tile[0].height,
+		ic_image->fmt->fourcc & 0xff,
+		(ic_image->fmt->fourcc >> 8) & 0xff,
+		(ic_image->fmt->fourcc >> 16) & 0xff,
+		(ic_image->fmt->fourcc >> 24) & 0xff);
+}
+
+int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc)
+{
+	const struct ipu_ic_pixfmt *fmt;
+
+	if (index >= (int)ARRAY_SIZE(ipu_ic_formats))
+		return -EINVAL;
+
+	/* Format found */
+	fmt = &ipu_ic_formats[index];
+	*desc = fmt->name;
+	*fourcc = fmt->fourcc;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_enum_format);
+
+static void ipu_ic_free_dma_buf(struct ipu_ic_priv *priv,
+				struct ipu_ic_dma_buf *buf)
+{
+	if (buf->virt)
+		dma_free_coherent(priv->ipu->dev,
+				  buf->len, buf->virt, buf->phys);
+	buf->virt = NULL;
+	buf->phys = 0;
+}
+
+static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
+				struct ipu_ic_dma_buf *buf,
+				int size)
+{
+	unsigned long newlen = PAGE_ALIGN(size);
+
+	if (buf->virt) {
+		if (buf->len = newlen)
+			return 0;
+		ipu_ic_free_dma_buf(priv, buf);
+	}
+
+	buf->len = newlen;
+	buf->virt = dma_alloc_coherent(priv->ipu->dev, buf->len, &buf->phys,
+				       GFP_DMA | GFP_KERNEL);
+	if (!buf->virt) {
+		dev_err(priv->ipu->dev, "failed to alloc dma buffer\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static inline int ipu_ic_num_stripes(int dim)
+{
+	if (dim <= 1024)
+		return 1;
+	else if (dim <= 2048)
+		return 2;
+	else
+		return 4;
+}
+
+static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
+					struct ipu_ic_image *image)
+{
+	int i;
+
+	for (i = 0; i < ctx->num_tiles; i++) {
+		struct ipu_ic_tile *tile = &image->tile[i];
+
+		tile->height = image->base.pix.height / image->num_rows;
+		tile->width = image->base.pix.width / image->num_cols;
+		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
+			tile->width;
+
+		if (image->fmt->y_depth) {
+			tile->stride +				(image->fmt->y_depth * tile->width) >> 3;
+			tile->rot_stride +				(image->fmt->y_depth * tile->height) >> 3;
+		} else {
+			tile->stride +				(image->fmt->bpp * tile->width) >> 3;
+			tile->rot_stride +				(image->fmt->bpp * tile->height) >> 3;
+		}
+	}
+}
+
+/*
+ * Use the rotation transformation to find the tile coordinates
+ * (row, col) of a tile in the destination frame that corresponds
+ * to the given tile coordinates of a source frame. The destination
+ * coordinate is then converted to a tile index.
+ */
+static int ipu_ic_transform_tile_index(struct image_converter_ctx *ctx,
+				       int src_row, int src_col)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+	struct ipu_ic_image *s_image = &ctx->in;
+	struct ipu_ic_image *d_image = &ctx->out;
+	int cos, sin, dst_row, dst_col;
+
+	/* with no rotation it's a 1:1 mapping */
+	if (ctx->rot_mode = IPU_ROTATE_NONE)
+		return src_row * s_image->num_cols + src_col;
+
+	if (ctx->rot_mode & IPU_ROT_BIT_90) {
+		cos = 0;
+		sin = 1;
+	} else {
+		cos = 1;
+		sin = 0;
+	}
+
+	/*
+	 * before doing the transform, first we have to translate
+	 * source row,col for an origin in the center of s_image
+	 */
+	src_row *= 2;
+	src_col *= 2;
+	src_row -= s_image->num_rows - 1;
+	src_col -= s_image->num_cols - 1;
+
+	/* do the rotation transform */
+	dst_col = src_col * cos - src_row * sin;
+	dst_row = src_col * sin + src_row * cos;
+
+	/* apply flip */
+	if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
+		dst_col = -dst_col;
+	if (ctx->rot_mode & IPU_ROT_BIT_VFLIP)
+		dst_row = -dst_row;
+
+	dev_dbg(priv->ipu->dev, "ctx %p: [%d,%d] --> [%d,%d]\n",
+		ctx, src_col, src_row, dst_col, dst_row);
+
+	/*
+	 * finally translate dest row,col using an origin in upper
+	 * left of d_image
+	 */
+	dst_row += d_image->num_rows - 1;
+	dst_col += d_image->num_cols - 1;
+	dst_row /= 2;
+	dst_col /= 2;
+
+	return dst_row * d_image->num_cols + dst_col;
+}
+
+/*
+ * Fill the out_tile_map[] with transformed destination tile indeces.
+ */
+static void ipu_ic_calc_out_tile_map(struct image_converter_ctx *ctx)
+{
+	struct ipu_ic_image *s_image = &ctx->in;
+	unsigned int row, col, tile = 0;
+
+	for (row = 0; row < s_image->num_rows; row++) {
+		for (col = 0; col < s_image->num_cols; col++) {
+			ctx->out_tile_map[tile] +				ipu_ic_transform_tile_index(ctx, row, col);
+			tile++;
+		}
+	}
+}
+
+static void ipu_ic_calc_tile_offsets_planar(struct image_converter_ctx *ctx,
+					    struct ipu_ic_image *image)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+	const struct ipu_ic_pixfmt *fmt = image->fmt;
+	unsigned int row, col, tile = 0;
+	u32 H, w, h, y_depth, y_stride, uv_stride;
+	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
+	u32 y_row_off, y_col_off, y_off;
+	u32 y_size, uv_size;
+
+	/* setup some convenience vars */
+	H = image->base.pix.height;
+
+	y_depth = fmt->y_depth;
+	y_stride = image->stride;
+	uv_stride = y_stride / fmt->uv_width_dec;
+	if (fmt->uv_packed)
+		uv_stride *= 2;
+
+	y_size = H * y_stride;
+	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
+
+	for (row = 0; row < image->num_rows; row++) {
+		w = image->tile[tile].width;
+		h = image->tile[tile].height;
+		y_row_off = row * h * y_stride;
+		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
+
+		for (col = 0; col < image->num_cols; col++) {
+			y_col_off = (col * w * y_depth) >> 3;
+			uv_col_off = y_col_off / fmt->uv_width_dec;
+			if (fmt->uv_packed)
+				uv_col_off *= 2;
+
+			y_off = y_row_off + y_col_off;
+			uv_off = uv_row_off + uv_col_off;
+
+			u_off = y_size - y_off + uv_off;
+			v_off = (fmt->uv_packed) ? 0 : u_off + uv_size;
+			if (fmt->uv_swapped) {
+				tmp = u_off;
+				u_off = v_off;
+				v_off = tmp;
+			}
+
+			image->tile[tile].offset = y_off;
+			image->tile[tile].u_off = u_off;
+			image->tile[tile++].v_off = v_off;
+
+			dev_dbg(priv->ipu->dev,
+				"ctx %p: %s@[%d,%d]: y_off %08x, u_off %08x, v_off %08x\n",
+				ctx, image->type = IMAGE_CONVERT_IN ?
+				"Input" : "Output", row, col,
+				y_off, u_off, v_off);
+		}
+	}
+}
+
+static void ipu_ic_calc_tile_offsets_packed(struct image_converter_ctx *ctx,
+					    struct ipu_ic_image *image)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+	const struct ipu_ic_pixfmt *fmt = image->fmt;
+	unsigned int row, col, tile = 0;
+	u32 w, h, bpp, stride;
+	u32 row_off, col_off;
+
+	/* setup some convenience vars */
+	stride = image->stride;
+	bpp = fmt->bpp;
+
+	for (row = 0; row < image->num_rows; row++) {
+		w = image->tile[tile].width;
+		h = image->tile[tile].height;
+		row_off = row * h * stride;
+
+		for (col = 0; col < image->num_cols; col++) {
+			col_off = (col * w * bpp) >> 3;
+
+			image->tile[tile].offset = row_off + col_off;
+			image->tile[tile].u_off = 0;
+			image->tile[tile++].v_off = 0;
+
+			dev_dbg(priv->ipu->dev,
+				"ctx %p: %s@[%d,%d]: phys %08x\n", ctx,
+				image->type = IMAGE_CONVERT_IN ?
+				"Input" : "Output", row, col,
+				row_off + col_off);
+		}
+	}
+}
+
+static void ipu_ic_calc_tile_offsets(struct image_converter_ctx *ctx,
+				     struct ipu_ic_image *image)
+{
+	if (image->fmt->y_depth)
+		ipu_ic_calc_tile_offsets_planar(ctx, image);
+	else
+		ipu_ic_calc_tile_offsets_packed(ctx, image);
+}
+
+/*
+ * return the number of runs in given queue (pending_q or done_q)
+ * for this context. hold irqlock when calling.
+ */
+static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
+				struct list_head *q)
+{
+	struct image_converter_run *run;
+	int count = 0;
+
+	list_for_each_entry(run, q, list) {
+		if (run->ctx = ctx)
+			count++;
+	}
+
+	return count;
+}
+
+/* hold irqlock when calling */
+static void ipu_ic_convert_stop(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+
+	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
+		__func__, ctx, run);
+
+	/* disable IC tasks and the channels */
+	ipu_ic_task_disable(cvt->ic);
+	ipu_idmac_disable_channel(cvt->in_chan);
+	ipu_idmac_disable_channel(cvt->out_chan);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ipu_idmac_disable_channel(cvt->rotation_in_chan);
+		ipu_idmac_disable_channel(cvt->rotation_out_chan);
+		ipu_idmac_unlink(cvt->out_chan, cvt->rotation_in_chan);
+	}
+
+	ipu_ic_disable(cvt->ic);
+}
+
+/* hold irqlock when calling */
+static void init_idmac_channel(struct image_converter_ctx *ctx,
+			       struct ipuv3_channel *channel,
+			       struct ipu_ic_image *image,
+			       enum ipu_rotate_mode rot_mode,
+			       bool rot_swap_width_height)
+{
+	struct image_converter *cvt = ctx->cvt;
+	unsigned int burst_size;
+	u32 width, height, stride;
+	dma_addr_t addr0, addr1 = 0;
+	struct ipu_image tile_image;
+	unsigned int tile_idx[2];
+
+	if (image->type = IMAGE_CONVERT_OUT) {
+		tile_idx[0] = ctx->out_tile_map[0];
+		tile_idx[1] = ctx->out_tile_map[1];
+	} else {
+		tile_idx[0] = 0;
+		tile_idx[1] = 1;
+	}
+
+	if (rot_swap_width_height) {
+		width = image->tile[0].height;
+		height = image->tile[0].width;
+		stride = image->tile[0].rot_stride;
+		addr0 = ctx->rot_intermediate[0].phys;
+		if (ctx->double_buffering)
+			addr1 = ctx->rot_intermediate[1].phys;
+	} else {
+		width = image->tile[0].width;
+		height = image->tile[0].height;
+		stride = image->stride;
+		addr0 = image->base.phys0 +
+			image->tile[tile_idx[0]].offset;
+		if (ctx->double_buffering)
+			addr1 = image->base.phys0 +
+				image->tile[tile_idx[1]].offset;
+	}
+
+	ipu_cpmem_zero(channel);
+
+	memset(&tile_image, 0, sizeof(tile_image));
+	tile_image.pix.width = tile_image.rect.width = width;
+	tile_image.pix.height = tile_image.rect.height = height;
+	tile_image.pix.bytesperline = stride;
+	tile_image.pix.pixelformat =  image->fmt->fourcc;
+	tile_image.phys0 = addr0;
+	tile_image.phys1 = addr1;
+	ipu_cpmem_set_image(channel, &tile_image);
+
+	if (image->fmt->y_depth && !rot_swap_width_height)
+		ipu_cpmem_set_uv_offset(channel,
+					image->tile[tile_idx[0]].u_off,
+					image->tile[tile_idx[0]].v_off);
+
+	if (rot_mode)
+		ipu_cpmem_set_rotation(channel, rot_mode);
+
+	if (channel = cvt->rotation_in_chan ||
+	    channel = cvt->rotation_out_chan) {
+		burst_size = 8;
+		ipu_cpmem_set_block_mode(channel);
+	} else
+		burst_size = (width % 16) ? 8 : 16;
+
+	ipu_cpmem_set_burstsize(channel, burst_size);
+
+	ipu_ic_task_idma_init(cvt->ic, channel, width, height,
+			      burst_size, rot_mode);
+
+	ipu_cpmem_set_axi_id(channel, 1);
+
+	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
+}
+
+/* hold irqlock when calling */
+static int ipu_ic_convert_start(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct ipu_ic_image *s_image = &ctx->in;
+	struct ipu_ic_image *d_image = &ctx->out;
+	enum ipu_color_space src_cs, dest_cs;
+	unsigned int dest_width, dest_height;
+	int ret;
+
+	dev_dbg(priv->ipu->dev, "%s: starting ctx %p run %p\n",
+		__func__, ctx, run);
+
+	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
+	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* swap width/height for resizer */
+		dest_width = d_image->tile[0].height;
+		dest_height = d_image->tile[0].width;
+	} else {
+		dest_width = d_image->tile[0].width;
+		dest_height = d_image->tile[0].height;
+	}
+
+	/* setup the IC resizer and CSC */
+	ret = ipu_ic_task_init(cvt->ic,
+			       s_image->tile[0].width,
+			       s_image->tile[0].height,
+			       dest_width,
+			       dest_height,
+			       src_cs, dest_cs);
+	if (ret) {
+		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
+		return ret;
+	}
+
+	/* init the source MEM-->IC PP IDMAC channel */
+	init_idmac_channel(ctx, cvt->in_chan, s_image,
+			   IPU_ROTATE_NONE, false);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* init the IC PP-->MEM IDMAC channel */
+		init_idmac_channel(ctx, cvt->out_chan, d_image,
+				   IPU_ROTATE_NONE, true);
+
+		/* init the MEM-->IC PP ROT IDMAC channel */
+		init_idmac_channel(ctx, cvt->rotation_in_chan, d_image,
+				   ctx->rot_mode, true);
+
+		/* init the destination IC PP ROT-->MEM IDMAC channel */
+		init_idmac_channel(ctx, cvt->rotation_out_chan, d_image,
+				   IPU_ROTATE_NONE, false);
+
+		/* now link IC PP-->MEM to MEM-->IC PP ROT */
+		ipu_idmac_link(cvt->out_chan, cvt->rotation_in_chan);
+	} else {
+		/* init the destination IC PP-->MEM IDMAC channel */
+		init_idmac_channel(ctx, cvt->out_chan, d_image,
+				   ctx->rot_mode, false);
+	}
+
+	/* enable the IC */
+	ipu_ic_enable(cvt->ic);
+
+	/* set buffers ready */
+	ipu_idmac_select_buffer(cvt->in_chan, 0);
+	ipu_idmac_select_buffer(cvt->out_chan, 0);
+	if (ipu_rot_mode_is_irt(ctx->rot_mode))
+		ipu_idmac_select_buffer(cvt->rotation_out_chan, 0);
+	if (ctx->double_buffering) {
+		ipu_idmac_select_buffer(cvt->in_chan, 1);
+		ipu_idmac_select_buffer(cvt->out_chan, 1);
+		if (ipu_rot_mode_is_irt(ctx->rot_mode))
+			ipu_idmac_select_buffer(cvt->rotation_out_chan, 1);
+	}
+
+	/* enable the channels! */
+	ipu_idmac_enable_channel(cvt->in_chan);
+	ipu_idmac_enable_channel(cvt->out_chan);
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ipu_idmac_enable_channel(cvt->rotation_in_chan);
+		ipu_idmac_enable_channel(cvt->rotation_out_chan);
+	}
+
+	ipu_ic_task_enable(cvt->ic);
+
+	ipu_cpmem_dump(cvt->in_chan);
+	ipu_cpmem_dump(cvt->out_chan);
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ipu_cpmem_dump(cvt->rotation_in_chan);
+		ipu_cpmem_dump(cvt->rotation_out_chan);
+	}
+
+	ipu_dump(priv->ipu);
+
+	return 0;
+}
+
+/* hold irqlock when calling */
+static int ipu_ic_run(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+
+	ctx->in.base.phys0 = run->in_phys;
+	ctx->out.base.phys0 = run->out_phys;
+
+	ctx->cur_buf_num = 0;
+	ctx->next_tile = 1;
+
+	/* remove run from pending_q and set as current */
+	list_del(&run->list);
+	cvt->current_run = run;
+
+	return ipu_ic_convert_start(run);
+}
+
+/* hold irqlock when calling */
+static void ipu_ic_run_next(struct image_converter *cvt)
+{
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run, *tmp;
+	int ret;
+
+	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
+		/* skip contexts that are aborting */
+		if (run->ctx->aborting) {
+			dev_dbg(priv->ipu->dev,
+				 "%s: skipping aborting ctx %p run %p\n",
+				 __func__, run->ctx, run);
+			continue;
+		}
+
+		ret = ipu_ic_run(run);
+		if (!ret)
+			break;
+
+		/*
+		 * something went wrong with start, add the run
+		 * to done q and continue to the next run in the
+		 * pending q.
+		 */
+		run->status = ret;
+		list_add_tail(&run->list, &cvt->done_q);
+		cvt->current_run = NULL;
+	}
+}
+
+static void ipu_ic_empty_done_q(struct image_converter *cvt)
+{
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	while (!list_empty(&cvt->done_q)) {
+		run = list_entry(cvt->done_q.next,
+				 struct image_converter_run,
+				 list);
+
+		list_del(&run->list);
+
+		dev_dbg(priv->ipu->dev,
+			"%s: completing ctx %p run %p with %d\n",
+			__func__, run->ctx, run, run->status);
+
+		/* call the completion callback and free the run */
+		spin_unlock_irqrestore(&cvt->irqlock, flags);
+		run->ctx->complete(run->ctx->complete_context, run,
+				   run->status);
+		kfree(run);
+		spin_lock_irqsave(&cvt->irqlock, flags);
+	}
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+}
+
+/*
+ * the bottom half thread clears out the done_q, calling the
+ * completion handler for each.
+ */
+static irqreturn_t ipu_ic_bh(int irq, void *dev_id)
+{
+	struct image_converter *cvt = dev_id;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_ctx *ctx;
+	unsigned long flags;
+
+	dev_dbg(priv->ipu->dev, "%s: enter\n", __func__);
+
+	ipu_ic_empty_done_q(cvt);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/*
+	 * the done_q is cleared out, signal any contexts
+	 * that are aborting that abort can complete.
+	 */
+	list_for_each_entry(ctx, &cvt->ctx_list, list) {
+		if (ctx->aborting) {
+			dev_dbg(priv->ipu->dev,
+				 "%s: signaling abort for ctx %p\n",
+				 __func__, ctx);
+			complete(&ctx->aborted);
+		}
+	}
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	dev_dbg(priv->ipu->dev, "%s: exit\n", __func__);
+	return IRQ_HANDLED;
+}
+
+/* hold irqlock when calling */
+static irqreturn_t ipu_ic_doirq(struct image_converter_run *run)
+{
+	struct image_converter_ctx *ctx = run->ctx;
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_tile *src_tile, *dst_tile;
+	struct ipu_ic_image *s_image = &ctx->in;
+	struct ipu_ic_image *d_image = &ctx->out;
+	struct ipuv3_channel *outch;
+	unsigned int dst_idx;
+
+	outch = ipu_rot_mode_is_irt(ctx->rot_mode) ?
+		cvt->rotation_out_chan : cvt->out_chan;
+
+	/*
+	 * It is difficult to stop the channel DMA before the channels
+	 * enter the paused state. Without double-buffering the channels
+	 * are always in a paused state when the EOF irq occurs, so it
+	 * is safe to stop the channels now. For double-buffering we
+	 * just ignore the abort until the operation completes, when it
+	 * is safe to shut down.
+	 */
+	if (ctx->aborting && !ctx->double_buffering) {
+		ipu_ic_convert_stop(run);
+		run->status = -EIO;
+		goto done;
+	}
+
+	if (ctx->next_tile = ctx->num_tiles) {
+		/*
+		 * the conversion is complete
+		 */
+		ipu_ic_convert_stop(run);
+		run->status = 0;
+		goto done;
+	}
+
+	/*
+	 * not done, place the next tile buffers.
+	 */
+	if (!ctx->double_buffering) {
+
+		src_tile = &s_image->tile[ctx->next_tile];
+		dst_idx = ctx->out_tile_map[ctx->next_tile];
+		dst_tile = &d_image->tile[dst_idx];
+
+		ipu_cpmem_set_buffer(cvt->in_chan, 0,
+				     s_image->base.phys0 + src_tile->offset);
+		ipu_cpmem_set_buffer(outch, 0,
+				     d_image->base.phys0 + dst_tile->offset);
+		if (s_image->fmt->y_depth)
+			ipu_cpmem_set_uv_offset(cvt->in_chan,
+						src_tile->u_off,
+						src_tile->v_off);
+		if (d_image->fmt->y_depth)
+			ipu_cpmem_set_uv_offset(outch,
+						dst_tile->u_off,
+						dst_tile->v_off);
+
+		ipu_idmac_select_buffer(cvt->in_chan, 0);
+		ipu_idmac_select_buffer(outch, 0);
+
+	} else if (ctx->next_tile < ctx->num_tiles - 1) {
+
+		src_tile = &s_image->tile[ctx->next_tile + 1];
+		dst_idx = ctx->out_tile_map[ctx->next_tile + 1];
+		dst_tile = &d_image->tile[dst_idx];
+
+		ipu_cpmem_set_buffer(cvt->in_chan, ctx->cur_buf_num,
+				     s_image->base.phys0 + src_tile->offset);
+		ipu_cpmem_set_buffer(outch, ctx->cur_buf_num,
+				     d_image->base.phys0 + dst_tile->offset);
+
+		ipu_idmac_select_buffer(cvt->in_chan, ctx->cur_buf_num);
+		ipu_idmac_select_buffer(outch, ctx->cur_buf_num);
+
+		ctx->cur_buf_num ^= 1;
+	}
+
+	ctx->next_tile++;
+	return IRQ_HANDLED;
+done:
+	list_add_tail(&run->list, &cvt->done_q);
+	cvt->current_run = NULL;
+	ipu_ic_run_next(cvt);
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
+{
+	struct image_converter *cvt = data;
+	struct image_converter_ctx *ctx;
+	struct image_converter_run *run;
+	unsigned long flags;
+	irqreturn_t ret;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/* get current run and its context */
+	run = cvt->current_run;
+	if (!run) {
+		ret = IRQ_NONE;
+		goto out;
+	}
+
+	ctx = run->ctx;
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* this is a rotation operation, just ignore */
+		spin_unlock_irqrestore(&cvt->irqlock, flags);
+		return IRQ_HANDLED;
+	}
+
+	ret = ipu_ic_doirq(run);
+out:
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+	return ret;
+}
+
+static irqreturn_t ipu_ic_rotate_irq(int irq, void *data)
+{
+	struct image_converter *cvt = data;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_ctx *ctx;
+	struct image_converter_run *run;
+	unsigned long flags;
+	irqreturn_t ret;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/* get current run and its context */
+	run = cvt->current_run;
+	if (!run) {
+		ret = IRQ_NONE;
+		goto out;
+	}
+
+	ctx = run->ctx;
+
+	if (!ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		/* this was NOT a rotation operation, shouldn't happen */
+		dev_err(priv->ipu->dev, "Unexpected rotation interrupt\n");
+		spin_unlock_irqrestore(&cvt->irqlock, flags);
+		return IRQ_HANDLED;
+	}
+
+	ret = ipu_ic_doirq(run);
+out:
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+	return ret;
+}
+
+/*
+ * try to force the completion of runs for this ctx. Called when
+ * abort wait times out in ipu_image_convert_abort().
+ */
+static void ipu_ic_force_abort(struct image_converter_ctx *ctx)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct image_converter_run *run;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	run = cvt->current_run;
+	if (run && run->ctx = ctx) {
+		ipu_ic_convert_stop(run);
+		run->status = -EIO;
+		list_add_tail(&run->list, &cvt->done_q);
+		cvt->current_run = NULL;
+		ipu_ic_run_next(cvt);
+	}
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	ipu_ic_empty_done_q(cvt);
+}
+
+static void ipu_ic_release_ipu_resources(struct image_converter *cvt)
+{
+	if (cvt->out_eof_irq >= 0)
+		free_irq(cvt->out_eof_irq, cvt);
+	if (cvt->rot_out_eof_irq >= 0)
+		free_irq(cvt->rot_out_eof_irq, cvt);
+
+	if (!IS_ERR_OR_NULL(cvt->in_chan))
+		ipu_idmac_put(cvt->in_chan);
+	if (!IS_ERR_OR_NULL(cvt->out_chan))
+		ipu_idmac_put(cvt->out_chan);
+	if (!IS_ERR_OR_NULL(cvt->rotation_in_chan))
+		ipu_idmac_put(cvt->rotation_in_chan);
+	if (!IS_ERR_OR_NULL(cvt->rotation_out_chan))
+		ipu_idmac_put(cvt->rotation_out_chan);
+
+	cvt->in_chan = cvt->out_chan = cvt->rotation_in_chan +		cvt->rotation_out_chan = NULL;
+	cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
+}
+
+static int ipu_ic_get_ipu_resources(struct image_converter *cvt)
+{
+	const struct ic_task_channels *chan = cvt->ic->ch;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	int ret;
+
+	/* get IDMAC channels */
+	cvt->in_chan = ipu_idmac_get(priv->ipu, chan->in);
+	cvt->out_chan = ipu_idmac_get(priv->ipu, chan->out);
+	if (IS_ERR(cvt->in_chan) || IS_ERR(cvt->out_chan)) {
+		dev_err(priv->ipu->dev, "could not acquire idmac channels\n");
+		ret = -EBUSY;
+		goto err;
+	}
+
+	cvt->rotation_in_chan = ipu_idmac_get(priv->ipu, chan->rot_in);
+	cvt->rotation_out_chan = ipu_idmac_get(priv->ipu, chan->rot_out);
+	if (IS_ERR(cvt->rotation_in_chan) || IS_ERR(cvt->rotation_out_chan)) {
+		dev_err(priv->ipu->dev,
+			"could not acquire idmac rotation channels\n");
+		ret = -EBUSY;
+		goto err;
+	}
+
+	/* acquire the EOF interrupts */
+	cvt->out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
+						cvt->out_chan,
+						IPU_IRQ_EOF);
+
+	ret = request_threaded_irq(cvt->out_eof_irq,
+				   ipu_ic_norotate_irq, ipu_ic_bh,
+				   0, "ipu-ic", cvt);
+	if (ret < 0) {
+		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
+			 cvt->out_eof_irq);
+		cvt->out_eof_irq = -1;
+		goto err;
+	}
+
+	cvt->rot_out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
+						     cvt->rotation_out_chan,
+						     IPU_IRQ_EOF);
+
+	ret = request_threaded_irq(cvt->rot_out_eof_irq,
+				   ipu_ic_rotate_irq, ipu_ic_bh,
+				   0, "ipu-ic", cvt);
+	if (ret < 0) {
+		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
+			cvt->rot_out_eof_irq);
+		cvt->rot_out_eof_irq = -1;
+		goto err;
+	}
+
+	return 0;
+err:
+	ipu_ic_release_ipu_resources(cvt);
+	return ret;
+}
+
+static int ipu_ic_fill_image(struct image_converter_ctx *ctx,
+			     struct ipu_ic_image *ic_image,
+			     struct ipu_image *image,
+			     enum image_convert_type type)
+{
+	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
+
+	ic_image->base = *image;
+	ic_image->type = type;
+
+	ic_image->fmt = ipu_ic_get_format(image->pix.pixelformat);
+	if (!ic_image->fmt) {
+		dev_err(priv->ipu->dev, "pixelformat not supported for %s\n",
+			type = IMAGE_CONVERT_OUT ? "Output" : "Input");
+		return -EINVAL;
+	}
+
+	if (ic_image->fmt->y_depth)
+		ic_image->stride = (ic_image->fmt->y_depth *
+				    ic_image->base.pix.width) >> 3;
+	else
+		ic_image->stride  = ic_image->base.pix.bytesperline;
+
+	ipu_ic_calc_tile_dimensions(ctx, ic_image);
+	ipu_ic_calc_tile_offsets(ctx, ic_image);
+
+	return 0;
+}
+
+/* borrowed from drivers/media/v4l2-core/v4l2-common.c */
+static unsigned int clamp_align(unsigned int x, unsigned int min,
+				unsigned int max, unsigned int align)
+{
+	/* Bits that must be zero to be aligned */
+	unsigned int mask = ~((1 << align) - 1);
+
+	/* Clamp to aligned min and max */
+	x = clamp(x, (min + ~mask) & mask, max & mask);
+
+	/* Round to nearest aligned value */
+	if (align)
+		x = (x + (1 << (align - 1))) & mask;
+
+	return x;
+}
+
+/*
+ * We have to adjust the tile width such that the tile physaddrs and
+ * U and V plane offsets are multiples of 8 bytes as required by
+ * the IPU DMA Controller. For the planar formats, this corresponds
+ * to a pixel alignment of 16 (but use a more formal equation since
+ * the variables are available). For all the packed formats, 8 is
+ * good enough.
+ */
+static inline u32 tile_width_align(const struct ipu_ic_pixfmt *fmt)
+{
+	return fmt->y_depth ? (64 * fmt->uv_width_dec) / fmt->y_depth : 8;
+}
+
+/*
+ * For tile height alignment, we have to ensure that the output tile
+ * heights are multiples of 8 lines if the IRT is required by the
+ * given rotation mode (the IRT performs rotations on 8x8 blocks
+ * at a time). If the IRT is not used, or for input image tiles,
+ * 2 lines are good enough.
+ */
+static inline u32 tile_height_align(enum image_convert_type type,
+				    enum ipu_rotate_mode rot_mode)
+{
+	return (type = IMAGE_CONVERT_OUT &&
+		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
+}
+
+/* Adjusts input/output images to IPU restrictions */
+int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode)
+{
+	const struct ipu_ic_pixfmt *infmt, *outfmt;
+	unsigned int num_in_rows, num_in_cols;
+	unsigned int num_out_rows, num_out_cols;
+	u32 w_align, h_align;
+
+	infmt = ipu_ic_get_format(in->pix.pixelformat);
+	outfmt = ipu_ic_get_format(out->pix.pixelformat);
+
+	/* set some defaults if needed */
+	if (!infmt) {
+		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
+		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
+	}
+	if (!outfmt) {
+		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
+		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
+	}
+
+	if (!in->pix.width || !in->pix.height) {
+		in->pix.width = 640;
+		in->pix.height = 480;
+	}
+	if (!out->pix.width || !out->pix.height) {
+		out->pix.width = 640;
+		out->pix.height = 480;
+	}
+
+	/* image converter does not handle fields */
+	in->pix.field = out->pix.field = V4L2_FIELD_NONE;
+
+	/* resizer cannot downsize more than 4:1 */
+	if (ipu_rot_mode_is_irt(rot_mode)) {
+		out->pix.height = max_t(__u32, out->pix.height,
+					in->pix.width / 4);
+		out->pix.width = max_t(__u32, out->pix.width,
+				       in->pix.height / 4);
+	} else {
+		out->pix.width = max_t(__u32, out->pix.width,
+				       in->pix.width / 4);
+		out->pix.height = max_t(__u32, out->pix.height,
+					in->pix.height / 4);
+	}
+
+	/* get tiling rows/cols from output format */
+	num_out_rows = ipu_ic_num_stripes(out->pix.height);
+	num_out_cols = ipu_ic_num_stripes(out->pix.width);
+	if (ipu_rot_mode_is_irt(rot_mode)) {
+		num_in_rows = num_out_cols;
+		num_in_cols = num_out_rows;
+	} else {
+		num_in_rows = num_out_rows;
+		num_in_cols = num_out_cols;
+	}
+
+	/* align input width/height */
+	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
+			num_in_rows);
+	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
+	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
+
+	/* align output width/height */
+	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
+	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
+			num_out_rows);
+	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
+	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
+
+	/* set input/output strides and image sizes */
+	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
+	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
+	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
+	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
+
+/*
+ * this is used by ipu_image_convert_prepare() to verify set input and
+ * output images are valid before starting the conversion. Clients can
+ * also call it before calling ipu_image_convert_prepare().
+ */
+int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode)
+{
+	struct ipu_image testin, testout;
+	int ret;
+
+	testin = *in;
+	testout = *out;
+
+	ret = ipu_image_convert_adjust(&testin, &testout, rot_mode);
+	if (ret)
+		return ret;
+
+	if (testin.pix.width != in->pix.width ||
+	    testin.pix.height != in->pix.height ||
+	    testout.pix.width != out->pix.width ||
+	    testout.pix.height != out->pix.height)
+		return -EINVAL;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_verify);
+
+/*
+ * Call ipu_image_convert_prepare() to prepare for the conversion of
+ * given images and rotation mode. Returns a new conversion context.
+ */
+struct image_converter_ctx *
+ipu_image_convert_prepare(struct ipu_ic *ic,
+			  struct ipu_image *in, struct ipu_image *out,
+			  enum ipu_rotate_mode rot_mode,
+			  image_converter_cb_t complete,
+			  void *complete_context)
+{
+	struct ipu_ic_priv *priv = ic->priv;
+	struct image_converter *cvt = &ic->cvt;
+	struct ipu_ic_image *s_image, *d_image;
+	struct image_converter_ctx *ctx;
+	unsigned long flags;
+	bool get_res;
+	int ret;
+
+	if (!ic || !in || !out || !complete)
+		return ERR_PTR(-EINVAL);
+
+	/* verify the in/out images before continuing */
+	ret = ipu_image_convert_verify(in, out, rot_mode);
+	if (ret) {
+		dev_err(priv->ipu->dev, "%s: in/out formats invalid\n",
+			__func__);
+		return ERR_PTR(ret);
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	dev_dbg(priv->ipu->dev, "%s: ctx %p\n", __func__, ctx);
+
+	ctx->cvt = cvt;
+	init_completion(&ctx->aborted);
+
+	s_image = &ctx->in;
+	d_image = &ctx->out;
+
+	/* set tiling and rotation */
+	d_image->num_rows = ipu_ic_num_stripes(out->pix.height);
+	d_image->num_cols = ipu_ic_num_stripes(out->pix.width);
+	if (ipu_rot_mode_is_irt(rot_mode)) {
+		s_image->num_rows = d_image->num_cols;
+		s_image->num_cols = d_image->num_rows;
+	} else {
+		s_image->num_rows = d_image->num_rows;
+		s_image->num_cols = d_image->num_cols;
+	}
+
+	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
+	ctx->rot_mode = rot_mode;
+
+	ret = ipu_ic_fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
+	if (ret)
+		goto out_free;
+	ret = ipu_ic_fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
+	if (ret)
+		goto out_free;
+
+	ipu_ic_calc_out_tile_map(ctx);
+
+	ipu_ic_dump_format(ctx, s_image);
+	ipu_ic_dump_format(ctx, d_image);
+
+	ctx->complete = complete;
+	ctx->complete_context = complete_context;
+
+	/*
+	 * Can we use double-buffering for this operation? If there is
+	 * only one tile (the whole image can be converted in a single
+	 * operation) there's no point in using double-buffering. Also,
+	 * the IPU's IDMAC channels allow only a single U and V plane
+	 * offset shared between both buffers, but these offsets change
+	 * for every tile, and therefore would have to be updated for
+	 * each buffer which is not possible. So double-buffering is
+	 * impossible when either the source or destination images are
+	 * a planar format (YUV420, YUV422P, etc.).
+	 */
+	ctx->double_buffering = (ctx->num_tiles > 1 &&
+				 !s_image->fmt->y_depth &&
+				 !d_image->fmt->y_depth);
+
+	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+		ret = ipu_ic_alloc_dma_buf(priv, &ctx->rot_intermediate[0],
+					   d_image->tile[0].size);
+		if (ret)
+			goto out_free;
+		if (ctx->double_buffering) {
+			ret = ipu_ic_alloc_dma_buf(priv,
+						   &ctx->rot_intermediate[1],
+						   d_image->tile[0].size);
+			if (ret)
+				goto out_free_dmabuf0;
+		}
+	}
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	get_res = list_empty(&cvt->ctx_list);
+
+	list_add_tail(&ctx->list, &cvt->ctx_list);
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (get_res) {
+		ret = ipu_ic_get_ipu_resources(cvt);
+		if (ret)
+			goto out_free_dmabuf1;
+	}
+
+	return ctx;
+
+out_free_dmabuf1:
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
+	spin_lock_irqsave(&cvt->irqlock, flags);
+	list_del(&ctx->list);
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+out_free_dmabuf0:
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
+out_free:
+	kfree(ctx);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_prepare);
+
+/*
+ * Carry out a single image conversion. Only the physaddr's of the input
+ * and output image buffers are needed. The conversion context must have
+ * been created previously with ipu_image_convert_prepare(). Returns the
+ * new run object.
+ */
+struct image_converter_run *
+ipu_image_convert_run(struct image_converter_ctx *ctx,
+		      dma_addr_t in_phys, dma_addr_t out_phys)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run;
+	unsigned long flags;
+	int ret = 0;
+
+	run = kzalloc(sizeof(*run), GFP_KERNEL);
+	if (!run)
+		return ERR_PTR(-ENOMEM);
+
+	run->ctx = ctx;
+	run->in_phys = in_phys;
+	run->out_phys = out_phys;
+
+	dev_dbg(priv->ipu->dev, "%s: ctx %p run %p\n", __func__,
+		ctx, run);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	if (ctx->aborting) {
+		ret = -EIO;
+		goto unlock;
+	}
+
+	list_add_tail(&run->list, &cvt->pending_q);
+
+	if (!cvt->current_run) {
+		ret = ipu_ic_run(run);
+		if (ret)
+			cvt->current_run = NULL;
+	}
+unlock:
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (ret) {
+		kfree(run);
+		run = ERR_PTR(ret);
+	}
+
+	return run;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_run);
+
+/* Abort any active or pending conversions for this context */
+void ipu_image_convert_abort(struct image_converter_ctx *ctx)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	struct image_converter_run *run, *active_run, *tmp;
+	unsigned long flags;
+	int run_count, ret;
+	bool need_abort;
+
+	reinit_completion(&ctx->aborted);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	/* move all remaining pending runs in this context to done_q */
+	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
+		if (run->ctx != ctx)
+			continue;
+		run->status = -EIO;
+		list_move_tail(&run->list, &cvt->done_q);
+	}
+
+	run_count = ipu_ic_get_run_count(ctx, &cvt->done_q);
+	active_run = (cvt->current_run && cvt->current_run->ctx = ctx) ?
+		cvt->current_run : NULL;
+
+	need_abort = (run_count || active_run);
+
+	ctx->aborting = need_abort;
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (!need_abort) {
+		dev_dbg(priv->ipu->dev, "%s: no abort needed for ctx %p\n",
+			__func__, ctx);
+		return;
+	}
+
+	dev_dbg(priv->ipu->dev,
+		 "%s: wait for completion: %d runs, active run %p\n",
+		 __func__, run_count, active_run);
+
+	ret = wait_for_completion_timeout(&ctx->aborted,
+					  msecs_to_jiffies(10000));
+	if (ret = 0) {
+		dev_warn(priv->ipu->dev, "%s: timeout\n", __func__);
+		ipu_ic_force_abort(ctx);
+	}
+
+	ctx->aborting = false;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_abort);
+
+/* Unprepare image conversion context */
+void ipu_image_convert_unprepare(struct image_converter_ctx *ctx)
+{
+	struct image_converter *cvt = ctx->cvt;
+	struct ipu_ic_priv *priv = cvt->ic->priv;
+	unsigned long flags;
+	bool put_res;
+
+	/* make sure no runs are hanging around */
+	ipu_image_convert_abort(ctx);
+
+	dev_dbg(priv->ipu->dev, "%s: removing ctx %p\n", __func__, ctx);
+
+	spin_lock_irqsave(&cvt->irqlock, flags);
+
+	list_del(&ctx->list);
+
+	put_res = list_empty(&cvt->ctx_list);
+
+	spin_unlock_irqrestore(&cvt->irqlock, flags);
+
+	if (put_res)
+		ipu_ic_release_ipu_resources(cvt);
+
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
+	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
+
+	kfree(ctx);
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_unprepare);
+
+/*
+ * "Canned" asynchronous single image conversion. On successful return
+ * caller must call ipu_image_convert_unprepare() after conversion completes.
+ * Returns the new conversion context.
+ */
+struct image_converter_ctx *
+ipu_image_convert(struct ipu_ic *ic,
+		  struct ipu_image *in, struct ipu_image *out,
+		  enum ipu_rotate_mode rot_mode,
+		  image_converter_cb_t complete,
+		  void *complete_context)
+{
+	struct image_converter_ctx *ctx;
+	struct image_converter_run *run;
+
+	ctx = ipu_image_convert_prepare(ic, in, out, rot_mode,
+					complete, complete_context);
+	if (IS_ERR(ctx))
+		return ctx;
+
+	run = ipu_image_convert_run(ctx, in->phys0, out->phys0);
+	if (IS_ERR(run)) {
+		ipu_image_convert_unprepare(ctx);
+		return ERR_PTR(PTR_ERR(run));
+	}
+
+	return ctx;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert);
+
+/* "Canned" synchronous single image conversion */
+static void image_convert_sync_complete(void *data,
+					struct image_converter_run *run,
+					int err)
+{
+	struct completion *comp = data;
+
+	complete(comp);
+}
+
+int ipu_image_convert_sync(struct ipu_ic *ic,
+			   struct ipu_image *in, struct ipu_image *out,
+			   enum ipu_rotate_mode rot_mode)
+{
+	struct image_converter_ctx *ctx;
+	struct completion comp;
+	int ret;
+
+	init_completion(&comp);
+
+	ctx = ipu_image_convert(ic, in, out, rot_mode,
+				image_convert_sync_complete, &comp);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = wait_for_completion_timeout(&comp, msecs_to_jiffies(10000));
+	ret = (ret = 0) ? -ETIMEDOUT : 0;
+
+	ipu_image_convert_unprepare(ctx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ipu_image_convert_sync);
+
 int ipu_ic_enable(struct ipu_ic *ic)
 {
 	struct ipu_ic_priv *priv = ic->priv;
@@ -746,6 +2418,7 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
 	ipu->ic_priv = priv;
 
 	spin_lock_init(&priv->lock);
+
 	priv->base = devm_ioremap(dev, base, PAGE_SIZE);
 	if (!priv->base)
 		return -ENOMEM;
@@ -758,10 +2431,21 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
 	priv->ipu = ipu;
 
 	for (i = 0; i < IC_NUM_TASKS; i++) {
-		priv->task[i].task = i;
-		priv->task[i].priv = priv;
-		priv->task[i].reg = &ic_task_reg[i];
-		priv->task[i].bit = &ic_task_bit[i];
+		struct ipu_ic *ic = &priv->task[i];
+		struct image_converter *cvt = &ic->cvt;
+
+		ic->task = i;
+		ic->priv = priv;
+		ic->reg = &ic_task_reg[i];
+		ic->bit = &ic_task_bit[i];
+		ic->ch = &ic_task_ch[i];
+
+		cvt->ic = ic;
+		spin_lock_init(&cvt->irqlock);
+		INIT_LIST_HEAD(&cvt->ctx_list);
+		INIT_LIST_HEAD(&cvt->pending_q);
+		INIT_LIST_HEAD(&cvt->done_q);
+		cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
 	}
 
 	return 0;
diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
index 1a3f7d4..992addf 100644
--- a/include/video/imx-ipu-v3.h
+++ b/include/video/imx-ipu-v3.h
@@ -63,17 +63,25 @@ enum ipu_csi_dest {
 /*
  * Enumeration of IPU rotation modes
  */
+#define IPU_ROT_BIT_VFLIP (1 << 0)
+#define IPU_ROT_BIT_HFLIP (1 << 1)
+#define IPU_ROT_BIT_90    (1 << 2)
+
 enum ipu_rotate_mode {
 	IPU_ROTATE_NONE = 0,
-	IPU_ROTATE_VERT_FLIP,
-	IPU_ROTATE_HORIZ_FLIP,
-	IPU_ROTATE_180,
-	IPU_ROTATE_90_RIGHT,
-	IPU_ROTATE_90_RIGHT_VFLIP,
-	IPU_ROTATE_90_RIGHT_HFLIP,
-	IPU_ROTATE_90_LEFT,
+	IPU_ROTATE_VERT_FLIP = IPU_ROT_BIT_VFLIP,
+	IPU_ROTATE_HORIZ_FLIP = IPU_ROT_BIT_HFLIP,
+	IPU_ROTATE_180 = (IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
+	IPU_ROTATE_90_RIGHT = IPU_ROT_BIT_90,
+	IPU_ROTATE_90_RIGHT_VFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_VFLIP),
+	IPU_ROTATE_90_RIGHT_HFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_HFLIP),
+	IPU_ROTATE_90_LEFT = (IPU_ROT_BIT_90 |
+			      IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
 };
 
+/* 90-degree rotations require the IRT unit */
+#define ipu_rot_mode_is_irt(m) ((m) >= IPU_ROTATE_90_RIGHT)
+
 enum ipu_color_space {
 	IPUV3_COLORSPACE_RGB,
 	IPUV3_COLORSPACE_YUV,
@@ -337,6 +345,7 @@ enum ipu_ic_task {
 };
 
 struct ipu_ic;
+
 int ipu_ic_task_init(struct ipu_ic *ic,
 		     int in_width, int in_height,
 		     int out_width, int out_height,
@@ -351,6 +360,40 @@ void ipu_ic_task_disable(struct ipu_ic *ic);
 int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
 			  u32 width, u32 height, int burst_size,
 			  enum ipu_rotate_mode rot);
+
+struct image_converter_ctx;
+struct image_converter_run;
+
+typedef void (*image_converter_cb_t)(void *ctx,
+				     struct image_converter_run *run,
+				     int err);
+
+int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc);
+int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode);
+int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
+			     enum ipu_rotate_mode rot_mode);
+struct image_converter_ctx *
+ipu_image_convert_prepare(struct ipu_ic *ic,
+			  struct ipu_image *in, struct ipu_image *out,
+			  enum ipu_rotate_mode rot_mode,
+			  image_converter_cb_t complete,
+			  void *complete_context);
+void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
+struct image_converter_run *
+ipu_image_convert_run(struct image_converter_ctx *ctx,
+		      dma_addr_t in_phys, dma_addr_t out_phys);
+void ipu_image_convert_abort(struct image_converter_ctx *ctx);
+struct image_converter_ctx *
+ipu_image_convert(struct ipu_ic *ic,
+		  struct ipu_image *in, struct ipu_image *out,
+		  enum ipu_rotate_mode rot_mode,
+		  image_converter_cb_t complete,
+		  void *complete_context);
+int ipu_image_convert_sync(struct ipu_ic *ic,
+			   struct ipu_image *in, struct ipu_image *out,
+			   enum ipu_rotate_mode rot_mode);
+
 int ipu_ic_enable(struct ipu_ic *ic);
 int ipu_ic_disable(struct ipu_ic *ic);
 struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 4/4] gpu: ipu-ic: allow multiple handles to ic
  2016-08-18  0:50 ` Steve Longerbeam
@ 2016-08-18  0:50   ` Steve Longerbeam
  -1 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

The image converter kernel API supports conversion contexts and
job queues, so we should allow more than one handle to the IC, so
that multiple users can add jobs to the queue.

Note however that users that control the IC manually (that do not
use the image converter APIs but setup the IC task by hand via calls
to ipu_ic_task_enable(), ipu_ic_enable(), etc.) must still be careful not
to share the IC handle with other threads. At this point, the only user
that still controls the IC manually is the i.mx capture driver. In that
case the capture driver only allows one open context to get a handle
to the IC at a time, so we should be ok there.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>

---

v4: no changes
v3: no changes
v2: no changes
---
 drivers/gpu/ipu-v3/ipu-ic.c | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 01b1b56..c44aeba 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -338,7 +338,6 @@ struct ipu_ic {
 	enum ipu_color_space out_cs;
 	bool graphics;
 	bool rotation;
-	bool in_use;
 
 	struct image_converter cvt;
 
@@ -2370,38 +2369,16 @@ EXPORT_SYMBOL_GPL(ipu_ic_disable);
 struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task)
 {
 	struct ipu_ic_priv *priv = ipu->ic_priv;
-	unsigned long flags;
-	struct ipu_ic *ic, *ret;
 
 	if (task >= IC_NUM_TASKS)
 		return ERR_PTR(-EINVAL);
 
-	ic = &priv->task[task];
-
-	spin_lock_irqsave(&priv->lock, flags);
-
-	if (ic->in_use) {
-		ret = ERR_PTR(-EBUSY);
-		goto unlock;
-	}
-
-	ic->in_use = true;
-	ret = ic;
-
-unlock:
-	spin_unlock_irqrestore(&priv->lock, flags);
-	return ret;
+	return &priv->task[task];
 }
 EXPORT_SYMBOL_GPL(ipu_ic_get);
 
 void ipu_ic_put(struct ipu_ic *ic)
 {
-	struct ipu_ic_priv *priv = ic->priv;
-	unsigned long flags;
-
-	spin_lock_irqsave(&priv->lock, flags);
-	ic->in_use = false;
-	spin_unlock_irqrestore(&priv->lock, flags);
 }
 EXPORT_SYMBOL_GPL(ipu_ic_put);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 4/4] gpu: ipu-ic: allow multiple handles to ic
@ 2016-08-18  0:50   ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-08-18  0:50 UTC (permalink / raw)
  To: p.zabel, plagnioj, tomi.valkeinen
  Cc: dri-devel, linux-kernel, linux-fbdev, Steve Longerbeam

The image converter kernel API supports conversion contexts and
job queues, so we should allow more than one handle to the IC, so
that multiple users can add jobs to the queue.

Note however that users that control the IC manually (that do not
use the image converter APIs but setup the IC task by hand via calls
to ipu_ic_task_enable(), ipu_ic_enable(), etc.) must still be careful not
to share the IC handle with other threads. At this point, the only user
that still controls the IC manually is the i.mx capture driver. In that
case the capture driver only allows one open context to get a handle
to the IC at a time, so we should be ok there.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>

---

v4: no changes
v3: no changes
v2: no changes
---
 drivers/gpu/ipu-v3/ipu-ic.c | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 01b1b56..c44aeba 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -338,7 +338,6 @@ struct ipu_ic {
 	enum ipu_color_space out_cs;
 	bool graphics;
 	bool rotation;
-	bool in_use;
 
 	struct image_converter cvt;
 
@@ -2370,38 +2369,16 @@ EXPORT_SYMBOL_GPL(ipu_ic_disable);
 struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task)
 {
 	struct ipu_ic_priv *priv = ipu->ic_priv;
-	unsigned long flags;
-	struct ipu_ic *ic, *ret;
 
 	if (task >= IC_NUM_TASKS)
 		return ERR_PTR(-EINVAL);
 
-	ic = &priv->task[task];
-
-	spin_lock_irqsave(&priv->lock, flags);
-
-	if (ic->in_use) {
-		ret = ERR_PTR(-EBUSY);
-		goto unlock;
-	}
-
-	ic->in_use = true;
-	ret = ic;
-
-unlock:
-	spin_unlock_irqrestore(&priv->lock, flags);
-	return ret;
+	return &priv->task[task];
 }
 EXPORT_SYMBOL_GPL(ipu_ic_get);
 
 void ipu_ic_put(struct ipu_ic *ic)
 {
-	struct ipu_ic_priv *priv = ic->priv;
-	unsigned long flags;
-
-	spin_lock_irqsave(&priv->lock, flags);
-	ic->in_use = false;
-	spin_unlock_irqrestore(&priv->lock, flags);
 }
 EXPORT_SYMBOL_GPL(ipu_ic_put);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
  2016-08-18  0:50 ` Steve Longerbeam
@ 2016-08-25 14:17   ` Tim Harvey
  -1 siblings, 0 replies; 32+ messages in thread
From: Tim Harvey @ 2016-08-25 14:17 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: plagnioj, tomi.valkeinen, linux-fbdev, Steve Longerbeam,
	linux-kernel, DRI mailing list, Steve Longerbeam,
	Lars-Peter Clausen, Fabio Estevam

On Wed, Aug 17, 2016 at 5:50 PM, Steve Longerbeam <slongerbeam@gmail.com> wrote:
> In this version:
>
> - rebased against latest drm-next.
> - cleaned up header includes in ipu-vdi.c.
> - do away with struct ipu_ic_tile_off in ipu-ic.c, and move tile offsets
>   into struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.
>
>
> Steve Longerbeam (4):
>   gpu: ipu-v3: Add Video Deinterlacer unit
>   gpu: ipu-v3: Add FSU channel linking support
>   gpu: ipu-ic: Add complete image conversion support with tiling
>   gpu: ipu-ic: allow multiple handles to ic
>
>  drivers/gpu/ipu-v3/Makefile     |    2 +-
>  drivers/gpu/ipu-v3/ipu-common.c |  142 ++++
>  drivers/gpu/ipu-v3/ipu-ic.c     | 1719 ++++++++++++++++++++++++++++++++++++++-
>  drivers/gpu/ipu-v3/ipu-prv.h    |   33 +
>  drivers/gpu/ipu-v3/ipu-vdi.c    |  243 ++++++
>  include/video/imx-ipu-v3.h      |   93 ++-
>  6 files changed, 2195 insertions(+), 37 deletions(-)
>  create mode 100644 drivers/gpu/ipu-v3/ipu-vdi.c
>

Philipp,

Have you had a chance to review this last series of Steve's submitted
last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
get his IMX5/6 capture driver into staging and the sooner we do that
the sooner we can get more testing and additional support/features for
it.

Regards,

Tim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-08-25 14:17   ` Tim Harvey
  0 siblings, 0 replies; 32+ messages in thread
From: Tim Harvey @ 2016-08-25 14:17 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: plagnioj, tomi.valkeinen, linux-fbdev, Steve Longerbeam,
	linux-kernel, DRI mailing list, Steve Longerbeam,
	Lars-Peter Clausen, Fabio Estevam

On Wed, Aug 17, 2016 at 5:50 PM, Steve Longerbeam <slongerbeam@gmail.com> wrote:
> In this version:
>
> - rebased against latest drm-next.
> - cleaned up header includes in ipu-vdi.c.
> - do away with struct ipu_ic_tile_off in ipu-ic.c, and move tile offsets
>   into struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.
>
>
> Steve Longerbeam (4):
>   gpu: ipu-v3: Add Video Deinterlacer unit
>   gpu: ipu-v3: Add FSU channel linking support
>   gpu: ipu-ic: Add complete image conversion support with tiling
>   gpu: ipu-ic: allow multiple handles to ic
>
>  drivers/gpu/ipu-v3/Makefile     |    2 +-
>  drivers/gpu/ipu-v3/ipu-common.c |  142 ++++
>  drivers/gpu/ipu-v3/ipu-ic.c     | 1719 ++++++++++++++++++++++++++++++++++++++-
>  drivers/gpu/ipu-v3/ipu-prv.h    |   33 +
>  drivers/gpu/ipu-v3/ipu-vdi.c    |  243 ++++++
>  include/video/imx-ipu-v3.h      |   93 ++-
>  6 files changed, 2195 insertions(+), 37 deletions(-)
>  create mode 100644 drivers/gpu/ipu-v3/ipu-vdi.c
>

Philipp,

Have you had a chance to review this last series of Steve's submitted
last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
get his IMX5/6 capture driver into staging and the sooner we do that
the sooner we can get more testing and additional support/features for
it.

Regards,

Tim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
  2016-08-25 14:17   ` Tim Harvey
  (?)
@ 2016-09-05 14:41     ` Fabio Estevam
  -1 siblings, 0 replies; 32+ messages in thread
From: Fabio Estevam @ 2016-09-05 14:41 UTC (permalink / raw)
  To: Tim Harvey
  Cc: Philipp Zabel, linux-fbdev, Steve Longerbeam, linux-kernel,
	DRI mailing list, Tomi Valkeinen, Steve Longerbeam,
	Fabio Estevam, Jean-Christophe PLAGNIOL-VILLARD

Hi Philipp,

On Thu, Aug 25, 2016 at 11:17 AM, Tim Harvey <tharvey@gateworks.com> wrote:

> Philipp,
>
> Have you had a chance to review this last series of Steve's submitted
> last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
> get his IMX5/6 capture driver into staging and the sooner we do that
> the sooner we can get more testing and additional support/features for
> it.

Do these patches look good to you? It would be nice to get them into
4.9 if possible.

Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-09-05 14:41     ` Fabio Estevam
  0 siblings, 0 replies; 32+ messages in thread
From: Fabio Estevam @ 2016-09-05 14:41 UTC (permalink / raw)
  To: Tim Harvey
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, DRI mailing list,
	Tomi Valkeinen, Steve Longerbeam, Fabio Estevam,
	Jean-Christophe PLAGNIOL-VILLARD

Hi Philipp,

On Thu, Aug 25, 2016 at 11:17 AM, Tim Harvey <tharvey@gateworks.com> wrote:

> Philipp,
>
> Have you had a chance to review this last series of Steve's submitted
> last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
> get his IMX5/6 capture driver into staging and the sooner we do that
> the sooner we can get more testing and additional support/features for
> it.

Do these patches look good to you? It would be nice to get them into
4.9 if possible.

Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-09-05 14:41     ` Fabio Estevam
  0 siblings, 0 replies; 32+ messages in thread
From: Fabio Estevam @ 2016-09-05 14:41 UTC (permalink / raw)
  To: Tim Harvey
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, DRI mailing list,
	Tomi Valkeinen, Steve Longerbeam, Fabio Estevam,
	Jean-Christophe PLAGNIOL-VILLARD

Hi Philipp,

On Thu, Aug 25, 2016 at 11:17 AM, Tim Harvey <tharvey@gateworks.com> wrote:

> Philipp,
>
> Have you had a chance to review this last series of Steve's submitted
> last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
> get his IMX5/6 capture driver into staging and the sooner we do that
> the sooner we can get more testing and additional support/features for
> it.

Do these patches look good to you? It would be nice to get them into
4.9 if possible.

Thanks
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
  2016-09-05 14:41     ` Fabio Estevam
  (?)
@ 2016-09-06  9:26       ` Philipp Zabel
  -1 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Tim Harvey, linux-fbdev, Steve Longerbeam, linux-kernel,
	DRI mailing list, Tomi Valkeinen, Steve Longerbeam,
	Fabio Estevam, Jean-Christophe PLAGNIOL-VILLARD

Am Montag, den 05.09.2016, 11:41 -0300 schrieb Fabio Estevam:
> Hi Philipp,
> 
> On Thu, Aug 25, 2016 at 11:17 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> 
> > Philipp,
> >
> > Have you had a chance to review this last series of Steve's submitted
> > last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
> > get his IMX5/6 capture driver into staging and the sooner we do that
> > the sooner we can get more testing and additional support/features for
> > it.
> 
> Do these patches look good to you? It would be nice to get them into
> 4.9 if possible.

I have applied all but the IC patches so far. For those I have a few
comments.

regards
Philipp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-09-06  9:26       ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, DRI mailing list,
	Tomi Valkeinen, Steve Longerbeam, Fabio Estevam,
	Jean-Christophe PLAGNIOL-VILLARD

Am Montag, den 05.09.2016, 11:41 -0300 schrieb Fabio Estevam:
> Hi Philipp,
> 
> On Thu, Aug 25, 2016 at 11:17 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> 
> > Philipp,
> >
> > Have you had a chance to review this last series of Steve's submitted
> > last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
> > get his IMX5/6 capture driver into staging and the sooner we do that
> > the sooner we can get more testing and additional support/features for
> > it.
> 
> Do these patches look good to you? It would be nice to get them into
> 4.9 if possible.

I have applied all but the IC patches so far. For those I have a few
comments.

regards
Philipp


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4
@ 2016-09-06  9:26       ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, DRI mailing list,
	Tomi Valkeinen, Steve Longerbeam, Fabio Estevam,
	Jean-Christophe PLAGNIOL-VILLARD

Am Montag, den 05.09.2016, 11:41 -0300 schrieb Fabio Estevam:
> Hi Philipp,
> 
> On Thu, Aug 25, 2016 at 11:17 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> 
> > Philipp,
> >
> > Have you had a chance to review this last series of Steve's submitted
> > last week? We are down to 4 patches in gpu/ipu-v3 needed in order to
> > get his IMX5/6 capture driver into staging and the sooner we do that
> > the sooner we can get more testing and additional support/features for
> > it.
> 
> Do these patches look good to you? It would be nice to get them into
> 4.9 if possible.

I have applied all but the IC patches so far. For those I have a few
comments.

regards
Philipp

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
  2016-08-18  0:50   ` Steve Longerbeam
  (?)
@ 2016-09-06  9:26     ` Philipp Zabel
  -1 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: plagnioj, tomi.valkeinen, dri-devel, linux-kernel, linux-fbdev,
	Steve Longerbeam

Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> This patch implements complete image conversion support to ipu-ic,
> with tiling to support scaling to and from images up to 4096x4096.
> Image rotation is also supported.
> 
> The internal API is subsystem agnostic (no V4L2 dependency except
> for the use of V4L2 fourcc pixel formats).
> 
> Callers prepare for image conversion by calling
> ipu_image_convert_prepare(), which initializes the parameters of
> the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.

>  The caller passes in the ipu_ic task to use for
> the conversion, the input and output image formats, a rotation mode,
> and a completion callback and completion context pointer:
> 
> struct image_converter_ctx *
> ipu_image_convert_prepare(struct ipu_ic *ic,
>                           struct ipu_image *in, struct ipu_image *out,
>                           enum ipu_rotate_mode rot_mode,
>                           image_converter_cb_t complete,
>                           void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.

> The caller is given a new conversion context that must be passed to
> the further APIs:
> 
> struct image_converter_run *
> ipu_image_convert_run(struct image_converter_ctx *ctx,
>                       dma_addr_t in_phys, dma_addr_t out_phys);
> 
> This queues a new image conversion request to a run queue, and
> starts the conversion immediately if the run queue is empty. Only
> the physaddr's of the input and output image buffers are needed,
> since the conversion context was created previously with
> ipu_image_convert_prepare(). Returns a new run object pointer. When
> the conversion completes, the run pointer is returned to the
> completion callback.
>
> void image_convert_abort(struct image_converter_ctx *ctx);
> 
> This will abort any active or pending conversions for this context.
> Any currently active or pending runs belonging to this context are
> returned via the completion callback with an error status.
>
> void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> 
> Unprepares the conversion context. Any active or pending runs will
> be aborted by calling image_convert_abort().
> 
> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
> 
> ---
> 
> v4:
> - do away with struct ipu_ic_tile_off, and move tile offsets into
>   struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.

Thanks, this looks a lot better to me.

> v3: no changes
> v2: no changes
> ---
>  drivers/gpu/ipu-v3/ipu-ic.c | 1694 ++++++++++++++++++++++++++++++++++++++++++-
>  include/video/imx-ipu-v3.h  |   57 +-
>  2 files changed, 1739 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
> index 1a37afc..01b1b56 100644
> --- a/drivers/gpu/ipu-v3/ipu-ic.c
> +++ b/drivers/gpu/ipu-v3/ipu-ic.c
> @@ -17,6 +17,8 @@
>  #include <linux/bitrev.h>
>  #include <linux/io.h>
>  #include <linux/err.h>
> +#include <linux/interrupt.h>
> +#include <linux/dma-mapping.h>
>  #include "ipu-prv.h"
>  
>  /* IC Register Offsets */
> @@ -82,6 +84,40 @@
>  #define IC_IDMAC_3_PP_WIDTH_MASK        (0x3ff << 20)
>  #define IC_IDMAC_3_PP_WIDTH_OFFSET      20
>  
> +/*
> + * The IC Resizer has a restriction that the output frame from the
> + * resizer must be 1024 or less in both width (pixels) and height
> + * (lines).
> + *
> + * The image conversion support attempts to split up a conversion when
> + * the desired output (converted) frame resolution exceeds the IC resizer
> + * limit of 1024 in either dimension.
> + *
> + * If either dimension of the output frame exceeds the limit, the
> + * dimension is split into 1, 2, or 4 equal stripes, for a maximum
> + * of 4*4 or 16 tiles. A conversion is then carried out for each
> + * tile (but taking care to pass the full frame stride length to
> + * the DMA channel's parameter memory!). IDMA double-buffering is used
> + * to convert each tile back-to-back when possible (see note below
> + * when double_buffering boolean is set).
> + *
> + * Note that the input frame must be split up into the same number
> + * of tiles as the output frame.
> + */
> +#define MAX_STRIPES_W    4
> +#define MAX_STRIPES_H    4
> +#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
> +
> +#define MIN_W     128
> +#define MIN_H     128

Where does this minimum come from?

> +#define MAX_W     4096
> +#define MAX_H     4096
> +
> +enum image_convert_type {
> +	IMAGE_CONVERT_IN = 0,
> +	IMAGE_CONVERT_OUT,
> +};
> +
>  struct ic_task_regoffs {
>  	u32 rsc;
>  	u32 tpmem_csc[2];
> @@ -96,6 +132,16 @@ struct ic_task_bitfields {
>  	u32 ic_cmb_galpha_bit;
>  };
>  
> +struct ic_task_channels {
> +	int in;
> +	int out;
> +	int rot_in;
> +	int rot_out;
> +	int vdi_in_p;
> +	int vdi_in;
> +	int vdi_in_n;

The vdi channels are unused.

> +};
> +
>  static const struct ic_task_regoffs ic_task_reg[IC_NUM_TASKS] = {
>  	[IC_TASK_ENCODER] = {
>  		.rsc = IC_PRP_ENC_RSC,
> @@ -138,12 +184,155 @@ static const struct ic_task_bitfields ic_task_bit[IC_NUM_TASKS] = {
>  	},
>  };
>  
> +static const struct ic_task_channels ic_task_ch[IC_NUM_TASKS] = {
> +	[IC_TASK_ENCODER] = {
> +		.out = IPUV3_CHANNEL_IC_PRP_ENC_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_ENC,
> +		.rot_out = IPUV3_CHANNEL_ROT_ENC_MEM,
> +	},
> +	[IC_TASK_VIEWFINDER] = {
> +		.in = IPUV3_CHANNEL_MEM_IC_PRP_VF,
> +		.out = IPUV3_CHANNEL_IC_PRP_VF_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_VF,
> +		.rot_out = IPUV3_CHANNEL_ROT_VF_MEM,
> +		.vdi_in_p = IPUV3_CHANNEL_MEM_VDI_PREV,
> +		.vdi_in = IPUV3_CHANNEL_MEM_VDI_CUR,
> +		.vdi_in_n = IPUV3_CHANNEL_MEM_VDI_NEXT,

See above.

> +	},
> +	[IC_TASK_POST_PROCESSOR] = {
> +		.in = IPUV3_CHANNEL_MEM_IC_PP,
> +		.out = IPUV3_CHANNEL_IC_PP_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_PP,
> +		.rot_out = IPUV3_CHANNEL_ROT_PP_MEM,
> +	},
> +};
> +
> +struct ipu_ic_dma_buf {
> +	void          *virt;
> +	dma_addr_t    phys;
> +	unsigned long len;
> +};
> +
> +/* dimensions of one tile */
> +struct ipu_ic_tile {
> +	u32 width;
> +	u32 height;
> +	/* size and strides are in bytes */
> +	u32 size;
> +	u32 stride;
> +	u32 rot_stride;
> +	/* start Y or packed offset of this tile */
> +	u32 offset;
> +	/* offset from start to tile in U plane, for planar formats */
> +	u32 u_off;
> +	/* offset from start to tile in V plane, for planar formats */
> +	u32 v_off;
> +};
> +
> +struct ipu_ic_pixfmt {
> +	char	*name;
> +	u32	fourcc;        /* V4L2 fourcc */
> +	int     bpp;           /* total bpp */
> +	int     y_depth;       /* depth of Y plane for planar formats */
> +	int     uv_width_dec;  /* decimation in width for U/V planes */
> +	int     uv_height_dec; /* decimation in height for U/V planes */
> +	bool    uv_swapped;    /* U and V planes are swapped */
> +	bool    uv_packed;     /* partial planar (U and V in same plane) */
> +};
> +
> +struct ipu_ic_image {
> +	struct ipu_image base;
> +	enum image_convert_type type;
> +
> +	const struct ipu_ic_pixfmt *fmt;
> +	unsigned int stride;
> +
> +	/* # of rows (horizontal stripes) if dest height is > 1024 */
> +	unsigned int num_rows;
> +	/* # of columns (vertical stripes) if dest width is > 1024 */
> +	unsigned int num_cols;
> +
> +	struct ipu_ic_tile tile[MAX_TILES];
> +};
> +
> +struct image_converter_ctx;
> +struct image_converter;
>  struct ipu_ic_priv;
> +struct ipu_ic;
> +
> +struct image_converter_run {
> +	struct image_converter_ctx *ctx;
> +
> +	dma_addr_t in_phys;
> +	dma_addr_t out_phys;
> +
> +	int status;
> +
> +	struct list_head list;
> +};
> +
> +struct image_converter_ctx {
> +	struct image_converter *cvt;
> +
> +	image_converter_cb_t complete;
> +	void *complete_context;
> +
> +	/* Source/destination image data and rotation mode */
> +	struct ipu_ic_image in;
> +	struct ipu_ic_image out;
> +	enum ipu_rotate_mode rot_mode;
> +
> +	/* intermediate buffer for rotation */
> +	struct ipu_ic_dma_buf rot_intermediate[2];

No need to change it now, but I assume these could be per IC task
instead of per context.

> +	/* current buffer number for double buffering */
> +	int cur_buf_num;
> +
> +	bool aborting;
> +	struct completion aborted;
> +
> +	/* can we use double-buffering for this conversion operation? */
> +	bool double_buffering;
> +	/* num_rows * num_cols */
> +	unsigned int num_tiles;
> +	/* next tile to process */
> +	unsigned int next_tile;
> +	/* where to place converted tile in dest image */
> +	unsigned int out_tile_map[MAX_TILES];
> +
> +	struct list_head list;
> +};
> +
> +struct image_converter {
> +	struct ipu_ic *ic;
> +
> +	struct ipuv3_channel *in_chan;
> +	struct ipuv3_channel *out_chan;
> +	struct ipuv3_channel *rotation_in_chan;
> +	struct ipuv3_channel *rotation_out_chan;
> +
> +	/* the IPU end-of-frame irqs */
> +	int out_eof_irq;
> +	int rot_out_eof_irq;
> +
> +	spinlock_t irqlock;
> +
> +	/* list of convert contexts */
> +	struct list_head ctx_list;
> +	/* queue of conversion runs */
> +	struct list_head pending_q;
> +	/* queue of completed runs */
> +	struct list_head done_q;
> +
> +	/* the current conversion run */
> +	struct image_converter_run *current_run;
> +};
>  
>  struct ipu_ic {
>  	enum ipu_ic_task task;
>  	const struct ic_task_regoffs *reg;
>  	const struct ic_task_bitfields *bit;
> +	const struct ic_task_channels *ch;
>  
>  	enum ipu_color_space in_cs, g_in_cs;
>  	enum ipu_color_space out_cs;
> @@ -151,6 +340,8 @@ struct ipu_ic {
>  	bool rotation;
>  	bool in_use;
>  
> +	struct image_converter cvt;
> +
>  	struct ipu_ic_priv *priv;
>  };
>  
> @@ -619,7 +810,7 @@ int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
>  	ipu_ic_write(ic, ic_idmac_2, IC_IDMAC_2);
>  	ipu_ic_write(ic, ic_idmac_3, IC_IDMAC_3);
>  
> -	if (rot >= IPU_ROTATE_90_RIGHT)
> +	if (ipu_rot_mode_is_irt(rot))
>  		ic->rotation = true;
>  
>  unlock:
> @@ -648,6 +839,1487 @@ static void ipu_irt_disable(struct ipu_ic *ic)
>  	}
>  }
>  
> +/*
> + * Complete image conversion support follows
> + */
> +
> +static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
> +	{
> +		.name	= "RGB565",

Please drop the names, keeping a list of user readable format names is
the v4l2 core's business, not ours.

> +		.fourcc	= V4L2_PIX_FMT_RGB565,
> +		.bpp    = 16,

bpp is only ever used in bytes, not bits (always divided by 8).
Why not make this bytes_per_pixel or pixel_stride = 2.

> +	}, {
> +		.name	= "RGB24",
> +		.fourcc	= V4L2_PIX_FMT_RGB24,
> +		.bpp    = 24,
> +	}, {
> +		.name	= "BGR24",
> +		.fourcc	= V4L2_PIX_FMT_BGR24,
> +		.bpp    = 24,
> +	}, {
> +		.name	= "RGB32",
> +		.fourcc	= V4L2_PIX_FMT_RGB32,
> +		.bpp    = 32,
> +	}, {
> +		.name	= "BGR32",
> +		.fourcc	= V4L2_PIX_FMT_BGR32,
> +		.bpp    = 32,
> +	}, {
> +		.name	= "4:2:2 packed, YUYV",
> +		.fourcc	= V4L2_PIX_FMT_YUYV,
> +		.bpp    = 16,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name	= "4:2:2 packed, UYVY",
> +		.fourcc	= V4L2_PIX_FMT_UYVY,
> +		.bpp    = 16,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name	= "4:2:0 planar, YUV",
> +		.fourcc	= V4L2_PIX_FMT_YUV420,
> +		.bpp    = 12,
> +		.y_depth = 8,

y_depth is only ever used in bytes, not bits (always divided by 8).
Why not make this bool planar instead.

> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +	}, {
> +		.name	= "4:2:0 planar, YVU",
> +		.fourcc	= V4L2_PIX_FMT_YVU420,
> +		.bpp    = 12,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +		.uv_swapped = true,
> +	}, {
> +		.name   = "4:2:0 partial planar, NV12",
> +		.fourcc = V4L2_PIX_FMT_NV12,
> +		.bpp    = 12,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +		.uv_packed = true,
> +	}, {
> +		.name   = "4:2:2 planar, YUV",
> +		.fourcc = V4L2_PIX_FMT_YUV422P,
> +		.bpp    = 16,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name   = "4:2:2 partial planar, NV16",
> +		.fourcc = V4L2_PIX_FMT_NV16,
> +		.bpp    = 16,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +		.uv_packed = true,
> +	},
> +};
> +
> +static const struct ipu_ic_pixfmt *ipu_ic_get_format(u32 fourcc)
> +{
> +	const struct ipu_ic_pixfmt *ret = NULL;
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(ipu_ic_formats); i++) {
> +		if (ipu_ic_formats[i].fourcc == fourcc) {
> +			ret = &ipu_ic_formats[i];
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static void ipu_ic_dump_format(struct image_converter_ctx *ctx,
> +			       struct ipu_ic_image *ic_image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +
> +	dev_dbg(priv->ipu->dev,
> +		"ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
> +		ctx,
> +		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
> +		ic_image->base.pix.width, ic_image->base.pix.height,
> +		ic_image->num_cols, ic_image->num_rows,
> +		ic_image->tile[0].width, ic_image->tile[0].height,
> +		ic_image->fmt->fourcc & 0xff,
> +		(ic_image->fmt->fourcc >> 8) & 0xff,
> +		(ic_image->fmt->fourcc >> 16) & 0xff,
> +		(ic_image->fmt->fourcc >> 24) & 0xff);
> +}
> +
> +int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc)
> +{
> +	const struct ipu_ic_pixfmt *fmt;
> +
> +	if (index >= (int)ARRAY_SIZE(ipu_ic_formats))
> +		return -EINVAL;
> +
> +	/* Format found */
> +	fmt = &ipu_ic_formats[index];
> +	*desc = fmt->name;
> +	*fourcc = fmt->fourcc;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_enum_format);
> +
> +static void ipu_ic_free_dma_buf(struct ipu_ic_priv *priv,
> +				struct ipu_ic_dma_buf *buf)
> +{
> +	if (buf->virt)
> +		dma_free_coherent(priv->ipu->dev,
> +				  buf->len, buf->virt, buf->phys);
> +	buf->virt = NULL;
> +	buf->phys = 0;
> +}
> +
> +static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
> +				struct ipu_ic_dma_buf *buf,
> +				int size)
> +{
> +	unsigned long newlen = PAGE_ALIGN(size);
> +
> +	if (buf->virt) {
> +		if (buf->len == newlen)
> +			return 0;
> +		ipu_ic_free_dma_buf(priv, buf);
> +	}

Is it necessary to support reallocation? This is currently only used by
the prepare function, which creates a new context.

> +
> +	buf->len = newlen;
> +	buf->virt = dma_alloc_coherent(priv->ipu->dev, buf->len, &buf->phys,
> +				       GFP_DMA | GFP_KERNEL);
> +	if (!buf->virt) {
> +		dev_err(priv->ipu->dev, "failed to alloc dma buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int ipu_ic_num_stripes(int dim)
> +{
> +	if (dim <= 1024)
> +		return 1;
> +	else if (dim <= 2048)
> +		return 2;
> +	else
> +		return 4;
> +}
> +
> +static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
> +					struct ipu_ic_image *image)
> +{
> +	int i;
> +
> +	for (i = 0; i < ctx->num_tiles; i++) {
> +		struct ipu_ic_tile *tile = &image->tile[i];
> +
> +		tile->height = image->base.pix.height / image->num_rows;
> +		tile->width = image->base.pix.width / image->num_cols;

We already have talked about this, this simplified tiling will cause
image artifacts (horizontal and vertical seams at the tile borders) when
the bilinear upscaler source pixel step is significantly smaller than a
whole pixel.
This can be fixed in the future by using overlapping tiles of different
sizes and possibly by slightly changing the scaling factors of
individual tiles.

> +		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
> +			tile->width;
> +
> +		if (image->fmt->y_depth) {
> +			tile->stride =
> +				(image->fmt->y_depth * tile->width) >> 3;
> +			tile->rot_stride =
> +				(image->fmt->y_depth * tile->height) >> 3;
> +		} else {
> +			tile->stride =
> +				(image->fmt->bpp * tile->width) >> 3;
> +			tile->rot_stride =
> +				(image->fmt->bpp * tile->height) >> 3;
> +		}
> +	}
> +}
> +
> +/*
> + * Use the rotation transformation to find the tile coordinates
> + * (row, col) of a tile in the destination frame that corresponds
> + * to the given tile coordinates of a source frame. The destination
> + * coordinate is then converted to a tile index.
> + */
> +static int ipu_ic_transform_tile_index(struct image_converter_ctx *ctx,
> +				       int src_row, int src_col)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	int cos, sin, dst_row, dst_col;
> +
> +	/* with no rotation it's a 1:1 mapping */
> +	if (ctx->rot_mode == IPU_ROTATE_NONE)
> +		return src_row * s_image->num_cols + src_col;
> +
> +	if (ctx->rot_mode & IPU_ROT_BIT_90) {
> +		cos = 0;
> +		sin = 1;
> +	} else {
> +		cos = 1;
> +		sin = 0;
> +	}
>+
> +	/*
> +	 * before doing the transform, first we have to translate
> +	 * source row,col for an origin in the center of s_image
> +	 */
> +	src_row *= 2;
> +	src_col *= 2;
> +	src_row -= s_image->num_rows - 1;
> +	src_col -= s_image->num_cols - 1;
> +
> +	/* do the rotation transform */
> +	dst_col = src_col * cos - src_row * sin;
> +	dst_row = src_col * sin + src_row * cos;

This looks nice, but I'd just move the rot_mode conditional below
assignment of src_row/col and do away with the sin/cos temporary
variables:

	/*
	 * before doing the transform, first we have to translate
	 * source row,col for an origin in the center of s_image
	 */
	src_row = src_row * 2 - (s_image->num_rows - 1);
	src_col = src_col * 2 - (s_image->num_cols - 1);

	/* do the rotation transform */
	if (ctx->rot_mode & IPU_ROT_BIT_90) {
		dst_col = -src_row;
		dst_row = src_col;
	} else {
		dst_col = src_col;
		dst_row = src_row;
	}

> +	/* apply flip */
> +	if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
> +		dst_col = -dst_col;
> +	if (ctx->rot_mode & IPU_ROT_BIT_VFLIP)
> +		dst_row = -dst_row;
> +
> +	dev_dbg(priv->ipu->dev, "ctx %p: [%d,%d] --> [%d,%d]\n",
> +		ctx, src_col, src_row, dst_col, dst_row);
> +
> +	/*
> +	 * finally translate dest row,col using an origin in upper
> +	 * left of d_image
> +	 */
> +	dst_row += d_image->num_rows - 1;
> +	dst_col += d_image->num_cols - 1;
> +	dst_row /= 2;
> +	dst_col /= 2;
> +
> +	return dst_row * d_image->num_cols + dst_col;
> +}
> +
> +/*
> + * Fill the out_tile_map[] with transformed destination tile indeces.
> + */
> +static void ipu_ic_calc_out_tile_map(struct image_converter_ctx *ctx)
> +{
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	unsigned int row, col, tile = 0;
> +
> +	for (row = 0; row < s_image->num_rows; row++) {
> +		for (col = 0; col < s_image->num_cols; col++) {
> +			ctx->out_tile_map[tile] =
> +				ipu_ic_transform_tile_index(ctx, row, col);
> +			tile++;
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets_planar(struct image_converter_ctx *ctx,
> +					    struct ipu_ic_image *image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	const struct ipu_ic_pixfmt *fmt = image->fmt;
> +	unsigned int row, col, tile = 0;
> +	u32 H, w, h, y_depth, y_stride, uv_stride;
> +	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
> +	u32 y_row_off, y_col_off, y_off;
> +	u32 y_size, uv_size;
> +
> +	/* setup some convenience vars */
> +	H = image->base.pix.height;
> +
> +	y_depth = fmt->y_depth;
> +	y_stride = image->stride;
> +	uv_stride = y_stride / fmt->uv_width_dec;
> +	if (fmt->uv_packed)
> +		uv_stride *= 2;
> +
> +	y_size = H * y_stride;
> +	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
> +
> +	for (row = 0; row < image->num_rows; row++) {
> +		w = image->tile[tile].width;
> +		h = image->tile[tile].height;
> +		y_row_off = row * h * y_stride;
> +		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
> +
> +		for (col = 0; col < image->num_cols; col++) {
> +			y_col_off = (col * w * y_depth) >> 3;

We know that for planar formats, y_depth can only ever be 8. No need to
calculate this here.

> +			uv_col_off = y_col_off / fmt->uv_width_dec;
> +			if (fmt->uv_packed)
> +				uv_col_off *= 2;
> +
> +			y_off = y_row_off + y_col_off;
> +			uv_off = uv_row_off + uv_col_off;
> +
> +			u_off = y_size - y_off + uv_off;
> +			v_off = (fmt->uv_packed) ? 0 : u_off + uv_size;
> +			if (fmt->uv_swapped) {
> +				tmp = u_off;
> +				u_off = v_off;
> +				v_off = tmp;
> +			}
> +
> +			image->tile[tile].offset = y_off;
> +			image->tile[tile].u_off = u_off;
> +			image->tile[tile++].v_off = v_off;
> +
> +			dev_dbg(priv->ipu->dev,
> +				"ctx %p: %s@[%d,%d]: y_off %08x, u_off %08x, v_off %08x\n",
> +				ctx, image->type == IMAGE_CONVERT_IN ?
> +				"Input" : "Output", row, col,
> +				y_off, u_off, v_off);
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets_packed(struct image_converter_ctx *ctx,
> +					    struct ipu_ic_image *image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	const struct ipu_ic_pixfmt *fmt = image->fmt;
> +	unsigned int row, col, tile = 0;
> +	u32 w, h, bpp, stride;
> +	u32 row_off, col_off;
> +
> +	/* setup some convenience vars */
> +	stride = image->stride;
> +	bpp = fmt->bpp;
> +
> +	for (row = 0; row < image->num_rows; row++) {
> +		w = image->tile[tile].width;
> +		h = image->tile[tile].height;
> +		row_off = row * h * stride;
> +
> +		for (col = 0; col < image->num_cols; col++) {
> +			col_off = (col * w * bpp) >> 3;
> +
> +			image->tile[tile].offset = row_off + col_off;
> +			image->tile[tile].u_off = 0;
> +			image->tile[tile++].v_off = 0;
> +
> +			dev_dbg(priv->ipu->dev,
> +				"ctx %p: %s@[%d,%d]: phys %08x\n", ctx,
> +				image->type == IMAGE_CONVERT_IN ?
> +				"Input" : "Output", row, col,
> +				row_off + col_off);
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets(struct image_converter_ctx *ctx,
> +				     struct ipu_ic_image *image)
> +{
> +	if (image->fmt->y_depth)
> +		ipu_ic_calc_tile_offsets_planar(ctx, image);
> +	else
> +		ipu_ic_calc_tile_offsets_packed(ctx, image);
> +}
> +
> +/*
> + * return the number of runs in given queue (pending_q or done_q)
> + * for this context. hold irqlock when calling.
> + */

Most of the following code seems to be running under one big spinlock.
Is this really necessary?
All the IRQ handlers to is potentially call ipu_ic_convert_stop, update
the CPMEM, mark buffers as ready for the IDMAC, and and put the current
run on the done_q when ready. Can't the IC/IDMAC register access be
locked completely separately from the list handling?

> +static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
> +				struct list_head *q)
> +{
> +	struct image_converter_run *run;
> +	int count = 0;

Add
	lockdep_assert_held(&ctx->irqlock);
for the functions that expect their caller to be holding the lock.

> +	list_for_each_entry(run, q, list) {
> +		if (run->ctx == ctx)
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +/* hold irqlock when calling */
> +static void ipu_ic_convert_stop(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +
> +	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
> +		__func__, ctx, run);

Maybe add some indication which IC task this context belongs to?

> +	/* disable IC tasks and the channels */
> +	ipu_ic_task_disable(cvt->ic);
> +	ipu_idmac_disable_channel(cvt->in_chan);
> +	ipu_idmac_disable_channel(cvt->out_chan);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_idmac_disable_channel(cvt->rotation_in_chan);
> +		ipu_idmac_disable_channel(cvt->rotation_out_chan);
> +		ipu_idmac_unlink(cvt->out_chan, cvt->rotation_in_chan);
> +	}
> +
> +	ipu_ic_disable(cvt->ic);
> +}
> +
> +/* hold irqlock when calling */
> +static void init_idmac_channel(struct image_converter_ctx *ctx,
> +			       struct ipuv3_channel *channel,
> +			       struct ipu_ic_image *image,
> +			       enum ipu_rotate_mode rot_mode,
> +			       bool rot_swap_width_height)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	unsigned int burst_size;
> +	u32 width, height, stride;
> +	dma_addr_t addr0, addr1 = 0;
> +	struct ipu_image tile_image;
> +	unsigned int tile_idx[2];
> +
> +	if (image->type == IMAGE_CONVERT_OUT) {
> +		tile_idx[0] = ctx->out_tile_map[0];
> +		tile_idx[1] = ctx->out_tile_map[1];
> +	} else {
> +		tile_idx[0] = 0;
> +		tile_idx[1] = 1;
> +	}
> +
> +	if (rot_swap_width_height) {
> +		width = image->tile[0].height;
> +		height = image->tile[0].width;
> +		stride = image->tile[0].rot_stride;
> +		addr0 = ctx->rot_intermediate[0].phys;
> +		if (ctx->double_buffering)
> +			addr1 = ctx->rot_intermediate[1].phys;
> +	} else {
> +		width = image->tile[0].width;
> +		height = image->tile[0].height;
> +		stride = image->stride;
> +		addr0 = image->base.phys0 +
> +			image->tile[tile_idx[0]].offset;
> +		if (ctx->double_buffering)
> +			addr1 = image->base.phys0 +
> +				image->tile[tile_idx[1]].offset;
> +	}
> +
> +	ipu_cpmem_zero(channel);
> +
> +	memset(&tile_image, 0, sizeof(tile_image));
> +	tile_image.pix.width = tile_image.rect.width = width;
> +	tile_image.pix.height = tile_image.rect.height = height;
> +	tile_image.pix.bytesperline = stride;
> +	tile_image.pix.pixelformat =  image->fmt->fourcc;
> +	tile_image.phys0 = addr0;
> +	tile_image.phys1 = addr1;
> +	ipu_cpmem_set_image(channel, &tile_image);
> +
> +	if (image->fmt->y_depth && !rot_swap_width_height)
> +		ipu_cpmem_set_uv_offset(channel,
> +					image->tile[tile_idx[0]].u_off,
> +					image->tile[tile_idx[0]].v_off);
> +
> +	if (rot_mode)
> +		ipu_cpmem_set_rotation(channel, rot_mode);
> +
> +	if (channel == cvt->rotation_in_chan ||
> +	    channel == cvt->rotation_out_chan) {
> +		burst_size = 8;
> +		ipu_cpmem_set_block_mode(channel);
> +	} else
> +		burst_size = (width % 16) ? 8 : 16;

This is for later, but it might turn out to be better to accept a little
overdraw if stride allows for it and use the larger burst size,
especially for wide images.

> +	ipu_cpmem_set_burstsize(channel, burst_size);
> +
> +	ipu_ic_task_idma_init(cvt->ic, channel, width, height,
> +			      burst_size, rot_mode);
> +
> +	ipu_cpmem_set_axi_id(channel, 1);
> +
> +	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
> +}
> +
> +/* hold irqlock when calling */
> +static int ipu_ic_convert_start(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	enum ipu_color_space src_cs, dest_cs;
> +	unsigned int dest_width, dest_height;
> +	int ret;
> +
> +	dev_dbg(priv->ipu->dev, "%s: starting ctx %p run %p\n",
> +		__func__, ctx, run);
> +
> +	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
> +	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* swap width/height for resizer */
> +		dest_width = d_image->tile[0].height;
> +		dest_height = d_image->tile[0].width;
> +	} else {
> +		dest_width = d_image->tile[0].width;
> +		dest_height = d_image->tile[0].height;
> +	}
> +
> +	/* setup the IC resizer and CSC */
> +	ret = ipu_ic_task_init(cvt->ic,
> +			       s_image->tile[0].width,
> +			       s_image->tile[0].height,
> +			       dest_width,
> +			       dest_height,
> +			       src_cs, dest_cs);
> +	if (ret) {
> +		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* init the source MEM-->IC PP IDMAC channel */
> +	init_idmac_channel(ctx, cvt->in_chan, s_image,
> +			   IPU_ROTATE_NONE, false);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* init the IC PP-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->out_chan, d_image,
> +				   IPU_ROTATE_NONE, true);
> +
> +		/* init the MEM-->IC PP ROT IDMAC channel */
> +		init_idmac_channel(ctx, cvt->rotation_in_chan, d_image,
> +				   ctx->rot_mode, true);
> +
> +		/* init the destination IC PP ROT-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->rotation_out_chan, d_image,
> +				   IPU_ROTATE_NONE, false);
> +
> +		/* now link IC PP-->MEM to MEM-->IC PP ROT */
> +		ipu_idmac_link(cvt->out_chan, cvt->rotation_in_chan);
> +	} else {
> +		/* init the destination IC PP-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->out_chan, d_image,
> +				   ctx->rot_mode, false);
> +	}
> +
> +	/* enable the IC */
> +	ipu_ic_enable(cvt->ic);
> +
> +	/* set buffers ready */
> +	ipu_idmac_select_buffer(cvt->in_chan, 0);
> +	ipu_idmac_select_buffer(cvt->out_chan, 0);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode))
> +		ipu_idmac_select_buffer(cvt->rotation_out_chan, 0);
> +	if (ctx->double_buffering) {
> +		ipu_idmac_select_buffer(cvt->in_chan, 1);
> +		ipu_idmac_select_buffer(cvt->out_chan, 1);
> +		if (ipu_rot_mode_is_irt(ctx->rot_mode))
> +			ipu_idmac_select_buffer(cvt->rotation_out_chan, 1);
> +	}
> +
> +	/* enable the channels! */
> +	ipu_idmac_enable_channel(cvt->in_chan);
> +	ipu_idmac_enable_channel(cvt->out_chan);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_idmac_enable_channel(cvt->rotation_in_chan);
> +		ipu_idmac_enable_channel(cvt->rotation_out_chan);
> +	}
> +
> +	ipu_ic_task_enable(cvt->ic);
> +
> +	ipu_cpmem_dump(cvt->in_chan);
> +	ipu_cpmem_dump(cvt->out_chan);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_cpmem_dump(cvt->rotation_in_chan);
> +		ipu_cpmem_dump(cvt->rotation_out_chan);
> +	}
> +
> +	ipu_dump(priv->ipu);
> +
> +	return 0;
> +}
> +
> +/* hold irqlock when calling */
> +static int ipu_ic_run(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +
> +	ctx->in.base.phys0 = run->in_phys;
> +	ctx->out.base.phys0 = run->out_phys;
> +
> +	ctx->cur_buf_num = 0;
> +	ctx->next_tile = 1;
> +
> +	/* remove run from pending_q and set as current */
> +	list_del(&run->list);
> +	cvt->current_run = run;
> +
> +	return ipu_ic_convert_start(run);
> +}
> +
> +/* hold irqlock when calling */
> +static void ipu_ic_run_next(struct image_converter *cvt)
> +{
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run, *tmp;
> +	int ret;
> +
> +	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
> +		/* skip contexts that are aborting */
> +		if (run->ctx->aborting) {
> +			dev_dbg(priv->ipu->dev,
> +				 "%s: skipping aborting ctx %p run %p\n",
> +				 __func__, run->ctx, run);
> +			continue;
> +		}
> +
> +		ret = ipu_ic_run(run);
> +		if (!ret)
> +			break;
> +
> +		/*
> +		 * something went wrong with start, add the run
> +		 * to done q and continue to the next run in the
> +		 * pending q.
> +		 */
> +		run->status = ret;
> +		list_add_tail(&run->list, &cvt->done_q);
> +		cvt->current_run = NULL;
> +	}
> +}
> +
> +static void ipu_ic_empty_done_q(struct image_converter *cvt)
> +{
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	while (!list_empty(&cvt->done_q)) {
> +		run = list_entry(cvt->done_q.next,
> +				 struct image_converter_run,
> +				 list);
> +
> +		list_del(&run->list);
> +
> +		dev_dbg(priv->ipu->dev,
> +			"%s: completing ctx %p run %p with %d\n",
> +			__func__, run->ctx, run, run->status);
> +
> +		/* call the completion callback and free the run */
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		run->ctx->complete(run->ctx->complete_context, run,
> +				   run->status);
> +		kfree(run);
> +		spin_lock_irqsave(&cvt->irqlock, flags);
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +}
> +
> +/*
> + * the bottom half thread clears out the done_q, calling the
> + * completion handler for each.
> + */
> +static irqreturn_t ipu_ic_bh(int irq, void *dev_id)
> +{
> +	struct image_converter *cvt = dev_id;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_ctx *ctx;
> +	unsigned long flags;
> +
> +	dev_dbg(priv->ipu->dev, "%s: enter\n", __func__);
> +
> +	ipu_ic_empty_done_q(cvt);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/*
> +	 * the done_q is cleared out, signal any contexts
> +	 * that are aborting that abort can complete.
> +	 */
> +	list_for_each_entry(ctx, &cvt->ctx_list, list) {
> +		if (ctx->aborting) {
> +			dev_dbg(priv->ipu->dev,
> +				 "%s: signaling abort for ctx %p\n",
> +				 __func__, ctx);
> +			complete(&ctx->aborted);
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	dev_dbg(priv->ipu->dev, "%s: exit\n", __func__);
> +	return IRQ_HANDLED;
> +}
> +
> +/* hold irqlock when calling */
> +static irqreturn_t ipu_ic_doirq(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_tile *src_tile, *dst_tile;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	struct ipuv3_channel *outch;
> +	unsigned int dst_idx;
> +
> +	outch = ipu_rot_mode_is_irt(ctx->rot_mode) ?
> +		cvt->rotation_out_chan : cvt->out_chan;
> +
> +	/*
> +	 * It is difficult to stop the channel DMA before the channels
> +	 * enter the paused state. Without double-buffering the channels
> +	 * are always in a paused state when the EOF irq occurs, so it
> +	 * is safe to stop the channels now. For double-buffering we
> +	 * just ignore the abort until the operation completes, when it
> +	 * is safe to shut down.
> +	 */
> +	if (ctx->aborting && !ctx->double_buffering) {
> +		ipu_ic_convert_stop(run);
> +		run->status = -EIO;
> +		goto done;
> +	}
> +
> +	if (ctx->next_tile == ctx->num_tiles) {
> +		/*
> +		 * the conversion is complete
> +		 */
> +		ipu_ic_convert_stop(run);
> +		run->status = 0;
> +		goto done;
> +	}
> +
> +	/*
> +	 * not done, place the next tile buffers.
> +	 */
> +	if (!ctx->double_buffering) {
> +
> +		src_tile = &s_image->tile[ctx->next_tile];
> +		dst_idx = ctx->out_tile_map[ctx->next_tile];
> +		dst_tile = &d_image->tile[dst_idx];
> +
> +		ipu_cpmem_set_buffer(cvt->in_chan, 0,
> +				     s_image->base.phys0 + src_tile->offset);
> +		ipu_cpmem_set_buffer(outch, 0,
> +				     d_image->base.phys0 + dst_tile->offset);
> +		if (s_image->fmt->y_depth)
> +			ipu_cpmem_set_uv_offset(cvt->in_chan,
> +						src_tile->u_off,
> +						src_tile->v_off);
> +		if (d_image->fmt->y_depth)
> +			ipu_cpmem_set_uv_offset(outch,
> +						dst_tile->u_off,
> +						dst_tile->v_off);
> +
> +		ipu_idmac_select_buffer(cvt->in_chan, 0);
> +		ipu_idmac_select_buffer(outch, 0);
> +
> +	} else if (ctx->next_tile < ctx->num_tiles - 1) {
> +
> +		src_tile = &s_image->tile[ctx->next_tile + 1];
> +		dst_idx = ctx->out_tile_map[ctx->next_tile + 1];
> +		dst_tile = &d_image->tile[dst_idx];
> +
> +		ipu_cpmem_set_buffer(cvt->in_chan, ctx->cur_buf_num,
> +				     s_image->base.phys0 + src_tile->offset);
> +		ipu_cpmem_set_buffer(outch, ctx->cur_buf_num,
> +				     d_image->base.phys0 + dst_tile->offset);
> +
> +		ipu_idmac_select_buffer(cvt->in_chan, ctx->cur_buf_num);
> +		ipu_idmac_select_buffer(outch, ctx->cur_buf_num);
> +
> +		ctx->cur_buf_num ^= 1;
> +	}
> +
> +	ctx->next_tile++;
> +	return IRQ_HANDLED;
> +done:
> +	list_add_tail(&run->list, &cvt->done_q);
> +	cvt->current_run = NULL;
> +	ipu_ic_run_next(cvt);
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
> +{
> +	struct image_converter *cvt = data;
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	irqreturn_t ret;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* get current run and its context */
> +	run = cvt->current_run;
> +	if (!run) {
> +		ret = IRQ_NONE;
> +		goto out;
> +	}
> +
> +	ctx = run->ctx;
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* this is a rotation operation, just ignore */
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		return IRQ_HANDLED;
> +	}

Why enable the out_chan EOF irq at all when using the IRT mode?

> +
> +	ret = ipu_ic_doirq(run);
> +out:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +	return ret;
> +}
> +
> +static irqreturn_t ipu_ic_rotate_irq(int irq, void *data)
> +{
> +	struct image_converter *cvt = data;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	irqreturn_t ret;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* get current run and its context */
> +	run = cvt->current_run;
> +	if (!run) {
> +		ret = IRQ_NONE;
> +		goto out;
> +	}
> +
> +	ctx = run->ctx;
> +
> +	if (!ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* this was NOT a rotation operation, shouldn't happen */
> +		dev_err(priv->ipu->dev, "Unexpected rotation interrupt\n");
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		return IRQ_HANDLED;
> +	}
> +
> +	ret = ipu_ic_doirq(run);
> +out:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +	return ret;
> +}
> +
> +/*
> + * try to force the completion of runs for this ctx. Called when
> + * abort wait times out in ipu_image_convert_abort().
> + */
> +static void ipu_ic_force_abort(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	run = cvt->current_run;
> +	if (run && run->ctx == ctx) {
> +		ipu_ic_convert_stop(run);
> +		run->status = -EIO;
> +		list_add_tail(&run->list, &cvt->done_q);
> +		cvt->current_run = NULL;
> +		ipu_ic_run_next(cvt);
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	ipu_ic_empty_done_q(cvt);
> +}
> +
> +static void ipu_ic_release_ipu_resources(struct image_converter *cvt)
> +{
> +	if (cvt->out_eof_irq >= 0)
> +		free_irq(cvt->out_eof_irq, cvt);
> +	if (cvt->rot_out_eof_irq >= 0)
> +		free_irq(cvt->rot_out_eof_irq, cvt);
> +
> +	if (!IS_ERR_OR_NULL(cvt->in_chan))
> +		ipu_idmac_put(cvt->in_chan);
> +	if (!IS_ERR_OR_NULL(cvt->out_chan))
> +		ipu_idmac_put(cvt->out_chan);
> +	if (!IS_ERR_OR_NULL(cvt->rotation_in_chan))
> +		ipu_idmac_put(cvt->rotation_in_chan);
> +	if (!IS_ERR_OR_NULL(cvt->rotation_out_chan))
> +		ipu_idmac_put(cvt->rotation_out_chan);
> +
> +	cvt->in_chan = cvt->out_chan = cvt->rotation_in_chan =
> +		cvt->rotation_out_chan = NULL;
> +	cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
> +}
> +
> +static int ipu_ic_get_ipu_resources(struct image_converter *cvt)
> +{
> +	const struct ic_task_channels *chan = cvt->ic->ch;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	int ret;
> +
> +	/* get IDMAC channels */
> +	cvt->in_chan = ipu_idmac_get(priv->ipu, chan->in);
> +	cvt->out_chan = ipu_idmac_get(priv->ipu, chan->out);
> +	if (IS_ERR(cvt->in_chan) || IS_ERR(cvt->out_chan)) {
> +		dev_err(priv->ipu->dev, "could not acquire idmac channels\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	cvt->rotation_in_chan = ipu_idmac_get(priv->ipu, chan->rot_in);
> +	cvt->rotation_out_chan = ipu_idmac_get(priv->ipu, chan->rot_out);
> +	if (IS_ERR(cvt->rotation_in_chan) || IS_ERR(cvt->rotation_out_chan)) {
> +		dev_err(priv->ipu->dev,
> +			"could not acquire idmac rotation channels\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	/* acquire the EOF interrupts */
> +	cvt->out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
> +						cvt->out_chan,
> +						IPU_IRQ_EOF);
> +
> +	ret = request_threaded_irq(cvt->out_eof_irq,
> +				   ipu_ic_norotate_irq, ipu_ic_bh,
> +				   0, "ipu-ic", cvt);
> +	if (ret < 0) {
> +		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
> +			 cvt->out_eof_irq);
> +		cvt->out_eof_irq = -1;
> +		goto err;
> +	}
> +
> +	cvt->rot_out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
> +						     cvt->rotation_out_chan,
> +						     IPU_IRQ_EOF);
> +
> +	ret = request_threaded_irq(cvt->rot_out_eof_irq,
> +				   ipu_ic_rotate_irq, ipu_ic_bh,
> +				   0, "ipu-ic", cvt);
> +	if (ret < 0) {
> +		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
> +			cvt->rot_out_eof_irq);
> +		cvt->rot_out_eof_irq = -1;
> +		goto err;
> +	}
> +
> +	return 0;
> +err:
> +	ipu_ic_release_ipu_resources(cvt);
> +	return ret;
> +}
> +
> +static int ipu_ic_fill_image(struct image_converter_ctx *ctx,
> +			     struct ipu_ic_image *ic_image,
> +			     struct ipu_image *image,
> +			     enum image_convert_type type)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +
> +	ic_image->base = *image;
> +	ic_image->type = type;
> +
> +	ic_image->fmt = ipu_ic_get_format(image->pix.pixelformat);
> +	if (!ic_image->fmt) {
> +		dev_err(priv->ipu->dev, "pixelformat not supported for %s\n",
> +			type == IMAGE_CONVERT_OUT ? "Output" : "Input");
> +		return -EINVAL;
> +	}
> +
> +	if (ic_image->fmt->y_depth)
> +		ic_image->stride = (ic_image->fmt->y_depth *
> +				    ic_image->base.pix.width) >> 3;
> +	else
> +		ic_image->stride  = ic_image->base.pix.bytesperline;
> +
> +	ipu_ic_calc_tile_dimensions(ctx, ic_image);
> +	ipu_ic_calc_tile_offsets(ctx, ic_image);
> +
> +	return 0;
> +}
> +
> +/* borrowed from drivers/media/v4l2-core/v4l2-common.c */
> +static unsigned int clamp_align(unsigned int x, unsigned int min,
> +				unsigned int max, unsigned int align)
> +{
> +	/* Bits that must be zero to be aligned */
> +	unsigned int mask = ~((1 << align) - 1);
> +
> +	/* Clamp to aligned min and max */
> +	x = clamp(x, (min + ~mask) & mask, max & mask);
> +
> +	/* Round to nearest aligned value */
> +	if (align)
> +		x = (x + (1 << (align - 1))) & mask;
> +
> +	return x;
> +}
> +
> +/*
> + * We have to adjust the tile width such that the tile physaddrs and
> + * U and V plane offsets are multiples of 8 bytes as required by
> + * the IPU DMA Controller. For the planar formats, this corresponds
> + * to a pixel alignment of 16 (but use a more formal equation since
> + * the variables are available). For all the packed formats, 8 is
> + * good enough.
> + */
> +static inline u32 tile_width_align(const struct ipu_ic_pixfmt *fmt)
> +{
> +	return fmt->y_depth ? (64 * fmt->uv_width_dec) / fmt->y_depth : 8;
> +}
> +
> +/*
> + * For tile height alignment, we have to ensure that the output tile
> + * heights are multiples of 8 lines if the IRT is required by the
> + * given rotation mode (the IRT performs rotations on 8x8 blocks
> + * at a time). If the IRT is not used, or for input image tiles,
> + * 2 lines are good enough.
> + */
> +static inline u32 tile_height_align(enum image_convert_type type,
> +				    enum ipu_rotate_mode rot_mode)
> +{
> +	return (type == IMAGE_CONVERT_OUT &&
> +		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
> +}
> +
> +/* Adjusts input/output images to IPU restrictions */
> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode)
> +{
> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
> +	unsigned int num_in_rows, num_in_cols;
> +	unsigned int num_out_rows, num_out_cols;
> +	u32 w_align, h_align;
> +
> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
> +
> +	/* set some defaults if needed */

Is this our task at all?

> +	if (!infmt) {
> +		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
> +		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
> +	}
> +	if (!outfmt) {
> +		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
> +		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
> +	}
> +
> +	if (!in->pix.width || !in->pix.height) {
> +		in->pix.width = 640;
> +		in->pix.height = 480;
> +	}
> +	if (!out->pix.width || !out->pix.height) {
> +		out->pix.width = 640;
> +		out->pix.height = 480;
> +	}
> +
> +	/* image converter does not handle fields */
> +	in->pix.field = out->pix.field = V4L2_FIELD_NONE;

Why not? The scaler can scale alternate top/bottom fields no problem.

For SEQ_TB/BT and the interleaved interlacing we'd have to adjust
scaling factors per field and use two vertical tiles for the fields
before this can be supported.

> +	/* resizer cannot downsize more than 4:1 */
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		out->pix.height = max_t(__u32, out->pix.height,
> +					in->pix.width / 4);
> +		out->pix.width = max_t(__u32, out->pix.width,
> +				       in->pix.height / 4);
> +	} else {
> +		out->pix.width = max_t(__u32, out->pix.width,
> +				       in->pix.width / 4);
> +		out->pix.height = max_t(__u32, out->pix.height,
> +					in->pix.height / 4);
> +	}
> +
> +	/* get tiling rows/cols from output format */
> +	num_out_rows = ipu_ic_num_stripes(out->pix.height);
> +	num_out_cols = ipu_ic_num_stripes(out->pix.width);
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		num_in_rows = num_out_cols;
> +		num_in_cols = num_out_rows;
> +	} else {
> +		num_in_rows = num_out_rows;
> +		num_in_cols = num_out_cols;
> +	}
> +
> +	/* align input width/height */
> +	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
> +	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
> +			num_in_rows);
> +	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
> +	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
> +
> +	/* align output width/height */
> +	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
> +	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
> +			num_out_rows);
> +	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
> +	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
> +
> +	/* set input/output strides and image sizes */
> +	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
> +	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
> +	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
> +	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
> +
> +/*
> + * this is used by ipu_image_convert_prepare() to verify set input and
> + * output images are valid before starting the conversion. Clients can
> + * also call it before calling ipu_image_convert_prepare().
> + */
> +int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode)
> +{
> +	struct ipu_image testin, testout;
> +	int ret;
> +
> +	testin = *in;
> +	testout = *out;
> +
> +	ret = ipu_image_convert_adjust(&testin, &testout, rot_mode);
> +	if (ret)
> +		return ret;
> +
> +	if (testin.pix.width != in->pix.width ||
> +	    testin.pix.height != in->pix.height ||
> +	    testout.pix.width != out->pix.width ||
> +	    testout.pix.height != out->pix.height)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_verify);
> +
> +/*
> + * Call ipu_image_convert_prepare() to prepare for the conversion of
> + * given images and rotation mode. Returns a new conversion context.
> + */
> +struct image_converter_ctx *
> +ipu_image_convert_prepare(struct ipu_ic *ic,
> +			  struct ipu_image *in, struct ipu_image *out,
> +			  enum ipu_rotate_mode rot_mode,
> +			  image_converter_cb_t complete,
> +			  void *complete_context)
> +{
> +	struct ipu_ic_priv *priv = ic->priv;
> +	struct image_converter *cvt = &ic->cvt;
> +	struct ipu_ic_image *s_image, *d_image;
> +	struct image_converter_ctx *ctx;
> +	unsigned long flags;
> +	bool get_res;
> +	int ret;
> +
> +	if (!ic || !in || !out || !complete)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* verify the in/out images before continuing */
> +	ret = ipu_image_convert_verify(in, out, rot_mode);
> +	if (ret) {
> +		dev_err(priv->ipu->dev, "%s: in/out formats invalid\n",
> +			__func__);
> +		return ERR_PTR(ret);
> +	}
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dev_dbg(priv->ipu->dev, "%s: ctx %p\n", __func__, ctx);
> +
> +	ctx->cvt = cvt;
> +	init_completion(&ctx->aborted);
> +
> +	s_image = &ctx->in;
> +	d_image = &ctx->out;
> +
> +	/* set tiling and rotation */
> +	d_image->num_rows = ipu_ic_num_stripes(out->pix.height);
> +	d_image->num_cols = ipu_ic_num_stripes(out->pix.width);
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		s_image->num_rows = d_image->num_cols;
> +		s_image->num_cols = d_image->num_rows;
> +	} else {
> +		s_image->num_rows = d_image->num_rows;
> +		s_image->num_cols = d_image->num_cols;
> +	}
> +
> +	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
> +	ctx->rot_mode = rot_mode;
> +
> +	ret = ipu_ic_fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
> +	if (ret)
> +		goto out_free;
> +	ret = ipu_ic_fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
> +	if (ret)
> +		goto out_free;
> +
> +	ipu_ic_calc_out_tile_map(ctx);
> +
> +	ipu_ic_dump_format(ctx, s_image);
> +	ipu_ic_dump_format(ctx, d_image);
> +
> +	ctx->complete = complete;
> +	ctx->complete_context = complete_context;
> +
> +	/*
> +	 * Can we use double-buffering for this operation? If there is
> +	 * only one tile (the whole image can be converted in a single
> +	 * operation) there's no point in using double-buffering. Also,
> +	 * the IPU's IDMAC channels allow only a single U and V plane
> +	 * offset shared between both buffers, but these offsets change
> +	 * for every tile, and therefore would have to be updated for
> +	 * each buffer which is not possible. So double-buffering is
> +	 * impossible when either the source or destination images are
> +	 * a planar format (YUV420, YUV422P, etc.).
> +	 */
> +	ctx->double_buffering = (ctx->num_tiles > 1 &&
> +				 !s_image->fmt->y_depth &&
> +				 !d_image->fmt->y_depth);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ret = ipu_ic_alloc_dma_buf(priv, &ctx->rot_intermediate[0],
> +					   d_image->tile[0].size);
> +		if (ret)
> +			goto out_free;
> +		if (ctx->double_buffering) {
> +			ret = ipu_ic_alloc_dma_buf(priv,
> +						   &ctx->rot_intermediate[1],
> +						   d_image->tile[0].size);
> +			if (ret)
> +				goto out_free_dmabuf0;
> +		}
> +	}
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	get_res = list_empty(&cvt->ctx_list);
> +
> +	list_add_tail(&ctx->list, &cvt->ctx_list);
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (get_res) {
> +		ret = ipu_ic_get_ipu_resources(cvt);
> +		if (ret)
> +			goto out_free_dmabuf1;
> +	}
> +
> +	return ctx;
> +
> +out_free_dmabuf1:
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +	list_del(&ctx->list);
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +out_free_dmabuf0:
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
> +out_free:
> +	kfree(ctx);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_prepare);
> +
> +/*
> + * Carry out a single image conversion. Only the physaddr's of the input
> + * and output image buffers are needed. The conversion context must have
> + * been created previously with ipu_image_convert_prepare(). Returns the
> + * new run object.
> + */
> +struct image_converter_run *
> +ipu_image_convert_run(struct image_converter_ctx *ctx,
> +		      dma_addr_t in_phys, dma_addr_t out_phys)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
> +	if (!run)
> +		return ERR_PTR(-ENOMEM);

What is the reasoning behind making the image_converter_run opaque to
the user? If you let the user provide it to ipu_image_convert_run, it
wouldn't have to be allocated/freed with each frame.

> +	run->ctx = ctx;
> +	run->in_phys = in_phys;
> +	run->out_phys = out_phys;
> +
> +	dev_dbg(priv->ipu->dev, "%s: ctx %p run %p\n", __func__,
> +		ctx, run);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	if (ctx->aborting) {
> +		ret = -EIO;
> +		goto unlock;
> +	}
> +
> +	list_add_tail(&run->list, &cvt->pending_q);
> +
> +	if (!cvt->current_run) {
> +		ret = ipu_ic_run(run);
> +		if (ret)
> +			cvt->current_run = NULL;
> +	}
> +unlock:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (ret) {
> +		kfree(run);
> +		run = ERR_PTR(ret);
> +	}
> +
> +	return run;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_run);
> +
> +/* Abort any active or pending conversions for this context */
> +void ipu_image_convert_abort(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run, *active_run, *tmp;
> +	unsigned long flags;
> +	int run_count, ret;
> +	bool need_abort;
> +
> +	reinit_completion(&ctx->aborted);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* move all remaining pending runs in this context to done_q */
> +	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
> +		if (run->ctx != ctx)
> +			continue;
> +		run->status = -EIO;
> +		list_move_tail(&run->list, &cvt->done_q);
> +	}
> +
> +	run_count = ipu_ic_get_run_count(ctx, &cvt->done_q);
> +	active_run = (cvt->current_run && cvt->current_run->ctx == ctx) ?
> +		cvt->current_run : NULL;
> +
> +	need_abort = (run_count || active_run);
> +
> +	ctx->aborting = need_abort;
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (!need_abort) {
> +		dev_dbg(priv->ipu->dev, "%s: no abort needed for ctx %p\n",
> +			__func__, ctx);
> +		return;
> +	}
> +
> +	dev_dbg(priv->ipu->dev,
> +		 "%s: wait for completion: %d runs, active run %p\n",
> +		 __func__, run_count, active_run);
> +
> +	ret = wait_for_completion_timeout(&ctx->aborted,
> +					  msecs_to_jiffies(10000));
> +	if (ret == 0) {
> +		dev_warn(priv->ipu->dev, "%s: timeout\n", __func__);
> +		ipu_ic_force_abort(ctx);
> +	}
> +
> +	ctx->aborting = false;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_abort);
> +
> +/* Unprepare image conversion context */
> +void ipu_image_convert_unprepare(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	unsigned long flags;
> +	bool put_res;
> +
> +	/* make sure no runs are hanging around */
> +	ipu_image_convert_abort(ctx);
> +
> +	dev_dbg(priv->ipu->dev, "%s: removing ctx %p\n", __func__, ctx);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	list_del(&ctx->list);
> +
> +	put_res = list_empty(&cvt->ctx_list);
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (put_res)
> +		ipu_ic_release_ipu_resources(cvt);
> +
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
> +
> +	kfree(ctx);
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_unprepare);
> +
> +/*
> + * "Canned" asynchronous single image conversion. On successful return
> + * caller must call ipu_image_convert_unprepare() after conversion completes.
> + * Returns the new conversion context.
> + */
> +struct image_converter_ctx *
> +ipu_image_convert(struct ipu_ic *ic,
> +		  struct ipu_image *in, struct ipu_image *out,
> +		  enum ipu_rotate_mode rot_mode,
> +		  image_converter_cb_t complete,
> +		  void *complete_context)
> +{
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +
> +	ctx = ipu_image_convert_prepare(ic, in, out, rot_mode,
> +					complete, complete_context);
> +	if (IS_ERR(ctx))
> +		return ctx;
> +
> +	run = ipu_image_convert_run(ctx, in->phys0, out->phys0);
> +	if (IS_ERR(run)) {
> +		ipu_image_convert_unprepare(ctx);
> +		return ERR_PTR(PTR_ERR(run));
> +	}
> +
> +	return ctx;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert);
> +
> +/* "Canned" synchronous single image conversion */
> +static void image_convert_sync_complete(void *data,
> +					struct image_converter_run *run,
> +					int err)
> +{
> +	struct completion *comp = data;
> +
> +	complete(comp);
> +}
> +
> +int ipu_image_convert_sync(struct ipu_ic *ic,
> +			   struct ipu_image *in, struct ipu_image *out,
> +			   enum ipu_rotate_mode rot_mode)
> +{
> +	struct image_converter_ctx *ctx;
> +	struct completion comp;
> +	int ret;
> +
> +	init_completion(&comp);
> +
> +	ctx = ipu_image_convert(ic, in, out, rot_mode,
> +				image_convert_sync_complete, &comp);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	ret = wait_for_completion_timeout(&comp, msecs_to_jiffies(10000));
> +	ret = (ret == 0) ? -ETIMEDOUT : 0;
> +
> +	ipu_image_convert_unprepare(ctx);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_sync);
> +

Most of this calculation of tile geometry and conversion queue handling
code is not really low level IC hardware access. I'd like the code that
doesn't have to access ipu_ic internals directly to be moved into a
separate source file. I'd suggest ipu-ic-queue.c, or
ipu-image-convert.c.

>  int ipu_ic_enable(struct ipu_ic *ic)
>  {
>  	struct ipu_ic_priv *priv = ic->priv;
> @@ -746,6 +2418,7 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
>  	ipu->ic_priv = priv;
>  
>  	spin_lock_init(&priv->lock);
> +
>  	priv->base = devm_ioremap(dev, base, PAGE_SIZE);
>  	if (!priv->base)
>  		return -ENOMEM;
> @@ -758,10 +2431,21 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
>  	priv->ipu = ipu;
>  
>  	for (i = 0; i < IC_NUM_TASKS; i++) {
> -		priv->task[i].task = i;
> -		priv->task[i].priv = priv;
> -		priv->task[i].reg = &ic_task_reg[i];
> -		priv->task[i].bit = &ic_task_bit[i];
> +		struct ipu_ic *ic = &priv->task[i];
> +		struct image_converter *cvt = &ic->cvt;
> +
> +		ic->task = i;
> +		ic->priv = priv;
> +		ic->reg = &ic_task_reg[i];
> +		ic->bit = &ic_task_bit[i];
> +		ic->ch = &ic_task_ch[i];
> +
> +		cvt->ic = ic;
> +		spin_lock_init(&cvt->irqlock);
> +		INIT_LIST_HEAD(&cvt->ctx_list);
> +		INIT_LIST_HEAD(&cvt->pending_q);
> +		INIT_LIST_HEAD(&cvt->done_q);
> +		cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
>  	}
>  
>  	return 0;
> diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
> index 1a3f7d4..992addf 100644
> --- a/include/video/imx-ipu-v3.h
> +++ b/include/video/imx-ipu-v3.h
> @@ -63,17 +63,25 @@ enum ipu_csi_dest {
>  /*
>   * Enumeration of IPU rotation modes
>   */
> +#define IPU_ROT_BIT_VFLIP (1 << 0)
> +#define IPU_ROT_BIT_HFLIP (1 << 1)
> +#define IPU_ROT_BIT_90    (1 << 2)
> +
>  enum ipu_rotate_mode {
>  	IPU_ROTATE_NONE = 0,
> -	IPU_ROTATE_VERT_FLIP,
> -	IPU_ROTATE_HORIZ_FLIP,
> -	IPU_ROTATE_180,
> -	IPU_ROTATE_90_RIGHT,
> -	IPU_ROTATE_90_RIGHT_VFLIP,
> -	IPU_ROTATE_90_RIGHT_HFLIP,
> -	IPU_ROTATE_90_LEFT,
> +	IPU_ROTATE_VERT_FLIP = IPU_ROT_BIT_VFLIP,
> +	IPU_ROTATE_HORIZ_FLIP = IPU_ROT_BIT_HFLIP,
> +	IPU_ROTATE_180 = (IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
> +	IPU_ROTATE_90_RIGHT = IPU_ROT_BIT_90,
> +	IPU_ROTATE_90_RIGHT_VFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_VFLIP),
> +	IPU_ROTATE_90_RIGHT_HFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_HFLIP),
> +	IPU_ROTATE_90_LEFT = (IPU_ROT_BIT_90 |
> +			      IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
>  };
>  
> +/* 90-degree rotations require the IRT unit */
> +#define ipu_rot_mode_is_irt(m) ((m) >= IPU_ROTATE_90_RIGHT)
> +
>  enum ipu_color_space {
>  	IPUV3_COLORSPACE_RGB,
>  	IPUV3_COLORSPACE_YUV,
> @@ -337,6 +345,7 @@ enum ipu_ic_task {
>  };
>  
>  struct ipu_ic;
> +
>  int ipu_ic_task_init(struct ipu_ic *ic,
>  		     int in_width, int in_height,
>  		     int out_width, int out_height,
> @@ -351,6 +360,40 @@ void ipu_ic_task_disable(struct ipu_ic *ic);
>  int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
>  			  u32 width, u32 height, int burst_size,
>  			  enum ipu_rotate_mode rot);
> +
> +struct image_converter_ctx;
> +struct image_converter_run;
> +

Add an ipu_ prefix to those.

> +typedef void (*image_converter_cb_t)(void *ctx,
> +				     struct image_converter_run *run,
> +				     int err);
> +
> +int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc);
> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode);
> +int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode);
> +struct image_converter_ctx *
> +ipu_image_convert_prepare(struct ipu_ic *ic,
> +			  struct ipu_image *in, struct ipu_image *out,
> +			  enum ipu_rotate_mode rot_mode,
> +			  image_converter_cb_t complete,
> +			  void *complete_context);
> +void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> +struct image_converter_run *
> +ipu_image_convert_run(struct image_converter_ctx *ctx,
> +		      dma_addr_t in_phys, dma_addr_t out_phys);
> +void ipu_image_convert_abort(struct image_converter_ctx *ctx);
> +struct image_converter_ctx *
> +ipu_image_convert(struct ipu_ic *ic,
> +		  struct ipu_image *in, struct ipu_image *out,
> +		  enum ipu_rotate_mode rot_mode,
> +		  image_converter_cb_t complete,
> +		  void *complete_context);
> +int ipu_image_convert_sync(struct ipu_ic *ic,
> +			   struct ipu_image *in, struct ipu_image *out,
> +			   enum ipu_rotate_mode rot_mode);
> +
>  int ipu_ic_enable(struct ipu_ic *ic);
>  int ipu_ic_disable(struct ipu_ic *ic);
>  struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task);

regards
Philipp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-06  9:26     ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, dri-devel,
	tomi.valkeinen, plagnioj

Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> This patch implements complete image conversion support to ipu-ic,
> with tiling to support scaling to and from images up to 4096x4096.
> Image rotation is also supported.
> 
> The internal API is subsystem agnostic (no V4L2 dependency except
> for the use of V4L2 fourcc pixel formats).
> 
> Callers prepare for image conversion by calling
> ipu_image_convert_prepare(), which initializes the parameters of
> the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.

>  The caller passes in the ipu_ic task to use for
> the conversion, the input and output image formats, a rotation mode,
> and a completion callback and completion context pointer:
> 
> struct image_converter_ctx *
> ipu_image_convert_prepare(struct ipu_ic *ic,
>                           struct ipu_image *in, struct ipu_image *out,
>                           enum ipu_rotate_mode rot_mode,
>                           image_converter_cb_t complete,
>                           void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.

> The caller is given a new conversion context that must be passed to
> the further APIs:
> 
> struct image_converter_run *
> ipu_image_convert_run(struct image_converter_ctx *ctx,
>                       dma_addr_t in_phys, dma_addr_t out_phys);
> 
> This queues a new image conversion request to a run queue, and
> starts the conversion immediately if the run queue is empty. Only
> the physaddr's of the input and output image buffers are needed,
> since the conversion context was created previously with
> ipu_image_convert_prepare(). Returns a new run object pointer. When
> the conversion completes, the run pointer is returned to the
> completion callback.
>
> void image_convert_abort(struct image_converter_ctx *ctx);
> 
> This will abort any active or pending conversions for this context.
> Any currently active or pending runs belonging to this context are
> returned via the completion callback with an error status.
>
> void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> 
> Unprepares the conversion context. Any active or pending runs will
> be aborted by calling image_convert_abort().
> 
> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
> 
> ---
> 
> v4:
> - do away with struct ipu_ic_tile_off, and move tile offsets into
>   struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.

Thanks, this looks a lot better to me.

> v3: no changes
> v2: no changes
> ---
>  drivers/gpu/ipu-v3/ipu-ic.c | 1694 ++++++++++++++++++++++++++++++++++++++++++-
>  include/video/imx-ipu-v3.h  |   57 +-
>  2 files changed, 1739 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
> index 1a37afc..01b1b56 100644
> --- a/drivers/gpu/ipu-v3/ipu-ic.c
> +++ b/drivers/gpu/ipu-v3/ipu-ic.c
> @@ -17,6 +17,8 @@
>  #include <linux/bitrev.h>
>  #include <linux/io.h>
>  #include <linux/err.h>
> +#include <linux/interrupt.h>
> +#include <linux/dma-mapping.h>
>  #include "ipu-prv.h"
>  
>  /* IC Register Offsets */
> @@ -82,6 +84,40 @@
>  #define IC_IDMAC_3_PP_WIDTH_MASK        (0x3ff << 20)
>  #define IC_IDMAC_3_PP_WIDTH_OFFSET      20
>  
> +/*
> + * The IC Resizer has a restriction that the output frame from the
> + * resizer must be 1024 or less in both width (pixels) and height
> + * (lines).
> + *
> + * The image conversion support attempts to split up a conversion when
> + * the desired output (converted) frame resolution exceeds the IC resizer
> + * limit of 1024 in either dimension.
> + *
> + * If either dimension of the output frame exceeds the limit, the
> + * dimension is split into 1, 2, or 4 equal stripes, for a maximum
> + * of 4*4 or 16 tiles. A conversion is then carried out for each
> + * tile (but taking care to pass the full frame stride length to
> + * the DMA channel's parameter memory!). IDMA double-buffering is used
> + * to convert each tile back-to-back when possible (see note below
> + * when double_buffering boolean is set).
> + *
> + * Note that the input frame must be split up into the same number
> + * of tiles as the output frame.
> + */
> +#define MAX_STRIPES_W    4
> +#define MAX_STRIPES_H    4
> +#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
> +
> +#define MIN_W     128
> +#define MIN_H     128

Where does this minimum come from?

> +#define MAX_W     4096
> +#define MAX_H     4096
> +
> +enum image_convert_type {
> +	IMAGE_CONVERT_IN = 0,
> +	IMAGE_CONVERT_OUT,
> +};
> +
>  struct ic_task_regoffs {
>  	u32 rsc;
>  	u32 tpmem_csc[2];
> @@ -96,6 +132,16 @@ struct ic_task_bitfields {
>  	u32 ic_cmb_galpha_bit;
>  };
>  
> +struct ic_task_channels {
> +	int in;
> +	int out;
> +	int rot_in;
> +	int rot_out;
> +	int vdi_in_p;
> +	int vdi_in;
> +	int vdi_in_n;

The vdi channels are unused.

> +};
> +
>  static const struct ic_task_regoffs ic_task_reg[IC_NUM_TASKS] = {
>  	[IC_TASK_ENCODER] = {
>  		.rsc = IC_PRP_ENC_RSC,
> @@ -138,12 +184,155 @@ static const struct ic_task_bitfields ic_task_bit[IC_NUM_TASKS] = {
>  	},
>  };
>  
> +static const struct ic_task_channels ic_task_ch[IC_NUM_TASKS] = {
> +	[IC_TASK_ENCODER] = {
> +		.out = IPUV3_CHANNEL_IC_PRP_ENC_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_ENC,
> +		.rot_out = IPUV3_CHANNEL_ROT_ENC_MEM,
> +	},
> +	[IC_TASK_VIEWFINDER] = {
> +		.in = IPUV3_CHANNEL_MEM_IC_PRP_VF,
> +		.out = IPUV3_CHANNEL_IC_PRP_VF_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_VF,
> +		.rot_out = IPUV3_CHANNEL_ROT_VF_MEM,
> +		.vdi_in_p = IPUV3_CHANNEL_MEM_VDI_PREV,
> +		.vdi_in = IPUV3_CHANNEL_MEM_VDI_CUR,
> +		.vdi_in_n = IPUV3_CHANNEL_MEM_VDI_NEXT,

See above.

> +	},
> +	[IC_TASK_POST_PROCESSOR] = {
> +		.in = IPUV3_CHANNEL_MEM_IC_PP,
> +		.out = IPUV3_CHANNEL_IC_PP_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_PP,
> +		.rot_out = IPUV3_CHANNEL_ROT_PP_MEM,
> +	},
> +};
> +
> +struct ipu_ic_dma_buf {
> +	void          *virt;
> +	dma_addr_t    phys;
> +	unsigned long len;
> +};
> +
> +/* dimensions of one tile */
> +struct ipu_ic_tile {
> +	u32 width;
> +	u32 height;
> +	/* size and strides are in bytes */
> +	u32 size;
> +	u32 stride;
> +	u32 rot_stride;
> +	/* start Y or packed offset of this tile */
> +	u32 offset;
> +	/* offset from start to tile in U plane, for planar formats */
> +	u32 u_off;
> +	/* offset from start to tile in V plane, for planar formats */
> +	u32 v_off;
> +};
> +
> +struct ipu_ic_pixfmt {
> +	char	*name;
> +	u32	fourcc;        /* V4L2 fourcc */
> +	int     bpp;           /* total bpp */
> +	int     y_depth;       /* depth of Y plane for planar formats */
> +	int     uv_width_dec;  /* decimation in width for U/V planes */
> +	int     uv_height_dec; /* decimation in height for U/V planes */
> +	bool    uv_swapped;    /* U and V planes are swapped */
> +	bool    uv_packed;     /* partial planar (U and V in same plane) */
> +};
> +
> +struct ipu_ic_image {
> +	struct ipu_image base;
> +	enum image_convert_type type;
> +
> +	const struct ipu_ic_pixfmt *fmt;
> +	unsigned int stride;
> +
> +	/* # of rows (horizontal stripes) if dest height is > 1024 */
> +	unsigned int num_rows;
> +	/* # of columns (vertical stripes) if dest width is > 1024 */
> +	unsigned int num_cols;
> +
> +	struct ipu_ic_tile tile[MAX_TILES];
> +};
> +
> +struct image_converter_ctx;
> +struct image_converter;
>  struct ipu_ic_priv;
> +struct ipu_ic;
> +
> +struct image_converter_run {
> +	struct image_converter_ctx *ctx;
> +
> +	dma_addr_t in_phys;
> +	dma_addr_t out_phys;
> +
> +	int status;
> +
> +	struct list_head list;
> +};
> +
> +struct image_converter_ctx {
> +	struct image_converter *cvt;
> +
> +	image_converter_cb_t complete;
> +	void *complete_context;
> +
> +	/* Source/destination image data and rotation mode */
> +	struct ipu_ic_image in;
> +	struct ipu_ic_image out;
> +	enum ipu_rotate_mode rot_mode;
> +
> +	/* intermediate buffer for rotation */
> +	struct ipu_ic_dma_buf rot_intermediate[2];

No need to change it now, but I assume these could be per IC task
instead of per context.

> +	/* current buffer number for double buffering */
> +	int cur_buf_num;
> +
> +	bool aborting;
> +	struct completion aborted;
> +
> +	/* can we use double-buffering for this conversion operation? */
> +	bool double_buffering;
> +	/* num_rows * num_cols */
> +	unsigned int num_tiles;
> +	/* next tile to process */
> +	unsigned int next_tile;
> +	/* where to place converted tile in dest image */
> +	unsigned int out_tile_map[MAX_TILES];
> +
> +	struct list_head list;
> +};
> +
> +struct image_converter {
> +	struct ipu_ic *ic;
> +
> +	struct ipuv3_channel *in_chan;
> +	struct ipuv3_channel *out_chan;
> +	struct ipuv3_channel *rotation_in_chan;
> +	struct ipuv3_channel *rotation_out_chan;
> +
> +	/* the IPU end-of-frame irqs */
> +	int out_eof_irq;
> +	int rot_out_eof_irq;
> +
> +	spinlock_t irqlock;
> +
> +	/* list of convert contexts */
> +	struct list_head ctx_list;
> +	/* queue of conversion runs */
> +	struct list_head pending_q;
> +	/* queue of completed runs */
> +	struct list_head done_q;
> +
> +	/* the current conversion run */
> +	struct image_converter_run *current_run;
> +};
>  
>  struct ipu_ic {
>  	enum ipu_ic_task task;
>  	const struct ic_task_regoffs *reg;
>  	const struct ic_task_bitfields *bit;
> +	const struct ic_task_channels *ch;
>  
>  	enum ipu_color_space in_cs, g_in_cs;
>  	enum ipu_color_space out_cs;
> @@ -151,6 +340,8 @@ struct ipu_ic {
>  	bool rotation;
>  	bool in_use;
>  
> +	struct image_converter cvt;
> +
>  	struct ipu_ic_priv *priv;
>  };
>  
> @@ -619,7 +810,7 @@ int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
>  	ipu_ic_write(ic, ic_idmac_2, IC_IDMAC_2);
>  	ipu_ic_write(ic, ic_idmac_3, IC_IDMAC_3);
>  
> -	if (rot >= IPU_ROTATE_90_RIGHT)
> +	if (ipu_rot_mode_is_irt(rot))
>  		ic->rotation = true;
>  
>  unlock:
> @@ -648,6 +839,1487 @@ static void ipu_irt_disable(struct ipu_ic *ic)
>  	}
>  }
>  
> +/*
> + * Complete image conversion support follows
> + */
> +
> +static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
> +	{
> +		.name	= "RGB565",

Please drop the names, keeping a list of user readable format names is
the v4l2 core's business, not ours.

> +		.fourcc	= V4L2_PIX_FMT_RGB565,
> +		.bpp    = 16,

bpp is only ever used in bytes, not bits (always divided by 8).
Why not make this bytes_per_pixel or pixel_stride = 2.

> +	}, {
> +		.name	= "RGB24",
> +		.fourcc	= V4L2_PIX_FMT_RGB24,
> +		.bpp    = 24,
> +	}, {
> +		.name	= "BGR24",
> +		.fourcc	= V4L2_PIX_FMT_BGR24,
> +		.bpp    = 24,
> +	}, {
> +		.name	= "RGB32",
> +		.fourcc	= V4L2_PIX_FMT_RGB32,
> +		.bpp    = 32,
> +	}, {
> +		.name	= "BGR32",
> +		.fourcc	= V4L2_PIX_FMT_BGR32,
> +		.bpp    = 32,
> +	}, {
> +		.name	= "4:2:2 packed, YUYV",
> +		.fourcc	= V4L2_PIX_FMT_YUYV,
> +		.bpp    = 16,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name	= "4:2:2 packed, UYVY",
> +		.fourcc	= V4L2_PIX_FMT_UYVY,
> +		.bpp    = 16,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name	= "4:2:0 planar, YUV",
> +		.fourcc	= V4L2_PIX_FMT_YUV420,
> +		.bpp    = 12,
> +		.y_depth = 8,

y_depth is only ever used in bytes, not bits (always divided by 8).
Why not make this bool planar instead.

> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +	}, {
> +		.name	= "4:2:0 planar, YVU",
> +		.fourcc	= V4L2_PIX_FMT_YVU420,
> +		.bpp    = 12,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +		.uv_swapped = true,
> +	}, {
> +		.name   = "4:2:0 partial planar, NV12",
> +		.fourcc = V4L2_PIX_FMT_NV12,
> +		.bpp    = 12,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +		.uv_packed = true,
> +	}, {
> +		.name   = "4:2:2 planar, YUV",
> +		.fourcc = V4L2_PIX_FMT_YUV422P,
> +		.bpp    = 16,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name   = "4:2:2 partial planar, NV16",
> +		.fourcc = V4L2_PIX_FMT_NV16,
> +		.bpp    = 16,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +		.uv_packed = true,
> +	},
> +};
> +
> +static const struct ipu_ic_pixfmt *ipu_ic_get_format(u32 fourcc)
> +{
> +	const struct ipu_ic_pixfmt *ret = NULL;
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(ipu_ic_formats); i++) {
> +		if (ipu_ic_formats[i].fourcc = fourcc) {
> +			ret = &ipu_ic_formats[i];
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static void ipu_ic_dump_format(struct image_converter_ctx *ctx,
> +			       struct ipu_ic_image *ic_image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +
> +	dev_dbg(priv->ipu->dev,
> +		"ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
> +		ctx,
> +		ic_image->type = IMAGE_CONVERT_OUT ? "Output" : "Input",
> +		ic_image->base.pix.width, ic_image->base.pix.height,
> +		ic_image->num_cols, ic_image->num_rows,
> +		ic_image->tile[0].width, ic_image->tile[0].height,
> +		ic_image->fmt->fourcc & 0xff,
> +		(ic_image->fmt->fourcc >> 8) & 0xff,
> +		(ic_image->fmt->fourcc >> 16) & 0xff,
> +		(ic_image->fmt->fourcc >> 24) & 0xff);
> +}
> +
> +int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc)
> +{
> +	const struct ipu_ic_pixfmt *fmt;
> +
> +	if (index >= (int)ARRAY_SIZE(ipu_ic_formats))
> +		return -EINVAL;
> +
> +	/* Format found */
> +	fmt = &ipu_ic_formats[index];
> +	*desc = fmt->name;
> +	*fourcc = fmt->fourcc;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_enum_format);
> +
> +static void ipu_ic_free_dma_buf(struct ipu_ic_priv *priv,
> +				struct ipu_ic_dma_buf *buf)
> +{
> +	if (buf->virt)
> +		dma_free_coherent(priv->ipu->dev,
> +				  buf->len, buf->virt, buf->phys);
> +	buf->virt = NULL;
> +	buf->phys = 0;
> +}
> +
> +static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
> +				struct ipu_ic_dma_buf *buf,
> +				int size)
> +{
> +	unsigned long newlen = PAGE_ALIGN(size);
> +
> +	if (buf->virt) {
> +		if (buf->len = newlen)
> +			return 0;
> +		ipu_ic_free_dma_buf(priv, buf);
> +	}

Is it necessary to support reallocation? This is currently only used by
the prepare function, which creates a new context.

> +
> +	buf->len = newlen;
> +	buf->virt = dma_alloc_coherent(priv->ipu->dev, buf->len, &buf->phys,
> +				       GFP_DMA | GFP_KERNEL);
> +	if (!buf->virt) {
> +		dev_err(priv->ipu->dev, "failed to alloc dma buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int ipu_ic_num_stripes(int dim)
> +{
> +	if (dim <= 1024)
> +		return 1;
> +	else if (dim <= 2048)
> +		return 2;
> +	else
> +		return 4;
> +}
> +
> +static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
> +					struct ipu_ic_image *image)
> +{
> +	int i;
> +
> +	for (i = 0; i < ctx->num_tiles; i++) {
> +		struct ipu_ic_tile *tile = &image->tile[i];
> +
> +		tile->height = image->base.pix.height / image->num_rows;
> +		tile->width = image->base.pix.width / image->num_cols;

We already have talked about this, this simplified tiling will cause
image artifacts (horizontal and vertical seams at the tile borders) when
the bilinear upscaler source pixel step is significantly smaller than a
whole pixel.
This can be fixed in the future by using overlapping tiles of different
sizes and possibly by slightly changing the scaling factors of
individual tiles.

> +		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
> +			tile->width;
> +
> +		if (image->fmt->y_depth) {
> +			tile->stride > +				(image->fmt->y_depth * tile->width) >> 3;
> +			tile->rot_stride > +				(image->fmt->y_depth * tile->height) >> 3;
> +		} else {
> +			tile->stride > +				(image->fmt->bpp * tile->width) >> 3;
> +			tile->rot_stride > +				(image->fmt->bpp * tile->height) >> 3;
> +		}
> +	}
> +}
> +
> +/*
> + * Use the rotation transformation to find the tile coordinates
> + * (row, col) of a tile in the destination frame that corresponds
> + * to the given tile coordinates of a source frame. The destination
> + * coordinate is then converted to a tile index.
> + */
> +static int ipu_ic_transform_tile_index(struct image_converter_ctx *ctx,
> +				       int src_row, int src_col)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	int cos, sin, dst_row, dst_col;
> +
> +	/* with no rotation it's a 1:1 mapping */
> +	if (ctx->rot_mode = IPU_ROTATE_NONE)
> +		return src_row * s_image->num_cols + src_col;
> +
> +	if (ctx->rot_mode & IPU_ROT_BIT_90) {
> +		cos = 0;
> +		sin = 1;
> +	} else {
> +		cos = 1;
> +		sin = 0;
> +	}
>+
> +	/*
> +	 * before doing the transform, first we have to translate
> +	 * source row,col for an origin in the center of s_image
> +	 */
> +	src_row *= 2;
> +	src_col *= 2;
> +	src_row -= s_image->num_rows - 1;
> +	src_col -= s_image->num_cols - 1;
> +
> +	/* do the rotation transform */
> +	dst_col = src_col * cos - src_row * sin;
> +	dst_row = src_col * sin + src_row * cos;

This looks nice, but I'd just move the rot_mode conditional below
assignment of src_row/col and do away with the sin/cos temporary
variables:

	/*
	 * before doing the transform, first we have to translate
	 * source row,col for an origin in the center of s_image
	 */
	src_row = src_row * 2 - (s_image->num_rows - 1);
	src_col = src_col * 2 - (s_image->num_cols - 1);

	/* do the rotation transform */
	if (ctx->rot_mode & IPU_ROT_BIT_90) {
		dst_col = -src_row;
		dst_row = src_col;
	} else {
		dst_col = src_col;
		dst_row = src_row;
	}

> +	/* apply flip */
> +	if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
> +		dst_col = -dst_col;
> +	if (ctx->rot_mode & IPU_ROT_BIT_VFLIP)
> +		dst_row = -dst_row;
> +
> +	dev_dbg(priv->ipu->dev, "ctx %p: [%d,%d] --> [%d,%d]\n",
> +		ctx, src_col, src_row, dst_col, dst_row);
> +
> +	/*
> +	 * finally translate dest row,col using an origin in upper
> +	 * left of d_image
> +	 */
> +	dst_row += d_image->num_rows - 1;
> +	dst_col += d_image->num_cols - 1;
> +	dst_row /= 2;
> +	dst_col /= 2;
> +
> +	return dst_row * d_image->num_cols + dst_col;
> +}
> +
> +/*
> + * Fill the out_tile_map[] with transformed destination tile indeces.
> + */
> +static void ipu_ic_calc_out_tile_map(struct image_converter_ctx *ctx)
> +{
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	unsigned int row, col, tile = 0;
> +
> +	for (row = 0; row < s_image->num_rows; row++) {
> +		for (col = 0; col < s_image->num_cols; col++) {
> +			ctx->out_tile_map[tile] > +				ipu_ic_transform_tile_index(ctx, row, col);
> +			tile++;
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets_planar(struct image_converter_ctx *ctx,
> +					    struct ipu_ic_image *image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	const struct ipu_ic_pixfmt *fmt = image->fmt;
> +	unsigned int row, col, tile = 0;
> +	u32 H, w, h, y_depth, y_stride, uv_stride;
> +	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
> +	u32 y_row_off, y_col_off, y_off;
> +	u32 y_size, uv_size;
> +
> +	/* setup some convenience vars */
> +	H = image->base.pix.height;
> +
> +	y_depth = fmt->y_depth;
> +	y_stride = image->stride;
> +	uv_stride = y_stride / fmt->uv_width_dec;
> +	if (fmt->uv_packed)
> +		uv_stride *= 2;
> +
> +	y_size = H * y_stride;
> +	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
> +
> +	for (row = 0; row < image->num_rows; row++) {
> +		w = image->tile[tile].width;
> +		h = image->tile[tile].height;
> +		y_row_off = row * h * y_stride;
> +		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
> +
> +		for (col = 0; col < image->num_cols; col++) {
> +			y_col_off = (col * w * y_depth) >> 3;

We know that for planar formats, y_depth can only ever be 8. No need to
calculate this here.

> +			uv_col_off = y_col_off / fmt->uv_width_dec;
> +			if (fmt->uv_packed)
> +				uv_col_off *= 2;
> +
> +			y_off = y_row_off + y_col_off;
> +			uv_off = uv_row_off + uv_col_off;
> +
> +			u_off = y_size - y_off + uv_off;
> +			v_off = (fmt->uv_packed) ? 0 : u_off + uv_size;
> +			if (fmt->uv_swapped) {
> +				tmp = u_off;
> +				u_off = v_off;
> +				v_off = tmp;
> +			}
> +
> +			image->tile[tile].offset = y_off;
> +			image->tile[tile].u_off = u_off;
> +			image->tile[tile++].v_off = v_off;
> +
> +			dev_dbg(priv->ipu->dev,
> +				"ctx %p: %s@[%d,%d]: y_off %08x, u_off %08x, v_off %08x\n",
> +				ctx, image->type = IMAGE_CONVERT_IN ?
> +				"Input" : "Output", row, col,
> +				y_off, u_off, v_off);
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets_packed(struct image_converter_ctx *ctx,
> +					    struct ipu_ic_image *image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	const struct ipu_ic_pixfmt *fmt = image->fmt;
> +	unsigned int row, col, tile = 0;
> +	u32 w, h, bpp, stride;
> +	u32 row_off, col_off;
> +
> +	/* setup some convenience vars */
> +	stride = image->stride;
> +	bpp = fmt->bpp;
> +
> +	for (row = 0; row < image->num_rows; row++) {
> +		w = image->tile[tile].width;
> +		h = image->tile[tile].height;
> +		row_off = row * h * stride;
> +
> +		for (col = 0; col < image->num_cols; col++) {
> +			col_off = (col * w * bpp) >> 3;
> +
> +			image->tile[tile].offset = row_off + col_off;
> +			image->tile[tile].u_off = 0;
> +			image->tile[tile++].v_off = 0;
> +
> +			dev_dbg(priv->ipu->dev,
> +				"ctx %p: %s@[%d,%d]: phys %08x\n", ctx,
> +				image->type = IMAGE_CONVERT_IN ?
> +				"Input" : "Output", row, col,
> +				row_off + col_off);
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets(struct image_converter_ctx *ctx,
> +				     struct ipu_ic_image *image)
> +{
> +	if (image->fmt->y_depth)
> +		ipu_ic_calc_tile_offsets_planar(ctx, image);
> +	else
> +		ipu_ic_calc_tile_offsets_packed(ctx, image);
> +}
> +
> +/*
> + * return the number of runs in given queue (pending_q or done_q)
> + * for this context. hold irqlock when calling.
> + */

Most of the following code seems to be running under one big spinlock.
Is this really necessary?
All the IRQ handlers to is potentially call ipu_ic_convert_stop, update
the CPMEM, mark buffers as ready for the IDMAC, and and put the current
run on the done_q when ready. Can't the IC/IDMAC register access be
locked completely separately from the list handling?

> +static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
> +				struct list_head *q)
> +{
> +	struct image_converter_run *run;
> +	int count = 0;

Add
	lockdep_assert_held(&ctx->irqlock);
for the functions that expect their caller to be holding the lock.

> +	list_for_each_entry(run, q, list) {
> +		if (run->ctx = ctx)
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +/* hold irqlock when calling */
> +static void ipu_ic_convert_stop(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +
> +	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
> +		__func__, ctx, run);

Maybe add some indication which IC task this context belongs to?

> +	/* disable IC tasks and the channels */
> +	ipu_ic_task_disable(cvt->ic);
> +	ipu_idmac_disable_channel(cvt->in_chan);
> +	ipu_idmac_disable_channel(cvt->out_chan);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_idmac_disable_channel(cvt->rotation_in_chan);
> +		ipu_idmac_disable_channel(cvt->rotation_out_chan);
> +		ipu_idmac_unlink(cvt->out_chan, cvt->rotation_in_chan);
> +	}
> +
> +	ipu_ic_disable(cvt->ic);
> +}
> +
> +/* hold irqlock when calling */
> +static void init_idmac_channel(struct image_converter_ctx *ctx,
> +			       struct ipuv3_channel *channel,
> +			       struct ipu_ic_image *image,
> +			       enum ipu_rotate_mode rot_mode,
> +			       bool rot_swap_width_height)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	unsigned int burst_size;
> +	u32 width, height, stride;
> +	dma_addr_t addr0, addr1 = 0;
> +	struct ipu_image tile_image;
> +	unsigned int tile_idx[2];
> +
> +	if (image->type = IMAGE_CONVERT_OUT) {
> +		tile_idx[0] = ctx->out_tile_map[0];
> +		tile_idx[1] = ctx->out_tile_map[1];
> +	} else {
> +		tile_idx[0] = 0;
> +		tile_idx[1] = 1;
> +	}
> +
> +	if (rot_swap_width_height) {
> +		width = image->tile[0].height;
> +		height = image->tile[0].width;
> +		stride = image->tile[0].rot_stride;
> +		addr0 = ctx->rot_intermediate[0].phys;
> +		if (ctx->double_buffering)
> +			addr1 = ctx->rot_intermediate[1].phys;
> +	} else {
> +		width = image->tile[0].width;
> +		height = image->tile[0].height;
> +		stride = image->stride;
> +		addr0 = image->base.phys0 +
> +			image->tile[tile_idx[0]].offset;
> +		if (ctx->double_buffering)
> +			addr1 = image->base.phys0 +
> +				image->tile[tile_idx[1]].offset;
> +	}
> +
> +	ipu_cpmem_zero(channel);
> +
> +	memset(&tile_image, 0, sizeof(tile_image));
> +	tile_image.pix.width = tile_image.rect.width = width;
> +	tile_image.pix.height = tile_image.rect.height = height;
> +	tile_image.pix.bytesperline = stride;
> +	tile_image.pix.pixelformat =  image->fmt->fourcc;
> +	tile_image.phys0 = addr0;
> +	tile_image.phys1 = addr1;
> +	ipu_cpmem_set_image(channel, &tile_image);
> +
> +	if (image->fmt->y_depth && !rot_swap_width_height)
> +		ipu_cpmem_set_uv_offset(channel,
> +					image->tile[tile_idx[0]].u_off,
> +					image->tile[tile_idx[0]].v_off);
> +
> +	if (rot_mode)
> +		ipu_cpmem_set_rotation(channel, rot_mode);
> +
> +	if (channel = cvt->rotation_in_chan ||
> +	    channel = cvt->rotation_out_chan) {
> +		burst_size = 8;
> +		ipu_cpmem_set_block_mode(channel);
> +	} else
> +		burst_size = (width % 16) ? 8 : 16;

This is for later, but it might turn out to be better to accept a little
overdraw if stride allows for it and use the larger burst size,
especially for wide images.

> +	ipu_cpmem_set_burstsize(channel, burst_size);
> +
> +	ipu_ic_task_idma_init(cvt->ic, channel, width, height,
> +			      burst_size, rot_mode);
> +
> +	ipu_cpmem_set_axi_id(channel, 1);
> +
> +	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
> +}
> +
> +/* hold irqlock when calling */
> +static int ipu_ic_convert_start(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	enum ipu_color_space src_cs, dest_cs;
> +	unsigned int dest_width, dest_height;
> +	int ret;
> +
> +	dev_dbg(priv->ipu->dev, "%s: starting ctx %p run %p\n",
> +		__func__, ctx, run);
> +
> +	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
> +	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* swap width/height for resizer */
> +		dest_width = d_image->tile[0].height;
> +		dest_height = d_image->tile[0].width;
> +	} else {
> +		dest_width = d_image->tile[0].width;
> +		dest_height = d_image->tile[0].height;
> +	}
> +
> +	/* setup the IC resizer and CSC */
> +	ret = ipu_ic_task_init(cvt->ic,
> +			       s_image->tile[0].width,
> +			       s_image->tile[0].height,
> +			       dest_width,
> +			       dest_height,
> +			       src_cs, dest_cs);
> +	if (ret) {
> +		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* init the source MEM-->IC PP IDMAC channel */
> +	init_idmac_channel(ctx, cvt->in_chan, s_image,
> +			   IPU_ROTATE_NONE, false);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* init the IC PP-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->out_chan, d_image,
> +				   IPU_ROTATE_NONE, true);
> +
> +		/* init the MEM-->IC PP ROT IDMAC channel */
> +		init_idmac_channel(ctx, cvt->rotation_in_chan, d_image,
> +				   ctx->rot_mode, true);
> +
> +		/* init the destination IC PP ROT-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->rotation_out_chan, d_image,
> +				   IPU_ROTATE_NONE, false);
> +
> +		/* now link IC PP-->MEM to MEM-->IC PP ROT */
> +		ipu_idmac_link(cvt->out_chan, cvt->rotation_in_chan);
> +	} else {
> +		/* init the destination IC PP-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->out_chan, d_image,
> +				   ctx->rot_mode, false);
> +	}
> +
> +	/* enable the IC */
> +	ipu_ic_enable(cvt->ic);
> +
> +	/* set buffers ready */
> +	ipu_idmac_select_buffer(cvt->in_chan, 0);
> +	ipu_idmac_select_buffer(cvt->out_chan, 0);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode))
> +		ipu_idmac_select_buffer(cvt->rotation_out_chan, 0);
> +	if (ctx->double_buffering) {
> +		ipu_idmac_select_buffer(cvt->in_chan, 1);
> +		ipu_idmac_select_buffer(cvt->out_chan, 1);
> +		if (ipu_rot_mode_is_irt(ctx->rot_mode))
> +			ipu_idmac_select_buffer(cvt->rotation_out_chan, 1);
> +	}
> +
> +	/* enable the channels! */
> +	ipu_idmac_enable_channel(cvt->in_chan);
> +	ipu_idmac_enable_channel(cvt->out_chan);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_idmac_enable_channel(cvt->rotation_in_chan);
> +		ipu_idmac_enable_channel(cvt->rotation_out_chan);
> +	}
> +
> +	ipu_ic_task_enable(cvt->ic);
> +
> +	ipu_cpmem_dump(cvt->in_chan);
> +	ipu_cpmem_dump(cvt->out_chan);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_cpmem_dump(cvt->rotation_in_chan);
> +		ipu_cpmem_dump(cvt->rotation_out_chan);
> +	}
> +
> +	ipu_dump(priv->ipu);
> +
> +	return 0;
> +}
> +
> +/* hold irqlock when calling */
> +static int ipu_ic_run(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +
> +	ctx->in.base.phys0 = run->in_phys;
> +	ctx->out.base.phys0 = run->out_phys;
> +
> +	ctx->cur_buf_num = 0;
> +	ctx->next_tile = 1;
> +
> +	/* remove run from pending_q and set as current */
> +	list_del(&run->list);
> +	cvt->current_run = run;
> +
> +	return ipu_ic_convert_start(run);
> +}
> +
> +/* hold irqlock when calling */
> +static void ipu_ic_run_next(struct image_converter *cvt)
> +{
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run, *tmp;
> +	int ret;
> +
> +	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
> +		/* skip contexts that are aborting */
> +		if (run->ctx->aborting) {
> +			dev_dbg(priv->ipu->dev,
> +				 "%s: skipping aborting ctx %p run %p\n",
> +				 __func__, run->ctx, run);
> +			continue;
> +		}
> +
> +		ret = ipu_ic_run(run);
> +		if (!ret)
> +			break;
> +
> +		/*
> +		 * something went wrong with start, add the run
> +		 * to done q and continue to the next run in the
> +		 * pending q.
> +		 */
> +		run->status = ret;
> +		list_add_tail(&run->list, &cvt->done_q);
> +		cvt->current_run = NULL;
> +	}
> +}
> +
> +static void ipu_ic_empty_done_q(struct image_converter *cvt)
> +{
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	while (!list_empty(&cvt->done_q)) {
> +		run = list_entry(cvt->done_q.next,
> +				 struct image_converter_run,
> +				 list);
> +
> +		list_del(&run->list);
> +
> +		dev_dbg(priv->ipu->dev,
> +			"%s: completing ctx %p run %p with %d\n",
> +			__func__, run->ctx, run, run->status);
> +
> +		/* call the completion callback and free the run */
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		run->ctx->complete(run->ctx->complete_context, run,
> +				   run->status);
> +		kfree(run);
> +		spin_lock_irqsave(&cvt->irqlock, flags);
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +}
> +
> +/*
> + * the bottom half thread clears out the done_q, calling the
> + * completion handler for each.
> + */
> +static irqreturn_t ipu_ic_bh(int irq, void *dev_id)
> +{
> +	struct image_converter *cvt = dev_id;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_ctx *ctx;
> +	unsigned long flags;
> +
> +	dev_dbg(priv->ipu->dev, "%s: enter\n", __func__);
> +
> +	ipu_ic_empty_done_q(cvt);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/*
> +	 * the done_q is cleared out, signal any contexts
> +	 * that are aborting that abort can complete.
> +	 */
> +	list_for_each_entry(ctx, &cvt->ctx_list, list) {
> +		if (ctx->aborting) {
> +			dev_dbg(priv->ipu->dev,
> +				 "%s: signaling abort for ctx %p\n",
> +				 __func__, ctx);
> +			complete(&ctx->aborted);
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	dev_dbg(priv->ipu->dev, "%s: exit\n", __func__);
> +	return IRQ_HANDLED;
> +}
> +
> +/* hold irqlock when calling */
> +static irqreturn_t ipu_ic_doirq(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_tile *src_tile, *dst_tile;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	struct ipuv3_channel *outch;
> +	unsigned int dst_idx;
> +
> +	outch = ipu_rot_mode_is_irt(ctx->rot_mode) ?
> +		cvt->rotation_out_chan : cvt->out_chan;
> +
> +	/*
> +	 * It is difficult to stop the channel DMA before the channels
> +	 * enter the paused state. Without double-buffering the channels
> +	 * are always in a paused state when the EOF irq occurs, so it
> +	 * is safe to stop the channels now. For double-buffering we
> +	 * just ignore the abort until the operation completes, when it
> +	 * is safe to shut down.
> +	 */
> +	if (ctx->aborting && !ctx->double_buffering) {
> +		ipu_ic_convert_stop(run);
> +		run->status = -EIO;
> +		goto done;
> +	}
> +
> +	if (ctx->next_tile = ctx->num_tiles) {
> +		/*
> +		 * the conversion is complete
> +		 */
> +		ipu_ic_convert_stop(run);
> +		run->status = 0;
> +		goto done;
> +	}
> +
> +	/*
> +	 * not done, place the next tile buffers.
> +	 */
> +	if (!ctx->double_buffering) {
> +
> +		src_tile = &s_image->tile[ctx->next_tile];
> +		dst_idx = ctx->out_tile_map[ctx->next_tile];
> +		dst_tile = &d_image->tile[dst_idx];
> +
> +		ipu_cpmem_set_buffer(cvt->in_chan, 0,
> +				     s_image->base.phys0 + src_tile->offset);
> +		ipu_cpmem_set_buffer(outch, 0,
> +				     d_image->base.phys0 + dst_tile->offset);
> +		if (s_image->fmt->y_depth)
> +			ipu_cpmem_set_uv_offset(cvt->in_chan,
> +						src_tile->u_off,
> +						src_tile->v_off);
> +		if (d_image->fmt->y_depth)
> +			ipu_cpmem_set_uv_offset(outch,
> +						dst_tile->u_off,
> +						dst_tile->v_off);
> +
> +		ipu_idmac_select_buffer(cvt->in_chan, 0);
> +		ipu_idmac_select_buffer(outch, 0);
> +
> +	} else if (ctx->next_tile < ctx->num_tiles - 1) {
> +
> +		src_tile = &s_image->tile[ctx->next_tile + 1];
> +		dst_idx = ctx->out_tile_map[ctx->next_tile + 1];
> +		dst_tile = &d_image->tile[dst_idx];
> +
> +		ipu_cpmem_set_buffer(cvt->in_chan, ctx->cur_buf_num,
> +				     s_image->base.phys0 + src_tile->offset);
> +		ipu_cpmem_set_buffer(outch, ctx->cur_buf_num,
> +				     d_image->base.phys0 + dst_tile->offset);
> +
> +		ipu_idmac_select_buffer(cvt->in_chan, ctx->cur_buf_num);
> +		ipu_idmac_select_buffer(outch, ctx->cur_buf_num);
> +
> +		ctx->cur_buf_num ^= 1;
> +	}
> +
> +	ctx->next_tile++;
> +	return IRQ_HANDLED;
> +done:
> +	list_add_tail(&run->list, &cvt->done_q);
> +	cvt->current_run = NULL;
> +	ipu_ic_run_next(cvt);
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
> +{
> +	struct image_converter *cvt = data;
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	irqreturn_t ret;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* get current run and its context */
> +	run = cvt->current_run;
> +	if (!run) {
> +		ret = IRQ_NONE;
> +		goto out;
> +	}
> +
> +	ctx = run->ctx;
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* this is a rotation operation, just ignore */
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		return IRQ_HANDLED;
> +	}

Why enable the out_chan EOF irq at all when using the IRT mode?

> +
> +	ret = ipu_ic_doirq(run);
> +out:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +	return ret;
> +}
> +
> +static irqreturn_t ipu_ic_rotate_irq(int irq, void *data)
> +{
> +	struct image_converter *cvt = data;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	irqreturn_t ret;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* get current run and its context */
> +	run = cvt->current_run;
> +	if (!run) {
> +		ret = IRQ_NONE;
> +		goto out;
> +	}
> +
> +	ctx = run->ctx;
> +
> +	if (!ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* this was NOT a rotation operation, shouldn't happen */
> +		dev_err(priv->ipu->dev, "Unexpected rotation interrupt\n");
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		return IRQ_HANDLED;
> +	}
> +
> +	ret = ipu_ic_doirq(run);
> +out:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +	return ret;
> +}
> +
> +/*
> + * try to force the completion of runs for this ctx. Called when
> + * abort wait times out in ipu_image_convert_abort().
> + */
> +static void ipu_ic_force_abort(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	run = cvt->current_run;
> +	if (run && run->ctx = ctx) {
> +		ipu_ic_convert_stop(run);
> +		run->status = -EIO;
> +		list_add_tail(&run->list, &cvt->done_q);
> +		cvt->current_run = NULL;
> +		ipu_ic_run_next(cvt);
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	ipu_ic_empty_done_q(cvt);
> +}
> +
> +static void ipu_ic_release_ipu_resources(struct image_converter *cvt)
> +{
> +	if (cvt->out_eof_irq >= 0)
> +		free_irq(cvt->out_eof_irq, cvt);
> +	if (cvt->rot_out_eof_irq >= 0)
> +		free_irq(cvt->rot_out_eof_irq, cvt);
> +
> +	if (!IS_ERR_OR_NULL(cvt->in_chan))
> +		ipu_idmac_put(cvt->in_chan);
> +	if (!IS_ERR_OR_NULL(cvt->out_chan))
> +		ipu_idmac_put(cvt->out_chan);
> +	if (!IS_ERR_OR_NULL(cvt->rotation_in_chan))
> +		ipu_idmac_put(cvt->rotation_in_chan);
> +	if (!IS_ERR_OR_NULL(cvt->rotation_out_chan))
> +		ipu_idmac_put(cvt->rotation_out_chan);
> +
> +	cvt->in_chan = cvt->out_chan = cvt->rotation_in_chan > +		cvt->rotation_out_chan = NULL;
> +	cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
> +}
> +
> +static int ipu_ic_get_ipu_resources(struct image_converter *cvt)
> +{
> +	const struct ic_task_channels *chan = cvt->ic->ch;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	int ret;
> +
> +	/* get IDMAC channels */
> +	cvt->in_chan = ipu_idmac_get(priv->ipu, chan->in);
> +	cvt->out_chan = ipu_idmac_get(priv->ipu, chan->out);
> +	if (IS_ERR(cvt->in_chan) || IS_ERR(cvt->out_chan)) {
> +		dev_err(priv->ipu->dev, "could not acquire idmac channels\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	cvt->rotation_in_chan = ipu_idmac_get(priv->ipu, chan->rot_in);
> +	cvt->rotation_out_chan = ipu_idmac_get(priv->ipu, chan->rot_out);
> +	if (IS_ERR(cvt->rotation_in_chan) || IS_ERR(cvt->rotation_out_chan)) {
> +		dev_err(priv->ipu->dev,
> +			"could not acquire idmac rotation channels\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	/* acquire the EOF interrupts */
> +	cvt->out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
> +						cvt->out_chan,
> +						IPU_IRQ_EOF);
> +
> +	ret = request_threaded_irq(cvt->out_eof_irq,
> +				   ipu_ic_norotate_irq, ipu_ic_bh,
> +				   0, "ipu-ic", cvt);
> +	if (ret < 0) {
> +		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
> +			 cvt->out_eof_irq);
> +		cvt->out_eof_irq = -1;
> +		goto err;
> +	}
> +
> +	cvt->rot_out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
> +						     cvt->rotation_out_chan,
> +						     IPU_IRQ_EOF);
> +
> +	ret = request_threaded_irq(cvt->rot_out_eof_irq,
> +				   ipu_ic_rotate_irq, ipu_ic_bh,
> +				   0, "ipu-ic", cvt);
> +	if (ret < 0) {
> +		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
> +			cvt->rot_out_eof_irq);
> +		cvt->rot_out_eof_irq = -1;
> +		goto err;
> +	}
> +
> +	return 0;
> +err:
> +	ipu_ic_release_ipu_resources(cvt);
> +	return ret;
> +}
> +
> +static int ipu_ic_fill_image(struct image_converter_ctx *ctx,
> +			     struct ipu_ic_image *ic_image,
> +			     struct ipu_image *image,
> +			     enum image_convert_type type)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +
> +	ic_image->base = *image;
> +	ic_image->type = type;
> +
> +	ic_image->fmt = ipu_ic_get_format(image->pix.pixelformat);
> +	if (!ic_image->fmt) {
> +		dev_err(priv->ipu->dev, "pixelformat not supported for %s\n",
> +			type = IMAGE_CONVERT_OUT ? "Output" : "Input");
> +		return -EINVAL;
> +	}
> +
> +	if (ic_image->fmt->y_depth)
> +		ic_image->stride = (ic_image->fmt->y_depth *
> +				    ic_image->base.pix.width) >> 3;
> +	else
> +		ic_image->stride  = ic_image->base.pix.bytesperline;
> +
> +	ipu_ic_calc_tile_dimensions(ctx, ic_image);
> +	ipu_ic_calc_tile_offsets(ctx, ic_image);
> +
> +	return 0;
> +}
> +
> +/* borrowed from drivers/media/v4l2-core/v4l2-common.c */
> +static unsigned int clamp_align(unsigned int x, unsigned int min,
> +				unsigned int max, unsigned int align)
> +{
> +	/* Bits that must be zero to be aligned */
> +	unsigned int mask = ~((1 << align) - 1);
> +
> +	/* Clamp to aligned min and max */
> +	x = clamp(x, (min + ~mask) & mask, max & mask);
> +
> +	/* Round to nearest aligned value */
> +	if (align)
> +		x = (x + (1 << (align - 1))) & mask;
> +
> +	return x;
> +}
> +
> +/*
> + * We have to adjust the tile width such that the tile physaddrs and
> + * U and V plane offsets are multiples of 8 bytes as required by
> + * the IPU DMA Controller. For the planar formats, this corresponds
> + * to a pixel alignment of 16 (but use a more formal equation since
> + * the variables are available). For all the packed formats, 8 is
> + * good enough.
> + */
> +static inline u32 tile_width_align(const struct ipu_ic_pixfmt *fmt)
> +{
> +	return fmt->y_depth ? (64 * fmt->uv_width_dec) / fmt->y_depth : 8;
> +}
> +
> +/*
> + * For tile height alignment, we have to ensure that the output tile
> + * heights are multiples of 8 lines if the IRT is required by the
> + * given rotation mode (the IRT performs rotations on 8x8 blocks
> + * at a time). If the IRT is not used, or for input image tiles,
> + * 2 lines are good enough.
> + */
> +static inline u32 tile_height_align(enum image_convert_type type,
> +				    enum ipu_rotate_mode rot_mode)
> +{
> +	return (type = IMAGE_CONVERT_OUT &&
> +		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
> +}
> +
> +/* Adjusts input/output images to IPU restrictions */
> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode)
> +{
> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
> +	unsigned int num_in_rows, num_in_cols;
> +	unsigned int num_out_rows, num_out_cols;
> +	u32 w_align, h_align;
> +
> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
> +
> +	/* set some defaults if needed */

Is this our task at all?

> +	if (!infmt) {
> +		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
> +		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
> +	}
> +	if (!outfmt) {
> +		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
> +		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
> +	}
> +
> +	if (!in->pix.width || !in->pix.height) {
> +		in->pix.width = 640;
> +		in->pix.height = 480;
> +	}
> +	if (!out->pix.width || !out->pix.height) {
> +		out->pix.width = 640;
> +		out->pix.height = 480;
> +	}
> +
> +	/* image converter does not handle fields */
> +	in->pix.field = out->pix.field = V4L2_FIELD_NONE;

Why not? The scaler can scale alternate top/bottom fields no problem.

For SEQ_TB/BT and the interleaved interlacing we'd have to adjust
scaling factors per field and use two vertical tiles for the fields
before this can be supported.

> +	/* resizer cannot downsize more than 4:1 */
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		out->pix.height = max_t(__u32, out->pix.height,
> +					in->pix.width / 4);
> +		out->pix.width = max_t(__u32, out->pix.width,
> +				       in->pix.height / 4);
> +	} else {
> +		out->pix.width = max_t(__u32, out->pix.width,
> +				       in->pix.width / 4);
> +		out->pix.height = max_t(__u32, out->pix.height,
> +					in->pix.height / 4);
> +	}
> +
> +	/* get tiling rows/cols from output format */
> +	num_out_rows = ipu_ic_num_stripes(out->pix.height);
> +	num_out_cols = ipu_ic_num_stripes(out->pix.width);
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		num_in_rows = num_out_cols;
> +		num_in_cols = num_out_rows;
> +	} else {
> +		num_in_rows = num_out_rows;
> +		num_in_cols = num_out_cols;
> +	}
> +
> +	/* align input width/height */
> +	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
> +	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
> +			num_in_rows);
> +	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
> +	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
> +
> +	/* align output width/height */
> +	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
> +	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
> +			num_out_rows);
> +	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
> +	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
> +
> +	/* set input/output strides and image sizes */
> +	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
> +	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
> +	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
> +	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
> +
> +/*
> + * this is used by ipu_image_convert_prepare() to verify set input and
> + * output images are valid before starting the conversion. Clients can
> + * also call it before calling ipu_image_convert_prepare().
> + */
> +int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode)
> +{
> +	struct ipu_image testin, testout;
> +	int ret;
> +
> +	testin = *in;
> +	testout = *out;
> +
> +	ret = ipu_image_convert_adjust(&testin, &testout, rot_mode);
> +	if (ret)
> +		return ret;
> +
> +	if (testin.pix.width != in->pix.width ||
> +	    testin.pix.height != in->pix.height ||
> +	    testout.pix.width != out->pix.width ||
> +	    testout.pix.height != out->pix.height)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_verify);
> +
> +/*
> + * Call ipu_image_convert_prepare() to prepare for the conversion of
> + * given images and rotation mode. Returns a new conversion context.
> + */
> +struct image_converter_ctx *
> +ipu_image_convert_prepare(struct ipu_ic *ic,
> +			  struct ipu_image *in, struct ipu_image *out,
> +			  enum ipu_rotate_mode rot_mode,
> +			  image_converter_cb_t complete,
> +			  void *complete_context)
> +{
> +	struct ipu_ic_priv *priv = ic->priv;
> +	struct image_converter *cvt = &ic->cvt;
> +	struct ipu_ic_image *s_image, *d_image;
> +	struct image_converter_ctx *ctx;
> +	unsigned long flags;
> +	bool get_res;
> +	int ret;
> +
> +	if (!ic || !in || !out || !complete)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* verify the in/out images before continuing */
> +	ret = ipu_image_convert_verify(in, out, rot_mode);
> +	if (ret) {
> +		dev_err(priv->ipu->dev, "%s: in/out formats invalid\n",
> +			__func__);
> +		return ERR_PTR(ret);
> +	}
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dev_dbg(priv->ipu->dev, "%s: ctx %p\n", __func__, ctx);
> +
> +	ctx->cvt = cvt;
> +	init_completion(&ctx->aborted);
> +
> +	s_image = &ctx->in;
> +	d_image = &ctx->out;
> +
> +	/* set tiling and rotation */
> +	d_image->num_rows = ipu_ic_num_stripes(out->pix.height);
> +	d_image->num_cols = ipu_ic_num_stripes(out->pix.width);
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		s_image->num_rows = d_image->num_cols;
> +		s_image->num_cols = d_image->num_rows;
> +	} else {
> +		s_image->num_rows = d_image->num_rows;
> +		s_image->num_cols = d_image->num_cols;
> +	}
> +
> +	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
> +	ctx->rot_mode = rot_mode;
> +
> +	ret = ipu_ic_fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
> +	if (ret)
> +		goto out_free;
> +	ret = ipu_ic_fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
> +	if (ret)
> +		goto out_free;
> +
> +	ipu_ic_calc_out_tile_map(ctx);
> +
> +	ipu_ic_dump_format(ctx, s_image);
> +	ipu_ic_dump_format(ctx, d_image);
> +
> +	ctx->complete = complete;
> +	ctx->complete_context = complete_context;
> +
> +	/*
> +	 * Can we use double-buffering for this operation? If there is
> +	 * only one tile (the whole image can be converted in a single
> +	 * operation) there's no point in using double-buffering. Also,
> +	 * the IPU's IDMAC channels allow only a single U and V plane
> +	 * offset shared between both buffers, but these offsets change
> +	 * for every tile, and therefore would have to be updated for
> +	 * each buffer which is not possible. So double-buffering is
> +	 * impossible when either the source or destination images are
> +	 * a planar format (YUV420, YUV422P, etc.).
> +	 */
> +	ctx->double_buffering = (ctx->num_tiles > 1 &&
> +				 !s_image->fmt->y_depth &&
> +				 !d_image->fmt->y_depth);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ret = ipu_ic_alloc_dma_buf(priv, &ctx->rot_intermediate[0],
> +					   d_image->tile[0].size);
> +		if (ret)
> +			goto out_free;
> +		if (ctx->double_buffering) {
> +			ret = ipu_ic_alloc_dma_buf(priv,
> +						   &ctx->rot_intermediate[1],
> +						   d_image->tile[0].size);
> +			if (ret)
> +				goto out_free_dmabuf0;
> +		}
> +	}
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	get_res = list_empty(&cvt->ctx_list);
> +
> +	list_add_tail(&ctx->list, &cvt->ctx_list);
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (get_res) {
> +		ret = ipu_ic_get_ipu_resources(cvt);
> +		if (ret)
> +			goto out_free_dmabuf1;
> +	}
> +
> +	return ctx;
> +
> +out_free_dmabuf1:
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +	list_del(&ctx->list);
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +out_free_dmabuf0:
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
> +out_free:
> +	kfree(ctx);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_prepare);
> +
> +/*
> + * Carry out a single image conversion. Only the physaddr's of the input
> + * and output image buffers are needed. The conversion context must have
> + * been created previously with ipu_image_convert_prepare(). Returns the
> + * new run object.
> + */
> +struct image_converter_run *
> +ipu_image_convert_run(struct image_converter_ctx *ctx,
> +		      dma_addr_t in_phys, dma_addr_t out_phys)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
> +	if (!run)
> +		return ERR_PTR(-ENOMEM);

What is the reasoning behind making the image_converter_run opaque to
the user? If you let the user provide it to ipu_image_convert_run, it
wouldn't have to be allocated/freed with each frame.

> +	run->ctx = ctx;
> +	run->in_phys = in_phys;
> +	run->out_phys = out_phys;
> +
> +	dev_dbg(priv->ipu->dev, "%s: ctx %p run %p\n", __func__,
> +		ctx, run);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	if (ctx->aborting) {
> +		ret = -EIO;
> +		goto unlock;
> +	}
> +
> +	list_add_tail(&run->list, &cvt->pending_q);
> +
> +	if (!cvt->current_run) {
> +		ret = ipu_ic_run(run);
> +		if (ret)
> +			cvt->current_run = NULL;
> +	}
> +unlock:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (ret) {
> +		kfree(run);
> +		run = ERR_PTR(ret);
> +	}
> +
> +	return run;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_run);
> +
> +/* Abort any active or pending conversions for this context */
> +void ipu_image_convert_abort(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run, *active_run, *tmp;
> +	unsigned long flags;
> +	int run_count, ret;
> +	bool need_abort;
> +
> +	reinit_completion(&ctx->aborted);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* move all remaining pending runs in this context to done_q */
> +	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
> +		if (run->ctx != ctx)
> +			continue;
> +		run->status = -EIO;
> +		list_move_tail(&run->list, &cvt->done_q);
> +	}
> +
> +	run_count = ipu_ic_get_run_count(ctx, &cvt->done_q);
> +	active_run = (cvt->current_run && cvt->current_run->ctx = ctx) ?
> +		cvt->current_run : NULL;
> +
> +	need_abort = (run_count || active_run);
> +
> +	ctx->aborting = need_abort;
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (!need_abort) {
> +		dev_dbg(priv->ipu->dev, "%s: no abort needed for ctx %p\n",
> +			__func__, ctx);
> +		return;
> +	}
> +
> +	dev_dbg(priv->ipu->dev,
> +		 "%s: wait for completion: %d runs, active run %p\n",
> +		 __func__, run_count, active_run);
> +
> +	ret = wait_for_completion_timeout(&ctx->aborted,
> +					  msecs_to_jiffies(10000));
> +	if (ret = 0) {
> +		dev_warn(priv->ipu->dev, "%s: timeout\n", __func__);
> +		ipu_ic_force_abort(ctx);
> +	}
> +
> +	ctx->aborting = false;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_abort);
> +
> +/* Unprepare image conversion context */
> +void ipu_image_convert_unprepare(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	unsigned long flags;
> +	bool put_res;
> +
> +	/* make sure no runs are hanging around */
> +	ipu_image_convert_abort(ctx);
> +
> +	dev_dbg(priv->ipu->dev, "%s: removing ctx %p\n", __func__, ctx);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	list_del(&ctx->list);
> +
> +	put_res = list_empty(&cvt->ctx_list);
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (put_res)
> +		ipu_ic_release_ipu_resources(cvt);
> +
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
> +
> +	kfree(ctx);
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_unprepare);
> +
> +/*
> + * "Canned" asynchronous single image conversion. On successful return
> + * caller must call ipu_image_convert_unprepare() after conversion completes.
> + * Returns the new conversion context.
> + */
> +struct image_converter_ctx *
> +ipu_image_convert(struct ipu_ic *ic,
> +		  struct ipu_image *in, struct ipu_image *out,
> +		  enum ipu_rotate_mode rot_mode,
> +		  image_converter_cb_t complete,
> +		  void *complete_context)
> +{
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +
> +	ctx = ipu_image_convert_prepare(ic, in, out, rot_mode,
> +					complete, complete_context);
> +	if (IS_ERR(ctx))
> +		return ctx;
> +
> +	run = ipu_image_convert_run(ctx, in->phys0, out->phys0);
> +	if (IS_ERR(run)) {
> +		ipu_image_convert_unprepare(ctx);
> +		return ERR_PTR(PTR_ERR(run));
> +	}
> +
> +	return ctx;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert);
> +
> +/* "Canned" synchronous single image conversion */
> +static void image_convert_sync_complete(void *data,
> +					struct image_converter_run *run,
> +					int err)
> +{
> +	struct completion *comp = data;
> +
> +	complete(comp);
> +}
> +
> +int ipu_image_convert_sync(struct ipu_ic *ic,
> +			   struct ipu_image *in, struct ipu_image *out,
> +			   enum ipu_rotate_mode rot_mode)
> +{
> +	struct image_converter_ctx *ctx;
> +	struct completion comp;
> +	int ret;
> +
> +	init_completion(&comp);
> +
> +	ctx = ipu_image_convert(ic, in, out, rot_mode,
> +				image_convert_sync_complete, &comp);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	ret = wait_for_completion_timeout(&comp, msecs_to_jiffies(10000));
> +	ret = (ret = 0) ? -ETIMEDOUT : 0;
> +
> +	ipu_image_convert_unprepare(ctx);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_sync);
> +

Most of this calculation of tile geometry and conversion queue handling
code is not really low level IC hardware access. I'd like the code that
doesn't have to access ipu_ic internals directly to be moved into a
separate source file. I'd suggest ipu-ic-queue.c, or
ipu-image-convert.c.

>  int ipu_ic_enable(struct ipu_ic *ic)
>  {
>  	struct ipu_ic_priv *priv = ic->priv;
> @@ -746,6 +2418,7 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
>  	ipu->ic_priv = priv;
>  
>  	spin_lock_init(&priv->lock);
> +
>  	priv->base = devm_ioremap(dev, base, PAGE_SIZE);
>  	if (!priv->base)
>  		return -ENOMEM;
> @@ -758,10 +2431,21 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
>  	priv->ipu = ipu;
>  
>  	for (i = 0; i < IC_NUM_TASKS; i++) {
> -		priv->task[i].task = i;
> -		priv->task[i].priv = priv;
> -		priv->task[i].reg = &ic_task_reg[i];
> -		priv->task[i].bit = &ic_task_bit[i];
> +		struct ipu_ic *ic = &priv->task[i];
> +		struct image_converter *cvt = &ic->cvt;
> +
> +		ic->task = i;
> +		ic->priv = priv;
> +		ic->reg = &ic_task_reg[i];
> +		ic->bit = &ic_task_bit[i];
> +		ic->ch = &ic_task_ch[i];
> +
> +		cvt->ic = ic;
> +		spin_lock_init(&cvt->irqlock);
> +		INIT_LIST_HEAD(&cvt->ctx_list);
> +		INIT_LIST_HEAD(&cvt->pending_q);
> +		INIT_LIST_HEAD(&cvt->done_q);
> +		cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
>  	}
>  
>  	return 0;
> diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
> index 1a3f7d4..992addf 100644
> --- a/include/video/imx-ipu-v3.h
> +++ b/include/video/imx-ipu-v3.h
> @@ -63,17 +63,25 @@ enum ipu_csi_dest {
>  /*
>   * Enumeration of IPU rotation modes
>   */
> +#define IPU_ROT_BIT_VFLIP (1 << 0)
> +#define IPU_ROT_BIT_HFLIP (1 << 1)
> +#define IPU_ROT_BIT_90    (1 << 2)
> +
>  enum ipu_rotate_mode {
>  	IPU_ROTATE_NONE = 0,
> -	IPU_ROTATE_VERT_FLIP,
> -	IPU_ROTATE_HORIZ_FLIP,
> -	IPU_ROTATE_180,
> -	IPU_ROTATE_90_RIGHT,
> -	IPU_ROTATE_90_RIGHT_VFLIP,
> -	IPU_ROTATE_90_RIGHT_HFLIP,
> -	IPU_ROTATE_90_LEFT,
> +	IPU_ROTATE_VERT_FLIP = IPU_ROT_BIT_VFLIP,
> +	IPU_ROTATE_HORIZ_FLIP = IPU_ROT_BIT_HFLIP,
> +	IPU_ROTATE_180 = (IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
> +	IPU_ROTATE_90_RIGHT = IPU_ROT_BIT_90,
> +	IPU_ROTATE_90_RIGHT_VFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_VFLIP),
> +	IPU_ROTATE_90_RIGHT_HFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_HFLIP),
> +	IPU_ROTATE_90_LEFT = (IPU_ROT_BIT_90 |
> +			      IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
>  };
>  
> +/* 90-degree rotations require the IRT unit */
> +#define ipu_rot_mode_is_irt(m) ((m) >= IPU_ROTATE_90_RIGHT)
> +
>  enum ipu_color_space {
>  	IPUV3_COLORSPACE_RGB,
>  	IPUV3_COLORSPACE_YUV,
> @@ -337,6 +345,7 @@ enum ipu_ic_task {
>  };
>  
>  struct ipu_ic;
> +
>  int ipu_ic_task_init(struct ipu_ic *ic,
>  		     int in_width, int in_height,
>  		     int out_width, int out_height,
> @@ -351,6 +360,40 @@ void ipu_ic_task_disable(struct ipu_ic *ic);
>  int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
>  			  u32 width, u32 height, int burst_size,
>  			  enum ipu_rotate_mode rot);
> +
> +struct image_converter_ctx;
> +struct image_converter_run;
> +

Add an ipu_ prefix to those.

> +typedef void (*image_converter_cb_t)(void *ctx,
> +				     struct image_converter_run *run,
> +				     int err);
> +
> +int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc);
> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode);
> +int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode);
> +struct image_converter_ctx *
> +ipu_image_convert_prepare(struct ipu_ic *ic,
> +			  struct ipu_image *in, struct ipu_image *out,
> +			  enum ipu_rotate_mode rot_mode,
> +			  image_converter_cb_t complete,
> +			  void *complete_context);
> +void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> +struct image_converter_run *
> +ipu_image_convert_run(struct image_converter_ctx *ctx,
> +		      dma_addr_t in_phys, dma_addr_t out_phys);
> +void ipu_image_convert_abort(struct image_converter_ctx *ctx);
> +struct image_converter_ctx *
> +ipu_image_convert(struct ipu_ic *ic,
> +		  struct ipu_image *in, struct ipu_image *out,
> +		  enum ipu_rotate_mode rot_mode,
> +		  image_converter_cb_t complete,
> +		  void *complete_context);
> +int ipu_image_convert_sync(struct ipu_ic *ic,
> +			   struct ipu_image *in, struct ipu_image *out,
> +			   enum ipu_rotate_mode rot_mode);
> +
>  int ipu_ic_enable(struct ipu_ic *ic);
>  int ipu_ic_disable(struct ipu_ic *ic);
>  struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task);

regards
Philipp


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-06  9:26     ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, dri-devel,
	tomi.valkeinen, plagnioj

Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> This patch implements complete image conversion support to ipu-ic,
> with tiling to support scaling to and from images up to 4096x4096.
> Image rotation is also supported.
> 
> The internal API is subsystem agnostic (no V4L2 dependency except
> for the use of V4L2 fourcc pixel formats).
> 
> Callers prepare for image conversion by calling
> ipu_image_convert_prepare(), which initializes the parameters of
> the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.

>  The caller passes in the ipu_ic task to use for
> the conversion, the input and output image formats, a rotation mode,
> and a completion callback and completion context pointer:
> 
> struct image_converter_ctx *
> ipu_image_convert_prepare(struct ipu_ic *ic,
>                           struct ipu_image *in, struct ipu_image *out,
>                           enum ipu_rotate_mode rot_mode,
>                           image_converter_cb_t complete,
>                           void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.

> The caller is given a new conversion context that must be passed to
> the further APIs:
> 
> struct image_converter_run *
> ipu_image_convert_run(struct image_converter_ctx *ctx,
>                       dma_addr_t in_phys, dma_addr_t out_phys);
> 
> This queues a new image conversion request to a run queue, and
> starts the conversion immediately if the run queue is empty. Only
> the physaddr's of the input and output image buffers are needed,
> since the conversion context was created previously with
> ipu_image_convert_prepare(). Returns a new run object pointer. When
> the conversion completes, the run pointer is returned to the
> completion callback.
>
> void image_convert_abort(struct image_converter_ctx *ctx);
> 
> This will abort any active or pending conversions for this context.
> Any currently active or pending runs belonging to this context are
> returned via the completion callback with an error status.
>
> void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> 
> Unprepares the conversion context. Any active or pending runs will
> be aborted by calling image_convert_abort().
> 
> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
> 
> ---
> 
> v4:
> - do away with struct ipu_ic_tile_off, and move tile offsets into
>   struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.

Thanks, this looks a lot better to me.

> v3: no changes
> v2: no changes
> ---
>  drivers/gpu/ipu-v3/ipu-ic.c | 1694 ++++++++++++++++++++++++++++++++++++++++++-
>  include/video/imx-ipu-v3.h  |   57 +-
>  2 files changed, 1739 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
> index 1a37afc..01b1b56 100644
> --- a/drivers/gpu/ipu-v3/ipu-ic.c
> +++ b/drivers/gpu/ipu-v3/ipu-ic.c
> @@ -17,6 +17,8 @@
>  #include <linux/bitrev.h>
>  #include <linux/io.h>
>  #include <linux/err.h>
> +#include <linux/interrupt.h>
> +#include <linux/dma-mapping.h>
>  #include "ipu-prv.h"
>  
>  /* IC Register Offsets */
> @@ -82,6 +84,40 @@
>  #define IC_IDMAC_3_PP_WIDTH_MASK        (0x3ff << 20)
>  #define IC_IDMAC_3_PP_WIDTH_OFFSET      20
>  
> +/*
> + * The IC Resizer has a restriction that the output frame from the
> + * resizer must be 1024 or less in both width (pixels) and height
> + * (lines).
> + *
> + * The image conversion support attempts to split up a conversion when
> + * the desired output (converted) frame resolution exceeds the IC resizer
> + * limit of 1024 in either dimension.
> + *
> + * If either dimension of the output frame exceeds the limit, the
> + * dimension is split into 1, 2, or 4 equal stripes, for a maximum
> + * of 4*4 or 16 tiles. A conversion is then carried out for each
> + * tile (but taking care to pass the full frame stride length to
> + * the DMA channel's parameter memory!). IDMA double-buffering is used
> + * to convert each tile back-to-back when possible (see note below
> + * when double_buffering boolean is set).
> + *
> + * Note that the input frame must be split up into the same number
> + * of tiles as the output frame.
> + */
> +#define MAX_STRIPES_W    4
> +#define MAX_STRIPES_H    4
> +#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
> +
> +#define MIN_W     128
> +#define MIN_H     128

Where does this minimum come from?

> +#define MAX_W     4096
> +#define MAX_H     4096
> +
> +enum image_convert_type {
> +	IMAGE_CONVERT_IN = 0,
> +	IMAGE_CONVERT_OUT,
> +};
> +
>  struct ic_task_regoffs {
>  	u32 rsc;
>  	u32 tpmem_csc[2];
> @@ -96,6 +132,16 @@ struct ic_task_bitfields {
>  	u32 ic_cmb_galpha_bit;
>  };
>  
> +struct ic_task_channels {
> +	int in;
> +	int out;
> +	int rot_in;
> +	int rot_out;
> +	int vdi_in_p;
> +	int vdi_in;
> +	int vdi_in_n;

The vdi channels are unused.

> +};
> +
>  static const struct ic_task_regoffs ic_task_reg[IC_NUM_TASKS] = {
>  	[IC_TASK_ENCODER] = {
>  		.rsc = IC_PRP_ENC_RSC,
> @@ -138,12 +184,155 @@ static const struct ic_task_bitfields ic_task_bit[IC_NUM_TASKS] = {
>  	},
>  };
>  
> +static const struct ic_task_channels ic_task_ch[IC_NUM_TASKS] = {
> +	[IC_TASK_ENCODER] = {
> +		.out = IPUV3_CHANNEL_IC_PRP_ENC_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_ENC,
> +		.rot_out = IPUV3_CHANNEL_ROT_ENC_MEM,
> +	},
> +	[IC_TASK_VIEWFINDER] = {
> +		.in = IPUV3_CHANNEL_MEM_IC_PRP_VF,
> +		.out = IPUV3_CHANNEL_IC_PRP_VF_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_VF,
> +		.rot_out = IPUV3_CHANNEL_ROT_VF_MEM,
> +		.vdi_in_p = IPUV3_CHANNEL_MEM_VDI_PREV,
> +		.vdi_in = IPUV3_CHANNEL_MEM_VDI_CUR,
> +		.vdi_in_n = IPUV3_CHANNEL_MEM_VDI_NEXT,

See above.

> +	},
> +	[IC_TASK_POST_PROCESSOR] = {
> +		.in = IPUV3_CHANNEL_MEM_IC_PP,
> +		.out = IPUV3_CHANNEL_IC_PP_MEM,
> +		.rot_in = IPUV3_CHANNEL_MEM_ROT_PP,
> +		.rot_out = IPUV3_CHANNEL_ROT_PP_MEM,
> +	},
> +};
> +
> +struct ipu_ic_dma_buf {
> +	void          *virt;
> +	dma_addr_t    phys;
> +	unsigned long len;
> +};
> +
> +/* dimensions of one tile */
> +struct ipu_ic_tile {
> +	u32 width;
> +	u32 height;
> +	/* size and strides are in bytes */
> +	u32 size;
> +	u32 stride;
> +	u32 rot_stride;
> +	/* start Y or packed offset of this tile */
> +	u32 offset;
> +	/* offset from start to tile in U plane, for planar formats */
> +	u32 u_off;
> +	/* offset from start to tile in V plane, for planar formats */
> +	u32 v_off;
> +};
> +
> +struct ipu_ic_pixfmt {
> +	char	*name;
> +	u32	fourcc;        /* V4L2 fourcc */
> +	int     bpp;           /* total bpp */
> +	int     y_depth;       /* depth of Y plane for planar formats */
> +	int     uv_width_dec;  /* decimation in width for U/V planes */
> +	int     uv_height_dec; /* decimation in height for U/V planes */
> +	bool    uv_swapped;    /* U and V planes are swapped */
> +	bool    uv_packed;     /* partial planar (U and V in same plane) */
> +};
> +
> +struct ipu_ic_image {
> +	struct ipu_image base;
> +	enum image_convert_type type;
> +
> +	const struct ipu_ic_pixfmt *fmt;
> +	unsigned int stride;
> +
> +	/* # of rows (horizontal stripes) if dest height is > 1024 */
> +	unsigned int num_rows;
> +	/* # of columns (vertical stripes) if dest width is > 1024 */
> +	unsigned int num_cols;
> +
> +	struct ipu_ic_tile tile[MAX_TILES];
> +};
> +
> +struct image_converter_ctx;
> +struct image_converter;
>  struct ipu_ic_priv;
> +struct ipu_ic;
> +
> +struct image_converter_run {
> +	struct image_converter_ctx *ctx;
> +
> +	dma_addr_t in_phys;
> +	dma_addr_t out_phys;
> +
> +	int status;
> +
> +	struct list_head list;
> +};
> +
> +struct image_converter_ctx {
> +	struct image_converter *cvt;
> +
> +	image_converter_cb_t complete;
> +	void *complete_context;
> +
> +	/* Source/destination image data and rotation mode */
> +	struct ipu_ic_image in;
> +	struct ipu_ic_image out;
> +	enum ipu_rotate_mode rot_mode;
> +
> +	/* intermediate buffer for rotation */
> +	struct ipu_ic_dma_buf rot_intermediate[2];

No need to change it now, but I assume these could be per IC task
instead of per context.

> +	/* current buffer number for double buffering */
> +	int cur_buf_num;
> +
> +	bool aborting;
> +	struct completion aborted;
> +
> +	/* can we use double-buffering for this conversion operation? */
> +	bool double_buffering;
> +	/* num_rows * num_cols */
> +	unsigned int num_tiles;
> +	/* next tile to process */
> +	unsigned int next_tile;
> +	/* where to place converted tile in dest image */
> +	unsigned int out_tile_map[MAX_TILES];
> +
> +	struct list_head list;
> +};
> +
> +struct image_converter {
> +	struct ipu_ic *ic;
> +
> +	struct ipuv3_channel *in_chan;
> +	struct ipuv3_channel *out_chan;
> +	struct ipuv3_channel *rotation_in_chan;
> +	struct ipuv3_channel *rotation_out_chan;
> +
> +	/* the IPU end-of-frame irqs */
> +	int out_eof_irq;
> +	int rot_out_eof_irq;
> +
> +	spinlock_t irqlock;
> +
> +	/* list of convert contexts */
> +	struct list_head ctx_list;
> +	/* queue of conversion runs */
> +	struct list_head pending_q;
> +	/* queue of completed runs */
> +	struct list_head done_q;
> +
> +	/* the current conversion run */
> +	struct image_converter_run *current_run;
> +};
>  
>  struct ipu_ic {
>  	enum ipu_ic_task task;
>  	const struct ic_task_regoffs *reg;
>  	const struct ic_task_bitfields *bit;
> +	const struct ic_task_channels *ch;
>  
>  	enum ipu_color_space in_cs, g_in_cs;
>  	enum ipu_color_space out_cs;
> @@ -151,6 +340,8 @@ struct ipu_ic {
>  	bool rotation;
>  	bool in_use;
>  
> +	struct image_converter cvt;
> +
>  	struct ipu_ic_priv *priv;
>  };
>  
> @@ -619,7 +810,7 @@ int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
>  	ipu_ic_write(ic, ic_idmac_2, IC_IDMAC_2);
>  	ipu_ic_write(ic, ic_idmac_3, IC_IDMAC_3);
>  
> -	if (rot >= IPU_ROTATE_90_RIGHT)
> +	if (ipu_rot_mode_is_irt(rot))
>  		ic->rotation = true;
>  
>  unlock:
> @@ -648,6 +839,1487 @@ static void ipu_irt_disable(struct ipu_ic *ic)
>  	}
>  }
>  
> +/*
> + * Complete image conversion support follows
> + */
> +
> +static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
> +	{
> +		.name	= "RGB565",

Please drop the names, keeping a list of user readable format names is
the v4l2 core's business, not ours.

> +		.fourcc	= V4L2_PIX_FMT_RGB565,
> +		.bpp    = 16,

bpp is only ever used in bytes, not bits (always divided by 8).
Why not make this bytes_per_pixel or pixel_stride = 2.

> +	}, {
> +		.name	= "RGB24",
> +		.fourcc	= V4L2_PIX_FMT_RGB24,
> +		.bpp    = 24,
> +	}, {
> +		.name	= "BGR24",
> +		.fourcc	= V4L2_PIX_FMT_BGR24,
> +		.bpp    = 24,
> +	}, {
> +		.name	= "RGB32",
> +		.fourcc	= V4L2_PIX_FMT_RGB32,
> +		.bpp    = 32,
> +	}, {
> +		.name	= "BGR32",
> +		.fourcc	= V4L2_PIX_FMT_BGR32,
> +		.bpp    = 32,
> +	}, {
> +		.name	= "4:2:2 packed, YUYV",
> +		.fourcc	= V4L2_PIX_FMT_YUYV,
> +		.bpp    = 16,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name	= "4:2:2 packed, UYVY",
> +		.fourcc	= V4L2_PIX_FMT_UYVY,
> +		.bpp    = 16,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name	= "4:2:0 planar, YUV",
> +		.fourcc	= V4L2_PIX_FMT_YUV420,
> +		.bpp    = 12,
> +		.y_depth = 8,

y_depth is only ever used in bytes, not bits (always divided by 8).
Why not make this bool planar instead.

> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +	}, {
> +		.name	= "4:2:0 planar, YVU",
> +		.fourcc	= V4L2_PIX_FMT_YVU420,
> +		.bpp    = 12,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +		.uv_swapped = true,
> +	}, {
> +		.name   = "4:2:0 partial planar, NV12",
> +		.fourcc = V4L2_PIX_FMT_NV12,
> +		.bpp    = 12,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 2,
> +		.uv_packed = true,
> +	}, {
> +		.name   = "4:2:2 planar, YUV",
> +		.fourcc = V4L2_PIX_FMT_YUV422P,
> +		.bpp    = 16,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +	}, {
> +		.name   = "4:2:2 partial planar, NV16",
> +		.fourcc = V4L2_PIX_FMT_NV16,
> +		.bpp    = 16,
> +		.y_depth = 8,
> +		.uv_width_dec = 2,
> +		.uv_height_dec = 1,
> +		.uv_packed = true,
> +	},
> +};
> +
> +static const struct ipu_ic_pixfmt *ipu_ic_get_format(u32 fourcc)
> +{
> +	const struct ipu_ic_pixfmt *ret = NULL;
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(ipu_ic_formats); i++) {
> +		if (ipu_ic_formats[i].fourcc == fourcc) {
> +			ret = &ipu_ic_formats[i];
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static void ipu_ic_dump_format(struct image_converter_ctx *ctx,
> +			       struct ipu_ic_image *ic_image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +
> +	dev_dbg(priv->ipu->dev,
> +		"ctx %p: %s format: %dx%d (%dx%d tiles of size %dx%d), %c%c%c%c\n",
> +		ctx,
> +		ic_image->type == IMAGE_CONVERT_OUT ? "Output" : "Input",
> +		ic_image->base.pix.width, ic_image->base.pix.height,
> +		ic_image->num_cols, ic_image->num_rows,
> +		ic_image->tile[0].width, ic_image->tile[0].height,
> +		ic_image->fmt->fourcc & 0xff,
> +		(ic_image->fmt->fourcc >> 8) & 0xff,
> +		(ic_image->fmt->fourcc >> 16) & 0xff,
> +		(ic_image->fmt->fourcc >> 24) & 0xff);
> +}
> +
> +int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc)
> +{
> +	const struct ipu_ic_pixfmt *fmt;
> +
> +	if (index >= (int)ARRAY_SIZE(ipu_ic_formats))
> +		return -EINVAL;
> +
> +	/* Format found */
> +	fmt = &ipu_ic_formats[index];
> +	*desc = fmt->name;
> +	*fourcc = fmt->fourcc;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_enum_format);
> +
> +static void ipu_ic_free_dma_buf(struct ipu_ic_priv *priv,
> +				struct ipu_ic_dma_buf *buf)
> +{
> +	if (buf->virt)
> +		dma_free_coherent(priv->ipu->dev,
> +				  buf->len, buf->virt, buf->phys);
> +	buf->virt = NULL;
> +	buf->phys = 0;
> +}
> +
> +static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
> +				struct ipu_ic_dma_buf *buf,
> +				int size)
> +{
> +	unsigned long newlen = PAGE_ALIGN(size);
> +
> +	if (buf->virt) {
> +		if (buf->len == newlen)
> +			return 0;
> +		ipu_ic_free_dma_buf(priv, buf);
> +	}

Is it necessary to support reallocation? This is currently only used by
the prepare function, which creates a new context.

> +
> +	buf->len = newlen;
> +	buf->virt = dma_alloc_coherent(priv->ipu->dev, buf->len, &buf->phys,
> +				       GFP_DMA | GFP_KERNEL);
> +	if (!buf->virt) {
> +		dev_err(priv->ipu->dev, "failed to alloc dma buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int ipu_ic_num_stripes(int dim)
> +{
> +	if (dim <= 1024)
> +		return 1;
> +	else if (dim <= 2048)
> +		return 2;
> +	else
> +		return 4;
> +}
> +
> +static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
> +					struct ipu_ic_image *image)
> +{
> +	int i;
> +
> +	for (i = 0; i < ctx->num_tiles; i++) {
> +		struct ipu_ic_tile *tile = &image->tile[i];
> +
> +		tile->height = image->base.pix.height / image->num_rows;
> +		tile->width = image->base.pix.width / image->num_cols;

We already have talked about this, this simplified tiling will cause
image artifacts (horizontal and vertical seams at the tile borders) when
the bilinear upscaler source pixel step is significantly smaller than a
whole pixel.
This can be fixed in the future by using overlapping tiles of different
sizes and possibly by slightly changing the scaling factors of
individual tiles.

> +		tile->size = ((tile->height * image->fmt->bpp) >> 3) *
> +			tile->width;
> +
> +		if (image->fmt->y_depth) {
> +			tile->stride =
> +				(image->fmt->y_depth * tile->width) >> 3;
> +			tile->rot_stride =
> +				(image->fmt->y_depth * tile->height) >> 3;
> +		} else {
> +			tile->stride =
> +				(image->fmt->bpp * tile->width) >> 3;
> +			tile->rot_stride =
> +				(image->fmt->bpp * tile->height) >> 3;
> +		}
> +	}
> +}
> +
> +/*
> + * Use the rotation transformation to find the tile coordinates
> + * (row, col) of a tile in the destination frame that corresponds
> + * to the given tile coordinates of a source frame. The destination
> + * coordinate is then converted to a tile index.
> + */
> +static int ipu_ic_transform_tile_index(struct image_converter_ctx *ctx,
> +				       int src_row, int src_col)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	int cos, sin, dst_row, dst_col;
> +
> +	/* with no rotation it's a 1:1 mapping */
> +	if (ctx->rot_mode == IPU_ROTATE_NONE)
> +		return src_row * s_image->num_cols + src_col;
> +
> +	if (ctx->rot_mode & IPU_ROT_BIT_90) {
> +		cos = 0;
> +		sin = 1;
> +	} else {
> +		cos = 1;
> +		sin = 0;
> +	}
>+
> +	/*
> +	 * before doing the transform, first we have to translate
> +	 * source row,col for an origin in the center of s_image
> +	 */
> +	src_row *= 2;
> +	src_col *= 2;
> +	src_row -= s_image->num_rows - 1;
> +	src_col -= s_image->num_cols - 1;
> +
> +	/* do the rotation transform */
> +	dst_col = src_col * cos - src_row * sin;
> +	dst_row = src_col * sin + src_row * cos;

This looks nice, but I'd just move the rot_mode conditional below
assignment of src_row/col and do away with the sin/cos temporary
variables:

	/*
	 * before doing the transform, first we have to translate
	 * source row,col for an origin in the center of s_image
	 */
	src_row = src_row * 2 - (s_image->num_rows - 1);
	src_col = src_col * 2 - (s_image->num_cols - 1);

	/* do the rotation transform */
	if (ctx->rot_mode & IPU_ROT_BIT_90) {
		dst_col = -src_row;
		dst_row = src_col;
	} else {
		dst_col = src_col;
		dst_row = src_row;
	}

> +	/* apply flip */
> +	if (ctx->rot_mode & IPU_ROT_BIT_HFLIP)
> +		dst_col = -dst_col;
> +	if (ctx->rot_mode & IPU_ROT_BIT_VFLIP)
> +		dst_row = -dst_row;
> +
> +	dev_dbg(priv->ipu->dev, "ctx %p: [%d,%d] --> [%d,%d]\n",
> +		ctx, src_col, src_row, dst_col, dst_row);
> +
> +	/*
> +	 * finally translate dest row,col using an origin in upper
> +	 * left of d_image
> +	 */
> +	dst_row += d_image->num_rows - 1;
> +	dst_col += d_image->num_cols - 1;
> +	dst_row /= 2;
> +	dst_col /= 2;
> +
> +	return dst_row * d_image->num_cols + dst_col;
> +}
> +
> +/*
> + * Fill the out_tile_map[] with transformed destination tile indeces.
> + */
> +static void ipu_ic_calc_out_tile_map(struct image_converter_ctx *ctx)
> +{
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	unsigned int row, col, tile = 0;
> +
> +	for (row = 0; row < s_image->num_rows; row++) {
> +		for (col = 0; col < s_image->num_cols; col++) {
> +			ctx->out_tile_map[tile] =
> +				ipu_ic_transform_tile_index(ctx, row, col);
> +			tile++;
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets_planar(struct image_converter_ctx *ctx,
> +					    struct ipu_ic_image *image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	const struct ipu_ic_pixfmt *fmt = image->fmt;
> +	unsigned int row, col, tile = 0;
> +	u32 H, w, h, y_depth, y_stride, uv_stride;
> +	u32 uv_row_off, uv_col_off, uv_off, u_off, v_off, tmp;
> +	u32 y_row_off, y_col_off, y_off;
> +	u32 y_size, uv_size;
> +
> +	/* setup some convenience vars */
> +	H = image->base.pix.height;
> +
> +	y_depth = fmt->y_depth;
> +	y_stride = image->stride;
> +	uv_stride = y_stride / fmt->uv_width_dec;
> +	if (fmt->uv_packed)
> +		uv_stride *= 2;
> +
> +	y_size = H * y_stride;
> +	uv_size = y_size / (fmt->uv_width_dec * fmt->uv_height_dec);
> +
> +	for (row = 0; row < image->num_rows; row++) {
> +		w = image->tile[tile].width;
> +		h = image->tile[tile].height;
> +		y_row_off = row * h * y_stride;
> +		uv_row_off = (row * h * uv_stride) / fmt->uv_height_dec;
> +
> +		for (col = 0; col < image->num_cols; col++) {
> +			y_col_off = (col * w * y_depth) >> 3;

We know that for planar formats, y_depth can only ever be 8. No need to
calculate this here.

> +			uv_col_off = y_col_off / fmt->uv_width_dec;
> +			if (fmt->uv_packed)
> +				uv_col_off *= 2;
> +
> +			y_off = y_row_off + y_col_off;
> +			uv_off = uv_row_off + uv_col_off;
> +
> +			u_off = y_size - y_off + uv_off;
> +			v_off = (fmt->uv_packed) ? 0 : u_off + uv_size;
> +			if (fmt->uv_swapped) {
> +				tmp = u_off;
> +				u_off = v_off;
> +				v_off = tmp;
> +			}
> +
> +			image->tile[tile].offset = y_off;
> +			image->tile[tile].u_off = u_off;
> +			image->tile[tile++].v_off = v_off;
> +
> +			dev_dbg(priv->ipu->dev,
> +				"ctx %p: %s@[%d,%d]: y_off %08x, u_off %08x, v_off %08x\n",
> +				ctx, image->type == IMAGE_CONVERT_IN ?
> +				"Input" : "Output", row, col,
> +				y_off, u_off, v_off);
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets_packed(struct image_converter_ctx *ctx,
> +					    struct ipu_ic_image *image)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +	const struct ipu_ic_pixfmt *fmt = image->fmt;
> +	unsigned int row, col, tile = 0;
> +	u32 w, h, bpp, stride;
> +	u32 row_off, col_off;
> +
> +	/* setup some convenience vars */
> +	stride = image->stride;
> +	bpp = fmt->bpp;
> +
> +	for (row = 0; row < image->num_rows; row++) {
> +		w = image->tile[tile].width;
> +		h = image->tile[tile].height;
> +		row_off = row * h * stride;
> +
> +		for (col = 0; col < image->num_cols; col++) {
> +			col_off = (col * w * bpp) >> 3;
> +
> +			image->tile[tile].offset = row_off + col_off;
> +			image->tile[tile].u_off = 0;
> +			image->tile[tile++].v_off = 0;
> +
> +			dev_dbg(priv->ipu->dev,
> +				"ctx %p: %s@[%d,%d]: phys %08x\n", ctx,
> +				image->type == IMAGE_CONVERT_IN ?
> +				"Input" : "Output", row, col,
> +				row_off + col_off);
> +		}
> +	}
> +}
> +
> +static void ipu_ic_calc_tile_offsets(struct image_converter_ctx *ctx,
> +				     struct ipu_ic_image *image)
> +{
> +	if (image->fmt->y_depth)
> +		ipu_ic_calc_tile_offsets_planar(ctx, image);
> +	else
> +		ipu_ic_calc_tile_offsets_packed(ctx, image);
> +}
> +
> +/*
> + * return the number of runs in given queue (pending_q or done_q)
> + * for this context. hold irqlock when calling.
> + */

Most of the following code seems to be running under one big spinlock.
Is this really necessary?
All the IRQ handlers to is potentially call ipu_ic_convert_stop, update
the CPMEM, mark buffers as ready for the IDMAC, and and put the current
run on the done_q when ready. Can't the IC/IDMAC register access be
locked completely separately from the list handling?

> +static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
> +				struct list_head *q)
> +{
> +	struct image_converter_run *run;
> +	int count = 0;

Add
	lockdep_assert_held(&ctx->irqlock);
for the functions that expect their caller to be holding the lock.

> +	list_for_each_entry(run, q, list) {
> +		if (run->ctx == ctx)
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +/* hold irqlock when calling */
> +static void ipu_ic_convert_stop(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +
> +	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
> +		__func__, ctx, run);

Maybe add some indication which IC task this context belongs to?

> +	/* disable IC tasks and the channels */
> +	ipu_ic_task_disable(cvt->ic);
> +	ipu_idmac_disable_channel(cvt->in_chan);
> +	ipu_idmac_disable_channel(cvt->out_chan);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_idmac_disable_channel(cvt->rotation_in_chan);
> +		ipu_idmac_disable_channel(cvt->rotation_out_chan);
> +		ipu_idmac_unlink(cvt->out_chan, cvt->rotation_in_chan);
> +	}
> +
> +	ipu_ic_disable(cvt->ic);
> +}
> +
> +/* hold irqlock when calling */
> +static void init_idmac_channel(struct image_converter_ctx *ctx,
> +			       struct ipuv3_channel *channel,
> +			       struct ipu_ic_image *image,
> +			       enum ipu_rotate_mode rot_mode,
> +			       bool rot_swap_width_height)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	unsigned int burst_size;
> +	u32 width, height, stride;
> +	dma_addr_t addr0, addr1 = 0;
> +	struct ipu_image tile_image;
> +	unsigned int tile_idx[2];
> +
> +	if (image->type == IMAGE_CONVERT_OUT) {
> +		tile_idx[0] = ctx->out_tile_map[0];
> +		tile_idx[1] = ctx->out_tile_map[1];
> +	} else {
> +		tile_idx[0] = 0;
> +		tile_idx[1] = 1;
> +	}
> +
> +	if (rot_swap_width_height) {
> +		width = image->tile[0].height;
> +		height = image->tile[0].width;
> +		stride = image->tile[0].rot_stride;
> +		addr0 = ctx->rot_intermediate[0].phys;
> +		if (ctx->double_buffering)
> +			addr1 = ctx->rot_intermediate[1].phys;
> +	} else {
> +		width = image->tile[0].width;
> +		height = image->tile[0].height;
> +		stride = image->stride;
> +		addr0 = image->base.phys0 +
> +			image->tile[tile_idx[0]].offset;
> +		if (ctx->double_buffering)
> +			addr1 = image->base.phys0 +
> +				image->tile[tile_idx[1]].offset;
> +	}
> +
> +	ipu_cpmem_zero(channel);
> +
> +	memset(&tile_image, 0, sizeof(tile_image));
> +	tile_image.pix.width = tile_image.rect.width = width;
> +	tile_image.pix.height = tile_image.rect.height = height;
> +	tile_image.pix.bytesperline = stride;
> +	tile_image.pix.pixelformat =  image->fmt->fourcc;
> +	tile_image.phys0 = addr0;
> +	tile_image.phys1 = addr1;
> +	ipu_cpmem_set_image(channel, &tile_image);
> +
> +	if (image->fmt->y_depth && !rot_swap_width_height)
> +		ipu_cpmem_set_uv_offset(channel,
> +					image->tile[tile_idx[0]].u_off,
> +					image->tile[tile_idx[0]].v_off);
> +
> +	if (rot_mode)
> +		ipu_cpmem_set_rotation(channel, rot_mode);
> +
> +	if (channel == cvt->rotation_in_chan ||
> +	    channel == cvt->rotation_out_chan) {
> +		burst_size = 8;
> +		ipu_cpmem_set_block_mode(channel);
> +	} else
> +		burst_size = (width % 16) ? 8 : 16;

This is for later, but it might turn out to be better to accept a little
overdraw if stride allows for it and use the larger burst size,
especially for wide images.

> +	ipu_cpmem_set_burstsize(channel, burst_size);
> +
> +	ipu_ic_task_idma_init(cvt->ic, channel, width, height,
> +			      burst_size, rot_mode);
> +
> +	ipu_cpmem_set_axi_id(channel, 1);
> +
> +	ipu_idmac_set_double_buffer(channel, ctx->double_buffering);
> +}
> +
> +/* hold irqlock when calling */
> +static int ipu_ic_convert_start(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	enum ipu_color_space src_cs, dest_cs;
> +	unsigned int dest_width, dest_height;
> +	int ret;
> +
> +	dev_dbg(priv->ipu->dev, "%s: starting ctx %p run %p\n",
> +		__func__, ctx, run);
> +
> +	src_cs = ipu_pixelformat_to_colorspace(s_image->fmt->fourcc);
> +	dest_cs = ipu_pixelformat_to_colorspace(d_image->fmt->fourcc);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* swap width/height for resizer */
> +		dest_width = d_image->tile[0].height;
> +		dest_height = d_image->tile[0].width;
> +	} else {
> +		dest_width = d_image->tile[0].width;
> +		dest_height = d_image->tile[0].height;
> +	}
> +
> +	/* setup the IC resizer and CSC */
> +	ret = ipu_ic_task_init(cvt->ic,
> +			       s_image->tile[0].width,
> +			       s_image->tile[0].height,
> +			       dest_width,
> +			       dest_height,
> +			       src_cs, dest_cs);
> +	if (ret) {
> +		dev_err(priv->ipu->dev, "ipu_ic_task_init failed, %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* init the source MEM-->IC PP IDMAC channel */
> +	init_idmac_channel(ctx, cvt->in_chan, s_image,
> +			   IPU_ROTATE_NONE, false);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* init the IC PP-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->out_chan, d_image,
> +				   IPU_ROTATE_NONE, true);
> +
> +		/* init the MEM-->IC PP ROT IDMAC channel */
> +		init_idmac_channel(ctx, cvt->rotation_in_chan, d_image,
> +				   ctx->rot_mode, true);
> +
> +		/* init the destination IC PP ROT-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->rotation_out_chan, d_image,
> +				   IPU_ROTATE_NONE, false);
> +
> +		/* now link IC PP-->MEM to MEM-->IC PP ROT */
> +		ipu_idmac_link(cvt->out_chan, cvt->rotation_in_chan);
> +	} else {
> +		/* init the destination IC PP-->MEM IDMAC channel */
> +		init_idmac_channel(ctx, cvt->out_chan, d_image,
> +				   ctx->rot_mode, false);
> +	}
> +
> +	/* enable the IC */
> +	ipu_ic_enable(cvt->ic);
> +
> +	/* set buffers ready */
> +	ipu_idmac_select_buffer(cvt->in_chan, 0);
> +	ipu_idmac_select_buffer(cvt->out_chan, 0);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode))
> +		ipu_idmac_select_buffer(cvt->rotation_out_chan, 0);
> +	if (ctx->double_buffering) {
> +		ipu_idmac_select_buffer(cvt->in_chan, 1);
> +		ipu_idmac_select_buffer(cvt->out_chan, 1);
> +		if (ipu_rot_mode_is_irt(ctx->rot_mode))
> +			ipu_idmac_select_buffer(cvt->rotation_out_chan, 1);
> +	}
> +
> +	/* enable the channels! */
> +	ipu_idmac_enable_channel(cvt->in_chan);
> +	ipu_idmac_enable_channel(cvt->out_chan);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_idmac_enable_channel(cvt->rotation_in_chan);
> +		ipu_idmac_enable_channel(cvt->rotation_out_chan);
> +	}
> +
> +	ipu_ic_task_enable(cvt->ic);
> +
> +	ipu_cpmem_dump(cvt->in_chan);
> +	ipu_cpmem_dump(cvt->out_chan);
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ipu_cpmem_dump(cvt->rotation_in_chan);
> +		ipu_cpmem_dump(cvt->rotation_out_chan);
> +	}
> +
> +	ipu_dump(priv->ipu);
> +
> +	return 0;
> +}
> +
> +/* hold irqlock when calling */
> +static int ipu_ic_run(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +
> +	ctx->in.base.phys0 = run->in_phys;
> +	ctx->out.base.phys0 = run->out_phys;
> +
> +	ctx->cur_buf_num = 0;
> +	ctx->next_tile = 1;
> +
> +	/* remove run from pending_q and set as current */
> +	list_del(&run->list);
> +	cvt->current_run = run;
> +
> +	return ipu_ic_convert_start(run);
> +}
> +
> +/* hold irqlock when calling */
> +static void ipu_ic_run_next(struct image_converter *cvt)
> +{
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run, *tmp;
> +	int ret;
> +
> +	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
> +		/* skip contexts that are aborting */
> +		if (run->ctx->aborting) {
> +			dev_dbg(priv->ipu->dev,
> +				 "%s: skipping aborting ctx %p run %p\n",
> +				 __func__, run->ctx, run);
> +			continue;
> +		}
> +
> +		ret = ipu_ic_run(run);
> +		if (!ret)
> +			break;
> +
> +		/*
> +		 * something went wrong with start, add the run
> +		 * to done q and continue to the next run in the
> +		 * pending q.
> +		 */
> +		run->status = ret;
> +		list_add_tail(&run->list, &cvt->done_q);
> +		cvt->current_run = NULL;
> +	}
> +}
> +
> +static void ipu_ic_empty_done_q(struct image_converter *cvt)
> +{
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	while (!list_empty(&cvt->done_q)) {
> +		run = list_entry(cvt->done_q.next,
> +				 struct image_converter_run,
> +				 list);
> +
> +		list_del(&run->list);
> +
> +		dev_dbg(priv->ipu->dev,
> +			"%s: completing ctx %p run %p with %d\n",
> +			__func__, run->ctx, run, run->status);
> +
> +		/* call the completion callback and free the run */
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		run->ctx->complete(run->ctx->complete_context, run,
> +				   run->status);
> +		kfree(run);
> +		spin_lock_irqsave(&cvt->irqlock, flags);
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +}
> +
> +/*
> + * the bottom half thread clears out the done_q, calling the
> + * completion handler for each.
> + */
> +static irqreturn_t ipu_ic_bh(int irq, void *dev_id)
> +{
> +	struct image_converter *cvt = dev_id;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_ctx *ctx;
> +	unsigned long flags;
> +
> +	dev_dbg(priv->ipu->dev, "%s: enter\n", __func__);
> +
> +	ipu_ic_empty_done_q(cvt);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/*
> +	 * the done_q is cleared out, signal any contexts
> +	 * that are aborting that abort can complete.
> +	 */
> +	list_for_each_entry(ctx, &cvt->ctx_list, list) {
> +		if (ctx->aborting) {
> +			dev_dbg(priv->ipu->dev,
> +				 "%s: signaling abort for ctx %p\n",
> +				 __func__, ctx);
> +			complete(&ctx->aborted);
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	dev_dbg(priv->ipu->dev, "%s: exit\n", __func__);
> +	return IRQ_HANDLED;
> +}
> +
> +/* hold irqlock when calling */
> +static irqreturn_t ipu_ic_doirq(struct image_converter_run *run)
> +{
> +	struct image_converter_ctx *ctx = run->ctx;
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_tile *src_tile, *dst_tile;
> +	struct ipu_ic_image *s_image = &ctx->in;
> +	struct ipu_ic_image *d_image = &ctx->out;
> +	struct ipuv3_channel *outch;
> +	unsigned int dst_idx;
> +
> +	outch = ipu_rot_mode_is_irt(ctx->rot_mode) ?
> +		cvt->rotation_out_chan : cvt->out_chan;
> +
> +	/*
> +	 * It is difficult to stop the channel DMA before the channels
> +	 * enter the paused state. Without double-buffering the channels
> +	 * are always in a paused state when the EOF irq occurs, so it
> +	 * is safe to stop the channels now. For double-buffering we
> +	 * just ignore the abort until the operation completes, when it
> +	 * is safe to shut down.
> +	 */
> +	if (ctx->aborting && !ctx->double_buffering) {
> +		ipu_ic_convert_stop(run);
> +		run->status = -EIO;
> +		goto done;
> +	}
> +
> +	if (ctx->next_tile == ctx->num_tiles) {
> +		/*
> +		 * the conversion is complete
> +		 */
> +		ipu_ic_convert_stop(run);
> +		run->status = 0;
> +		goto done;
> +	}
> +
> +	/*
> +	 * not done, place the next tile buffers.
> +	 */
> +	if (!ctx->double_buffering) {
> +
> +		src_tile = &s_image->tile[ctx->next_tile];
> +		dst_idx = ctx->out_tile_map[ctx->next_tile];
> +		dst_tile = &d_image->tile[dst_idx];
> +
> +		ipu_cpmem_set_buffer(cvt->in_chan, 0,
> +				     s_image->base.phys0 + src_tile->offset);
> +		ipu_cpmem_set_buffer(outch, 0,
> +				     d_image->base.phys0 + dst_tile->offset);
> +		if (s_image->fmt->y_depth)
> +			ipu_cpmem_set_uv_offset(cvt->in_chan,
> +						src_tile->u_off,
> +						src_tile->v_off);
> +		if (d_image->fmt->y_depth)
> +			ipu_cpmem_set_uv_offset(outch,
> +						dst_tile->u_off,
> +						dst_tile->v_off);
> +
> +		ipu_idmac_select_buffer(cvt->in_chan, 0);
> +		ipu_idmac_select_buffer(outch, 0);
> +
> +	} else if (ctx->next_tile < ctx->num_tiles - 1) {
> +
> +		src_tile = &s_image->tile[ctx->next_tile + 1];
> +		dst_idx = ctx->out_tile_map[ctx->next_tile + 1];
> +		dst_tile = &d_image->tile[dst_idx];
> +
> +		ipu_cpmem_set_buffer(cvt->in_chan, ctx->cur_buf_num,
> +				     s_image->base.phys0 + src_tile->offset);
> +		ipu_cpmem_set_buffer(outch, ctx->cur_buf_num,
> +				     d_image->base.phys0 + dst_tile->offset);
> +
> +		ipu_idmac_select_buffer(cvt->in_chan, ctx->cur_buf_num);
> +		ipu_idmac_select_buffer(outch, ctx->cur_buf_num);
> +
> +		ctx->cur_buf_num ^= 1;
> +	}
> +
> +	ctx->next_tile++;
> +	return IRQ_HANDLED;
> +done:
> +	list_add_tail(&run->list, &cvt->done_q);
> +	cvt->current_run = NULL;
> +	ipu_ic_run_next(cvt);
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
> +{
> +	struct image_converter *cvt = data;
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	irqreturn_t ret;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* get current run and its context */
> +	run = cvt->current_run;
> +	if (!run) {
> +		ret = IRQ_NONE;
> +		goto out;
> +	}
> +
> +	ctx = run->ctx;
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* this is a rotation operation, just ignore */
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		return IRQ_HANDLED;
> +	}

Why enable the out_chan EOF irq at all when using the IRT mode?

> +
> +	ret = ipu_ic_doirq(run);
> +out:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +	return ret;
> +}
> +
> +static irqreturn_t ipu_ic_rotate_irq(int irq, void *data)
> +{
> +	struct image_converter *cvt = data;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	irqreturn_t ret;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* get current run and its context */
> +	run = cvt->current_run;
> +	if (!run) {
> +		ret = IRQ_NONE;
> +		goto out;
> +	}
> +
> +	ctx = run->ctx;
> +
> +	if (!ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		/* this was NOT a rotation operation, shouldn't happen */
> +		dev_err(priv->ipu->dev, "Unexpected rotation interrupt\n");
> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> +		return IRQ_HANDLED;
> +	}
> +
> +	ret = ipu_ic_doirq(run);
> +out:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +	return ret;
> +}
> +
> +/*
> + * try to force the completion of runs for this ctx. Called when
> + * abort wait times out in ipu_image_convert_abort().
> + */
> +static void ipu_ic_force_abort(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	run = cvt->current_run;
> +	if (run && run->ctx == ctx) {
> +		ipu_ic_convert_stop(run);
> +		run->status = -EIO;
> +		list_add_tail(&run->list, &cvt->done_q);
> +		cvt->current_run = NULL;
> +		ipu_ic_run_next(cvt);
> +	}
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	ipu_ic_empty_done_q(cvt);
> +}
> +
> +static void ipu_ic_release_ipu_resources(struct image_converter *cvt)
> +{
> +	if (cvt->out_eof_irq >= 0)
> +		free_irq(cvt->out_eof_irq, cvt);
> +	if (cvt->rot_out_eof_irq >= 0)
> +		free_irq(cvt->rot_out_eof_irq, cvt);
> +
> +	if (!IS_ERR_OR_NULL(cvt->in_chan))
> +		ipu_idmac_put(cvt->in_chan);
> +	if (!IS_ERR_OR_NULL(cvt->out_chan))
> +		ipu_idmac_put(cvt->out_chan);
> +	if (!IS_ERR_OR_NULL(cvt->rotation_in_chan))
> +		ipu_idmac_put(cvt->rotation_in_chan);
> +	if (!IS_ERR_OR_NULL(cvt->rotation_out_chan))
> +		ipu_idmac_put(cvt->rotation_out_chan);
> +
> +	cvt->in_chan = cvt->out_chan = cvt->rotation_in_chan =
> +		cvt->rotation_out_chan = NULL;
> +	cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
> +}
> +
> +static int ipu_ic_get_ipu_resources(struct image_converter *cvt)
> +{
> +	const struct ic_task_channels *chan = cvt->ic->ch;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	int ret;
> +
> +	/* get IDMAC channels */
> +	cvt->in_chan = ipu_idmac_get(priv->ipu, chan->in);
> +	cvt->out_chan = ipu_idmac_get(priv->ipu, chan->out);
> +	if (IS_ERR(cvt->in_chan) || IS_ERR(cvt->out_chan)) {
> +		dev_err(priv->ipu->dev, "could not acquire idmac channels\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	cvt->rotation_in_chan = ipu_idmac_get(priv->ipu, chan->rot_in);
> +	cvt->rotation_out_chan = ipu_idmac_get(priv->ipu, chan->rot_out);
> +	if (IS_ERR(cvt->rotation_in_chan) || IS_ERR(cvt->rotation_out_chan)) {
> +		dev_err(priv->ipu->dev,
> +			"could not acquire idmac rotation channels\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	/* acquire the EOF interrupts */
> +	cvt->out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
> +						cvt->out_chan,
> +						IPU_IRQ_EOF);
> +
> +	ret = request_threaded_irq(cvt->out_eof_irq,
> +				   ipu_ic_norotate_irq, ipu_ic_bh,
> +				   0, "ipu-ic", cvt);
> +	if (ret < 0) {
> +		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
> +			 cvt->out_eof_irq);
> +		cvt->out_eof_irq = -1;
> +		goto err;
> +	}
> +
> +	cvt->rot_out_eof_irq = ipu_idmac_channel_irq(priv->ipu,
> +						     cvt->rotation_out_chan,
> +						     IPU_IRQ_EOF);
> +
> +	ret = request_threaded_irq(cvt->rot_out_eof_irq,
> +				   ipu_ic_rotate_irq, ipu_ic_bh,
> +				   0, "ipu-ic", cvt);
> +	if (ret < 0) {
> +		dev_err(priv->ipu->dev, "could not acquire irq %d\n",
> +			cvt->rot_out_eof_irq);
> +		cvt->rot_out_eof_irq = -1;
> +		goto err;
> +	}
> +
> +	return 0;
> +err:
> +	ipu_ic_release_ipu_resources(cvt);
> +	return ret;
> +}
> +
> +static int ipu_ic_fill_image(struct image_converter_ctx *ctx,
> +			     struct ipu_ic_image *ic_image,
> +			     struct ipu_image *image,
> +			     enum image_convert_type type)
> +{
> +	struct ipu_ic_priv *priv = ctx->cvt->ic->priv;
> +
> +	ic_image->base = *image;
> +	ic_image->type = type;
> +
> +	ic_image->fmt = ipu_ic_get_format(image->pix.pixelformat);
> +	if (!ic_image->fmt) {
> +		dev_err(priv->ipu->dev, "pixelformat not supported for %s\n",
> +			type == IMAGE_CONVERT_OUT ? "Output" : "Input");
> +		return -EINVAL;
> +	}
> +
> +	if (ic_image->fmt->y_depth)
> +		ic_image->stride = (ic_image->fmt->y_depth *
> +				    ic_image->base.pix.width) >> 3;
> +	else
> +		ic_image->stride  = ic_image->base.pix.bytesperline;
> +
> +	ipu_ic_calc_tile_dimensions(ctx, ic_image);
> +	ipu_ic_calc_tile_offsets(ctx, ic_image);
> +
> +	return 0;
> +}
> +
> +/* borrowed from drivers/media/v4l2-core/v4l2-common.c */
> +static unsigned int clamp_align(unsigned int x, unsigned int min,
> +				unsigned int max, unsigned int align)
> +{
> +	/* Bits that must be zero to be aligned */
> +	unsigned int mask = ~((1 << align) - 1);
> +
> +	/* Clamp to aligned min and max */
> +	x = clamp(x, (min + ~mask) & mask, max & mask);
> +
> +	/* Round to nearest aligned value */
> +	if (align)
> +		x = (x + (1 << (align - 1))) & mask;
> +
> +	return x;
> +}
> +
> +/*
> + * We have to adjust the tile width such that the tile physaddrs and
> + * U and V plane offsets are multiples of 8 bytes as required by
> + * the IPU DMA Controller. For the planar formats, this corresponds
> + * to a pixel alignment of 16 (but use a more formal equation since
> + * the variables are available). For all the packed formats, 8 is
> + * good enough.
> + */
> +static inline u32 tile_width_align(const struct ipu_ic_pixfmt *fmt)
> +{
> +	return fmt->y_depth ? (64 * fmt->uv_width_dec) / fmt->y_depth : 8;
> +}
> +
> +/*
> + * For tile height alignment, we have to ensure that the output tile
> + * heights are multiples of 8 lines if the IRT is required by the
> + * given rotation mode (the IRT performs rotations on 8x8 blocks
> + * at a time). If the IRT is not used, or for input image tiles,
> + * 2 lines are good enough.
> + */
> +static inline u32 tile_height_align(enum image_convert_type type,
> +				    enum ipu_rotate_mode rot_mode)
> +{
> +	return (type == IMAGE_CONVERT_OUT &&
> +		ipu_rot_mode_is_irt(rot_mode)) ? 8 : 2;
> +}
> +
> +/* Adjusts input/output images to IPU restrictions */
> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode)
> +{
> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
> +	unsigned int num_in_rows, num_in_cols;
> +	unsigned int num_out_rows, num_out_cols;
> +	u32 w_align, h_align;
> +
> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
> +
> +	/* set some defaults if needed */

Is this our task at all?

> +	if (!infmt) {
> +		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
> +		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
> +	}
> +	if (!outfmt) {
> +		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
> +		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
> +	}
> +
> +	if (!in->pix.width || !in->pix.height) {
> +		in->pix.width = 640;
> +		in->pix.height = 480;
> +	}
> +	if (!out->pix.width || !out->pix.height) {
> +		out->pix.width = 640;
> +		out->pix.height = 480;
> +	}
> +
> +	/* image converter does not handle fields */
> +	in->pix.field = out->pix.field = V4L2_FIELD_NONE;

Why not? The scaler can scale alternate top/bottom fields no problem.

For SEQ_TB/BT and the interleaved interlacing we'd have to adjust
scaling factors per field and use two vertical tiles for the fields
before this can be supported.

> +	/* resizer cannot downsize more than 4:1 */
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		out->pix.height = max_t(__u32, out->pix.height,
> +					in->pix.width / 4);
> +		out->pix.width = max_t(__u32, out->pix.width,
> +				       in->pix.height / 4);
> +	} else {
> +		out->pix.width = max_t(__u32, out->pix.width,
> +				       in->pix.width / 4);
> +		out->pix.height = max_t(__u32, out->pix.height,
> +					in->pix.height / 4);
> +	}
> +
> +	/* get tiling rows/cols from output format */
> +	num_out_rows = ipu_ic_num_stripes(out->pix.height);
> +	num_out_cols = ipu_ic_num_stripes(out->pix.width);
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		num_in_rows = num_out_cols;
> +		num_in_cols = num_out_rows;
> +	} else {
> +		num_in_rows = num_out_rows;
> +		num_in_cols = num_out_cols;
> +	}
> +
> +	/* align input width/height */
> +	w_align = ilog2(tile_width_align(infmt) * num_in_cols);
> +	h_align = ilog2(tile_height_align(IMAGE_CONVERT_IN, rot_mode) *
> +			num_in_rows);
> +	in->pix.width = clamp_align(in->pix.width, MIN_W, MAX_W, w_align);
> +	in->pix.height = clamp_align(in->pix.height, MIN_H, MAX_H, h_align);
> +
> +	/* align output width/height */
> +	w_align = ilog2(tile_width_align(outfmt) * num_out_cols);
> +	h_align = ilog2(tile_height_align(IMAGE_CONVERT_OUT, rot_mode) *
> +			num_out_rows);
> +	out->pix.width = clamp_align(out->pix.width, MIN_W, MAX_W, w_align);
> +	out->pix.height = clamp_align(out->pix.height, MIN_H, MAX_H, h_align);
> +
> +	/* set input/output strides and image sizes */
> +	in->pix.bytesperline = (in->pix.width * infmt->bpp) >> 3;
> +	in->pix.sizeimage = in->pix.height * in->pix.bytesperline;
> +	out->pix.bytesperline = (out->pix.width * outfmt->bpp) >> 3;
> +	out->pix.sizeimage = out->pix.height * out->pix.bytesperline;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_adjust);
> +
> +/*
> + * this is used by ipu_image_convert_prepare() to verify set input and
> + * output images are valid before starting the conversion. Clients can
> + * also call it before calling ipu_image_convert_prepare().
> + */
> +int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode)
> +{
> +	struct ipu_image testin, testout;
> +	int ret;
> +
> +	testin = *in;
> +	testout = *out;
> +
> +	ret = ipu_image_convert_adjust(&testin, &testout, rot_mode);
> +	if (ret)
> +		return ret;
> +
> +	if (testin.pix.width != in->pix.width ||
> +	    testin.pix.height != in->pix.height ||
> +	    testout.pix.width != out->pix.width ||
> +	    testout.pix.height != out->pix.height)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_verify);
> +
> +/*
> + * Call ipu_image_convert_prepare() to prepare for the conversion of
> + * given images and rotation mode. Returns a new conversion context.
> + */
> +struct image_converter_ctx *
> +ipu_image_convert_prepare(struct ipu_ic *ic,
> +			  struct ipu_image *in, struct ipu_image *out,
> +			  enum ipu_rotate_mode rot_mode,
> +			  image_converter_cb_t complete,
> +			  void *complete_context)
> +{
> +	struct ipu_ic_priv *priv = ic->priv;
> +	struct image_converter *cvt = &ic->cvt;
> +	struct ipu_ic_image *s_image, *d_image;
> +	struct image_converter_ctx *ctx;
> +	unsigned long flags;
> +	bool get_res;
> +	int ret;
> +
> +	if (!ic || !in || !out || !complete)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* verify the in/out images before continuing */
> +	ret = ipu_image_convert_verify(in, out, rot_mode);
> +	if (ret) {
> +		dev_err(priv->ipu->dev, "%s: in/out formats invalid\n",
> +			__func__);
> +		return ERR_PTR(ret);
> +	}
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dev_dbg(priv->ipu->dev, "%s: ctx %p\n", __func__, ctx);
> +
> +	ctx->cvt = cvt;
> +	init_completion(&ctx->aborted);
> +
> +	s_image = &ctx->in;
> +	d_image = &ctx->out;
> +
> +	/* set tiling and rotation */
> +	d_image->num_rows = ipu_ic_num_stripes(out->pix.height);
> +	d_image->num_cols = ipu_ic_num_stripes(out->pix.width);
> +	if (ipu_rot_mode_is_irt(rot_mode)) {
> +		s_image->num_rows = d_image->num_cols;
> +		s_image->num_cols = d_image->num_rows;
> +	} else {
> +		s_image->num_rows = d_image->num_rows;
> +		s_image->num_cols = d_image->num_cols;
> +	}
> +
> +	ctx->num_tiles = d_image->num_cols * d_image->num_rows;
> +	ctx->rot_mode = rot_mode;
> +
> +	ret = ipu_ic_fill_image(ctx, s_image, in, IMAGE_CONVERT_IN);
> +	if (ret)
> +		goto out_free;
> +	ret = ipu_ic_fill_image(ctx, d_image, out, IMAGE_CONVERT_OUT);
> +	if (ret)
> +		goto out_free;
> +
> +	ipu_ic_calc_out_tile_map(ctx);
> +
> +	ipu_ic_dump_format(ctx, s_image);
> +	ipu_ic_dump_format(ctx, d_image);
> +
> +	ctx->complete = complete;
> +	ctx->complete_context = complete_context;
> +
> +	/*
> +	 * Can we use double-buffering for this operation? If there is
> +	 * only one tile (the whole image can be converted in a single
> +	 * operation) there's no point in using double-buffering. Also,
> +	 * the IPU's IDMAC channels allow only a single U and V plane
> +	 * offset shared between both buffers, but these offsets change
> +	 * for every tile, and therefore would have to be updated for
> +	 * each buffer which is not possible. So double-buffering is
> +	 * impossible when either the source or destination images are
> +	 * a planar format (YUV420, YUV422P, etc.).
> +	 */
> +	ctx->double_buffering = (ctx->num_tiles > 1 &&
> +				 !s_image->fmt->y_depth &&
> +				 !d_image->fmt->y_depth);
> +
> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> +		ret = ipu_ic_alloc_dma_buf(priv, &ctx->rot_intermediate[0],
> +					   d_image->tile[0].size);
> +		if (ret)
> +			goto out_free;
> +		if (ctx->double_buffering) {
> +			ret = ipu_ic_alloc_dma_buf(priv,
> +						   &ctx->rot_intermediate[1],
> +						   d_image->tile[0].size);
> +			if (ret)
> +				goto out_free_dmabuf0;
> +		}
> +	}
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	get_res = list_empty(&cvt->ctx_list);
> +
> +	list_add_tail(&ctx->list, &cvt->ctx_list);
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (get_res) {
> +		ret = ipu_ic_get_ipu_resources(cvt);
> +		if (ret)
> +			goto out_free_dmabuf1;
> +	}
> +
> +	return ctx;
> +
> +out_free_dmabuf1:
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +	list_del(&ctx->list);
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +out_free_dmabuf0:
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
> +out_free:
> +	kfree(ctx);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_prepare);
> +
> +/*
> + * Carry out a single image conversion. Only the physaddr's of the input
> + * and output image buffers are needed. The conversion context must have
> + * been created previously with ipu_image_convert_prepare(). Returns the
> + * new run object.
> + */
> +struct image_converter_run *
> +ipu_image_convert_run(struct image_converter_ctx *ctx,
> +		      dma_addr_t in_phys, dma_addr_t out_phys)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
> +	if (!run)
> +		return ERR_PTR(-ENOMEM);

What is the reasoning behind making the image_converter_run opaque to
the user? If you let the user provide it to ipu_image_convert_run, it
wouldn't have to be allocated/freed with each frame.

> +	run->ctx = ctx;
> +	run->in_phys = in_phys;
> +	run->out_phys = out_phys;
> +
> +	dev_dbg(priv->ipu->dev, "%s: ctx %p run %p\n", __func__,
> +		ctx, run);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	if (ctx->aborting) {
> +		ret = -EIO;
> +		goto unlock;
> +	}
> +
> +	list_add_tail(&run->list, &cvt->pending_q);
> +
> +	if (!cvt->current_run) {
> +		ret = ipu_ic_run(run);
> +		if (ret)
> +			cvt->current_run = NULL;
> +	}
> +unlock:
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (ret) {
> +		kfree(run);
> +		run = ERR_PTR(ret);
> +	}
> +
> +	return run;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_run);
> +
> +/* Abort any active or pending conversions for this context */
> +void ipu_image_convert_abort(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	struct image_converter_run *run, *active_run, *tmp;
> +	unsigned long flags;
> +	int run_count, ret;
> +	bool need_abort;
> +
> +	reinit_completion(&ctx->aborted);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	/* move all remaining pending runs in this context to done_q */
> +	list_for_each_entry_safe(run, tmp, &cvt->pending_q, list) {
> +		if (run->ctx != ctx)
> +			continue;
> +		run->status = -EIO;
> +		list_move_tail(&run->list, &cvt->done_q);
> +	}
> +
> +	run_count = ipu_ic_get_run_count(ctx, &cvt->done_q);
> +	active_run = (cvt->current_run && cvt->current_run->ctx == ctx) ?
> +		cvt->current_run : NULL;
> +
> +	need_abort = (run_count || active_run);
> +
> +	ctx->aborting = need_abort;
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (!need_abort) {
> +		dev_dbg(priv->ipu->dev, "%s: no abort needed for ctx %p\n",
> +			__func__, ctx);
> +		return;
> +	}
> +
> +	dev_dbg(priv->ipu->dev,
> +		 "%s: wait for completion: %d runs, active run %p\n",
> +		 __func__, run_count, active_run);
> +
> +	ret = wait_for_completion_timeout(&ctx->aborted,
> +					  msecs_to_jiffies(10000));
> +	if (ret == 0) {
> +		dev_warn(priv->ipu->dev, "%s: timeout\n", __func__);
> +		ipu_ic_force_abort(ctx);
> +	}
> +
> +	ctx->aborting = false;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_abort);
> +
> +/* Unprepare image conversion context */
> +void ipu_image_convert_unprepare(struct image_converter_ctx *ctx)
> +{
> +	struct image_converter *cvt = ctx->cvt;
> +	struct ipu_ic_priv *priv = cvt->ic->priv;
> +	unsigned long flags;
> +	bool put_res;
> +
> +	/* make sure no runs are hanging around */
> +	ipu_image_convert_abort(ctx);
> +
> +	dev_dbg(priv->ipu->dev, "%s: removing ctx %p\n", __func__, ctx);
> +
> +	spin_lock_irqsave(&cvt->irqlock, flags);
> +
> +	list_del(&ctx->list);
> +
> +	put_res = list_empty(&cvt->ctx_list);
> +
> +	spin_unlock_irqrestore(&cvt->irqlock, flags);
> +
> +	if (put_res)
> +		ipu_ic_release_ipu_resources(cvt);
> +
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[1]);
> +	ipu_ic_free_dma_buf(priv, &ctx->rot_intermediate[0]);
> +
> +	kfree(ctx);
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_unprepare);
> +
> +/*
> + * "Canned" asynchronous single image conversion. On successful return
> + * caller must call ipu_image_convert_unprepare() after conversion completes.
> + * Returns the new conversion context.
> + */
> +struct image_converter_ctx *
> +ipu_image_convert(struct ipu_ic *ic,
> +		  struct ipu_image *in, struct ipu_image *out,
> +		  enum ipu_rotate_mode rot_mode,
> +		  image_converter_cb_t complete,
> +		  void *complete_context)
> +{
> +	struct image_converter_ctx *ctx;
> +	struct image_converter_run *run;
> +
> +	ctx = ipu_image_convert_prepare(ic, in, out, rot_mode,
> +					complete, complete_context);
> +	if (IS_ERR(ctx))
> +		return ctx;
> +
> +	run = ipu_image_convert_run(ctx, in->phys0, out->phys0);
> +	if (IS_ERR(run)) {
> +		ipu_image_convert_unprepare(ctx);
> +		return ERR_PTR(PTR_ERR(run));
> +	}
> +
> +	return ctx;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert);
> +
> +/* "Canned" synchronous single image conversion */
> +static void image_convert_sync_complete(void *data,
> +					struct image_converter_run *run,
> +					int err)
> +{
> +	struct completion *comp = data;
> +
> +	complete(comp);
> +}
> +
> +int ipu_image_convert_sync(struct ipu_ic *ic,
> +			   struct ipu_image *in, struct ipu_image *out,
> +			   enum ipu_rotate_mode rot_mode)
> +{
> +	struct image_converter_ctx *ctx;
> +	struct completion comp;
> +	int ret;
> +
> +	init_completion(&comp);
> +
> +	ctx = ipu_image_convert(ic, in, out, rot_mode,
> +				image_convert_sync_complete, &comp);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	ret = wait_for_completion_timeout(&comp, msecs_to_jiffies(10000));
> +	ret = (ret == 0) ? -ETIMEDOUT : 0;
> +
> +	ipu_image_convert_unprepare(ctx);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ipu_image_convert_sync);
> +

Most of this calculation of tile geometry and conversion queue handling
code is not really low level IC hardware access. I'd like the code that
doesn't have to access ipu_ic internals directly to be moved into a
separate source file. I'd suggest ipu-ic-queue.c, or
ipu-image-convert.c.

>  int ipu_ic_enable(struct ipu_ic *ic)
>  {
>  	struct ipu_ic_priv *priv = ic->priv;
> @@ -746,6 +2418,7 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
>  	ipu->ic_priv = priv;
>  
>  	spin_lock_init(&priv->lock);
> +
>  	priv->base = devm_ioremap(dev, base, PAGE_SIZE);
>  	if (!priv->base)
>  		return -ENOMEM;
> @@ -758,10 +2431,21 @@ int ipu_ic_init(struct ipu_soc *ipu, struct device *dev,
>  	priv->ipu = ipu;
>  
>  	for (i = 0; i < IC_NUM_TASKS; i++) {
> -		priv->task[i].task = i;
> -		priv->task[i].priv = priv;
> -		priv->task[i].reg = &ic_task_reg[i];
> -		priv->task[i].bit = &ic_task_bit[i];
> +		struct ipu_ic *ic = &priv->task[i];
> +		struct image_converter *cvt = &ic->cvt;
> +
> +		ic->task = i;
> +		ic->priv = priv;
> +		ic->reg = &ic_task_reg[i];
> +		ic->bit = &ic_task_bit[i];
> +		ic->ch = &ic_task_ch[i];
> +
> +		cvt->ic = ic;
> +		spin_lock_init(&cvt->irqlock);
> +		INIT_LIST_HEAD(&cvt->ctx_list);
> +		INIT_LIST_HEAD(&cvt->pending_q);
> +		INIT_LIST_HEAD(&cvt->done_q);
> +		cvt->out_eof_irq = cvt->rot_out_eof_irq = -1;
>  	}
>  
>  	return 0;
> diff --git a/include/video/imx-ipu-v3.h b/include/video/imx-ipu-v3.h
> index 1a3f7d4..992addf 100644
> --- a/include/video/imx-ipu-v3.h
> +++ b/include/video/imx-ipu-v3.h
> @@ -63,17 +63,25 @@ enum ipu_csi_dest {
>  /*
>   * Enumeration of IPU rotation modes
>   */
> +#define IPU_ROT_BIT_VFLIP (1 << 0)
> +#define IPU_ROT_BIT_HFLIP (1 << 1)
> +#define IPU_ROT_BIT_90    (1 << 2)
> +
>  enum ipu_rotate_mode {
>  	IPU_ROTATE_NONE = 0,
> -	IPU_ROTATE_VERT_FLIP,
> -	IPU_ROTATE_HORIZ_FLIP,
> -	IPU_ROTATE_180,
> -	IPU_ROTATE_90_RIGHT,
> -	IPU_ROTATE_90_RIGHT_VFLIP,
> -	IPU_ROTATE_90_RIGHT_HFLIP,
> -	IPU_ROTATE_90_LEFT,
> +	IPU_ROTATE_VERT_FLIP = IPU_ROT_BIT_VFLIP,
> +	IPU_ROTATE_HORIZ_FLIP = IPU_ROT_BIT_HFLIP,
> +	IPU_ROTATE_180 = (IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
> +	IPU_ROTATE_90_RIGHT = IPU_ROT_BIT_90,
> +	IPU_ROTATE_90_RIGHT_VFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_VFLIP),
> +	IPU_ROTATE_90_RIGHT_HFLIP = (IPU_ROT_BIT_90 | IPU_ROT_BIT_HFLIP),
> +	IPU_ROTATE_90_LEFT = (IPU_ROT_BIT_90 |
> +			      IPU_ROT_BIT_VFLIP | IPU_ROT_BIT_HFLIP),
>  };
>  
> +/* 90-degree rotations require the IRT unit */
> +#define ipu_rot_mode_is_irt(m) ((m) >= IPU_ROTATE_90_RIGHT)
> +
>  enum ipu_color_space {
>  	IPUV3_COLORSPACE_RGB,
>  	IPUV3_COLORSPACE_YUV,
> @@ -337,6 +345,7 @@ enum ipu_ic_task {
>  };
>  
>  struct ipu_ic;
> +
>  int ipu_ic_task_init(struct ipu_ic *ic,
>  		     int in_width, int in_height,
>  		     int out_width, int out_height,
> @@ -351,6 +360,40 @@ void ipu_ic_task_disable(struct ipu_ic *ic);
>  int ipu_ic_task_idma_init(struct ipu_ic *ic, struct ipuv3_channel *channel,
>  			  u32 width, u32 height, int burst_size,
>  			  enum ipu_rotate_mode rot);
> +
> +struct image_converter_ctx;
> +struct image_converter_run;
> +

Add an ipu_ prefix to those.

> +typedef void (*image_converter_cb_t)(void *ctx,
> +				     struct image_converter_run *run,
> +				     int err);
> +
> +int ipu_image_convert_enum_format(int index, const char **desc, u32 *fourcc);
> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode);
> +int ipu_image_convert_verify(struct ipu_image *in, struct ipu_image *out,
> +			     enum ipu_rotate_mode rot_mode);
> +struct image_converter_ctx *
> +ipu_image_convert_prepare(struct ipu_ic *ic,
> +			  struct ipu_image *in, struct ipu_image *out,
> +			  enum ipu_rotate_mode rot_mode,
> +			  image_converter_cb_t complete,
> +			  void *complete_context);
> +void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> +struct image_converter_run *
> +ipu_image_convert_run(struct image_converter_ctx *ctx,
> +		      dma_addr_t in_phys, dma_addr_t out_phys);
> +void ipu_image_convert_abort(struct image_converter_ctx *ctx);
> +struct image_converter_ctx *
> +ipu_image_convert(struct ipu_ic *ic,
> +		  struct ipu_image *in, struct ipu_image *out,
> +		  enum ipu_rotate_mode rot_mode,
> +		  image_converter_cb_t complete,
> +		  void *complete_context);
> +int ipu_image_convert_sync(struct ipu_ic *ic,
> +			   struct ipu_image *in, struct ipu_image *out,
> +			   enum ipu_rotate_mode rot_mode);
> +
>  int ipu_ic_enable(struct ipu_ic *ic);
>  int ipu_ic_disable(struct ipu_ic *ic);
>  struct ipu_ic *ipu_ic_get(struct ipu_soc *ipu, enum ipu_ic_task task);

regards
Philipp

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 4/4] gpu: ipu-ic: allow multiple handles to ic
  2016-08-18  0:50   ` Steve Longerbeam
  (?)
@ 2016-09-06  9:26     ` Philipp Zabel
  -1 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: plagnioj, tomi.valkeinen, dri-devel, linux-kernel, linux-fbdev,
	Steve Longerbeam

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> The image converter kernel API supports conversion contexts and
> job queues, so we should allow more than one handle to the IC, so
> that multiple users can add jobs to the queue.

The image converter queue can be shared, but hardware access to the IC
task still has to be exclusive (ipu_ic_task_enable/disable/init/etc.)

> Note however that users that control the IC manually (that do not
> use the image converter APIs but setup the IC task by hand via calls
> to ipu_ic_task_enable(), ipu_ic_enable(), etc.) must still be careful not
> to share the IC handle with other threads. At this point, the only user
> that still controls the IC manually is the i.mx capture driver. In that
> case the capture driver only allows one open context to get a handle
> to the IC at a time, so we should be ok there.

The ipu_ic task handles should be kept exclusive. The image conversion
queue API could get its own handle (ipu_ic_queue? basically what is now
struct image_converter) with its own refcounting get/put functions on
top and each queue should take one exclusive reference on its
corresponding IC task while requested.:_:

If the capture code uses FSU channel linking to feed the IC preprocessor
tasks directly from the CSI, the viewfinder and encoder IC tasks should
not be available for the conversion queues to use.

regards
Philipp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 4/4] gpu: ipu-ic: allow multiple handles to ic
@ 2016-09-06  9:26     ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, dri-devel,
	tomi.valkeinen, plagnioj

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> The image converter kernel API supports conversion contexts and
> job queues, so we should allow more than one handle to the IC, so
> that multiple users can add jobs to the queue.

The image converter queue can be shared, but hardware access to the IC
task still has to be exclusive (ipu_ic_task_enable/disable/init/etc.)

> Note however that users that control the IC manually (that do not
> use the image converter APIs but setup the IC task by hand via calls
> to ipu_ic_task_enable(), ipu_ic_enable(), etc.) must still be careful not
> to share the IC handle with other threads. At this point, the only user
> that still controls the IC manually is the i.mx capture driver. In that
> case the capture driver only allows one open context to get a handle
> to the IC at a time, so we should be ok there.

The ipu_ic task handles should be kept exclusive. The image conversion
queue API could get its own handle (ipu_ic_queue? basically what is now
struct image_converter) with its own refcounting get/put functions on
top and each queue should take one exclusive reference on its
corresponding IC task while requested.:_:

If the capture code uses FSU channel linking to feed the IC preprocessor
tasks directly from the CSI, the viewfinder and encoder IC tasks should
not be available for the conversion queues to use.

regards
Philipp


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 4/4] gpu: ipu-ic: allow multiple handles to ic
@ 2016-09-06  9:26     ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-06  9:26 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: linux-fbdev, Steve Longerbeam, linux-kernel, dri-devel,
	tomi.valkeinen, plagnioj

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> The image converter kernel API supports conversion contexts and
> job queues, so we should allow more than one handle to the IC, so
> that multiple users can add jobs to the queue.

The image converter queue can be shared, but hardware access to the IC
task still has to be exclusive (ipu_ic_task_enable/disable/init/etc.)

> Note however that users that control the IC manually (that do not
> use the image converter APIs but setup the IC task by hand via calls
> to ipu_ic_task_enable(), ipu_ic_enable(), etc.) must still be careful not
> to share the IC handle with other threads. At this point, the only user
> that still controls the IC manually is the i.mx capture driver. In that
> case the capture driver only allows one open context to get a handle
> to the IC at a time, so we should be ok there.

The ipu_ic task handles should be kept exclusive. The image conversion
queue API could get its own handle (ipu_ic_queue? basically what is now
struct image_converter) with its own refcounting get/put functions on
top and each queue should take one exclusive reference on its
corresponding IC task while requested.:_:

If the capture code uses FSU channel linking to feed the IC preprocessor
tasks directly from the CSI, the viewfinder and encoder IC tasks should
not be available for the conversion queues to use.

regards
Philipp

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
  2016-09-06  9:26     ` Philipp Zabel
  (?)
@ 2016-09-15  1:45       ` Steve Longerbeam
  -1 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-09-15  1:45 UTC (permalink / raw)
  To: Philipp Zabel, Steve Longerbeam
  Cc: plagnioj, tomi.valkeinen, dri-devel, linux-kernel, linux-fbdev

Hi Philipp,


On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> Hi Steve,
>
> Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
>> This patch implements complete image conversion support to ipu-ic,
>> with tiling to support scaling to and from images up to 4096x4096.
>> Image rotation is also supported.
>>
>> The internal API is subsystem agnostic (no V4L2 dependency except
>> for the use of V4L2 fourcc pixel formats).
>>
>> Callers prepare for image conversion by calling
>> ipu_image_convert_prepare(), which initializes the parameters of
>> the conversion.
> ... and possibly allocates intermediate buffers for rotation support.
> This should be documented somewhere, with a node that v4l2 users should
> be doing this during REQBUFS.

I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.


>>   The caller passes in the ipu_ic task to use for
>> the conversion, the input and output image formats, a rotation mode,
>> and a completion callback and completion context pointer:
>>
>> struct image_converter_ctx *
>> ipu_image_convert_prepare(struct ipu_ic *ic,
>>                            struct ipu_image *in, struct ipu_image *out,
>>                            enum ipu_rotate_mode rot_mode,
>>                            image_converter_cb_t complete,
>>                            void *complete_context);
> As I commented on the other patch, I think the image_convert functions
> should use a separate handle for the image conversion queues that sit on
> top of the ipu_ic task handles.

Here is a new prototype I came up with:

struct ipu_image_convert_ctx *
ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
               struct ipu_image *in, struct ipu_image *out,
               enum ipu_rotate_mode rot_mode,
               ipu_image_convert_cb_t complete,
               void *complete_context);

In other words, the ipu_ic handle is replaced by the IPU handle and IC task
that are requested for carrying out the conversion.

The image converter will acquire the ipu_ic handle internally, whenever 
there
are queued contexts to that IC task (which I am calling a 'struct 
ipu_image_convert_chan').
This way the IC handle can be shared by all contexts using that IC task. 
After all
contexts have been freed from the (struct 
ipu_image_convert_chan)->ctx_list queue,
the ipu_ic handle is freed.

The ipu_ic handle is acquired in get_ipu_resources() and freed in 
release_ipu_resources(),
along with all the other IPU resources that *could possibly be needed* 
in that
ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
irqs).

I should have done this from the start, instead of allowing multiple 
handles the the IC tasks.
Thanks for pointing this out.

>
>> +
>> +#define MIN_W     128
>> +#define MIN_H     128
> Where does this minimum come from?

Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().

>> +struct ic_task_channels {
>> +	int in;
>> +	int out;
>> +	int rot_in;
>> +	int rot_out;
>> +	int vdi_in_p;
>> +	int vdi_in;
>> +	int vdi_in_n;
> The vdi channels are unused.

Well, I'd prefer to keep the VDI channels. It's quite possible we
can add motion compensated deinterlacing support using the
PRP_VF task to the image converter in the future.

>> +struct image_converter_ctx {
>> +	struct image_converter *cvt;
>> +
>> +	image_converter_cb_t complete;
>> +	void *complete_context;
>> +
>> +	/* Source/destination image data and rotation mode */
>> +	struct ipu_ic_image in;
>> +	struct ipu_ic_image out;
>> +	enum ipu_rotate_mode rot_mode;
>> +
>> +	/* intermediate buffer for rotation */
>> +	struct ipu_ic_dma_buf rot_intermediate[2];
> No need to change it now, but I assume these could be per IC task
> instead of per context.

Actually no. The rotation intermediate buffers have the dimension
of a single tile, so they must remain in the context struct.

>> +static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
>> +	{
>> +		.name	= "RGB565",
> Please drop the names, keeping a list of user readable format names is
> the v4l2 core's business, not ours.

done.

>> +		.fourcc	= V4L2_PIX_FMT_RGB565,
>> +		.bpp    = 16,
> bpp is only ever used in bytes, not bits (always divided by 8).
> Why not make this bytes_per_pixel or pixel_stride = 2.

Actually bpp is used to calculate *total* tile sizes and *total* bytes
per line. For the planar 4:2:0 formats that means it must be specified
in bits.


>> +	}, {
>> +		.name	= "4:2:0 planar, YUV",
>> +		.fourcc	= V4L2_PIX_FMT_YUV420,
>> +		.bpp    = 12,
>> +		.y_depth = 8,
> y_depth is only ever used in bytes, not bits (always divided by 8).
> Why not make this bool planar instead.

sure why not, although I think y_depth makes the calculations more
explanatory, but not important. Done.

>>
>> +static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
>> +				struct ipu_ic_dma_buf *buf,
>> +				int size)
>> +{
>> +	unsigned long newlen = PAGE_ALIGN(size);
>> +
>> +	if (buf->virt) {
>> +		if (buf->len == newlen)
>> +			return 0;
>> +		ipu_ic_free_dma_buf(priv, buf);
>> +	}
> Is it necessary to support reallocation? This is currently only used by
> the prepare function, which creates a new context.

Yep, thanks for catching, removed.

>> +static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
>> +					struct ipu_ic_image *image)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < ctx->num_tiles; i++) {
>> +		struct ipu_ic_tile *tile = &image->tile[i];
>> +
>> +		tile->height = image->base.pix.height / image->num_rows;
>> +		tile->width = image->base.pix.width / image->num_cols;
> We already have talked about this, this simplified tiling will cause
> image artifacts (horizontal and vertical seams at the tile borders) when
> the bilinear upscaler source pixel step is significantly smaller than a
> whole pixel.
> This can be fixed in the future by using overlapping tiles of different
> sizes and possibly by slightly changing the scaling factors of
> individual tiles.

Right, for now I've added a FIXME note near the top.

> This looks nice, but I'd just move the rot_mode conditional below
> assignment of src_row/col and do away with the sin/cos temporary
> variables:
>
> 	/*
> 	 * before doing the transform, first we have to translate
> 	 * source row,col for an origin in the center of s_image
> 	 */
> 	src_row = src_row * 2 - (s_image->num_rows - 1);
> 	src_col = src_col * 2 - (s_image->num_cols - 1);
>
> 	/* do the rotation transform */
> 	if (ctx->rot_mode & IPU_ROT_BIT_90) {
> 		dst_col = -src_row;
> 		dst_row = src_col;
> 	} else {
> 		dst_col = src_col;
> 		dst_row = src_row;
> 	}

Done.

>> +		for (col = 0; col < image->num_cols; col++) {
>> +			y_col_off = (col * w * y_depth) >> 3;
> We know that for planar formats, y_depth can only ever be 8. No need to
> calculate this here.

Done.

> Most of the following code seems to be running under one big spinlock.
> Is this really necessary?

You're right, convert_stop(), convert_start(), and init_idmac_channel() are
only calling the ipu_ic lower level primitives. So they don't require 
the irqlock.
I did remove the "hold irqlock when calling" comment for those. However
they are called embedded in the irq handling, so it would be cumbersome
to drop the lock there only because they don't need it. We can revisit the
lock handling later if you see some room for optimization there.


>> +static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
>> +				struct list_head *q)
>> +{
>> +	struct image_converter_run *run;
>> +	int count = 0;
> Add
> 	lockdep_assert_held(&ctx->irqlock);
> for the functions that expect their caller to be holding the lock.

Done.

>> +	list_for_each_entry(run, q, list) {
>> +		if (run->ctx == ctx)
>> +			count++;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +/* hold irqlock when calling */
>> +static void ipu_ic_convert_stop(struct image_converter_run *run)
>> +{
>> +	struct image_converter_ctx *ctx = run->ctx;
>> +	struct image_converter *cvt = ctx->cvt;
>> +	struct ipu_ic_priv *priv = cvt->ic->priv;
>> +
>> +	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
>> +		__func__, ctx, run);
> Maybe add some indication which IC task this context belongs to?

Done.

>> +
>> +	if (channel == cvt->rotation_in_chan ||
>> +	    channel == cvt->rotation_out_chan) {
>> +		burst_size = 8;
>> +		ipu_cpmem_set_block_mode(channel);
>> +	} else
>> +		burst_size = (width % 16) ? 8 : 16;
> This is for later, but it might turn out to be better to accept a little
> overdraw if stride allows for it and use the larger burst size,
> especially for wide images.

Right, as long as the stride is a multiple of the burst size.


>>
>>
>>
>> +
>> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
>> +{
>> +	struct image_converter *cvt = data;
>> +	struct image_converter_ctx *ctx;
>> +	struct image_converter_run *run;
>> +	unsigned long flags;
>> +	irqreturn_t ret;
>> +
>> +	spin_lock_irqsave(&cvt->irqlock, flags);
>> +
>> +	/* get current run and its context */
>> +	run = cvt->current_run;
>> +	if (!run) {
>> +		ret = IRQ_NONE;
>> +		goto out;
>> +	}
>> +
>> +	ctx = run->ctx;
>> +
>> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
>> +		/* this is a rotation operation, just ignore */
>> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
>> +		return IRQ_HANDLED;
>> +	}
> Why enable the out_chan EOF irq at all when using the IRT mode?

Because (see above), all the IPU resources that might be needed
for any conversion context that is queued to a image conversion
channel (IC task) are acquired when the first context is queued,
including rotation resources. So by acquiring the non-rotation EOF
irq, it will get fielded even for rotation conversions, so we have to
handle it.

>>
>>
>> +/* Adjusts input/output images to IPU restrictions */
>> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
>> +			     enum ipu_rotate_mode rot_mode)
>> +{
>> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
>> +	unsigned int num_in_rows, num_in_cols;
>> +	unsigned int num_out_rows, num_out_cols;
>> +	u32 w_align, h_align;
>> +
>> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
>> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
>> +
>> +	/* set some defaults if needed */
> Is this our task at all?

ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
which should never return EINVAL but should return a supported format
when the passed format is not supported. So I added this here to return
some default pixel formats and width/heights if needed.

>> +	if (!infmt) {
>> +		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
>> +		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
>> +	}
>> +	if (!outfmt) {
>> +		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
>> +		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
>> +	}
>> +
>> +	if (!in->pix.width || !in->pix.height) {
>> +		in->pix.width = 640;
>> +		in->pix.height = 480;
>> +	}
>> +	if (!out->pix.width || !out->pix.height) {
>> +		out->pix.width = 640;
>> +		out->pix.height = 480;
>> +	}
>> +
>> +	/* image converter does not handle fields */
>> +	in->pix.field = out->pix.field = V4L2_FIELD_NONE;
> Why not? The scaler can scale alternate top/bottom fields no problem.
>
> For SEQ_TB/BT and the interleaved interlacing we'd have to adjust
> scaling factors per field and use two vertical tiles for the fields
> before this can be supported.

Right, we could do that. It would then be up to a later pipeline element
to do the deinterlacing, but at least this would scale and/or color convert
the fields.

>> +/*
>> + * Carry out a single image conversion. Only the physaddr's of the input
>> + * and output image buffers are needed. The conversion context must have
>> + * been created previously with ipu_image_convert_prepare(). Returns the
>> + * new run object.
>> + */
>> +struct image_converter_run *
>> +ipu_image_convert_run(struct image_converter_ctx *ctx,
>> +		      dma_addr_t in_phys, dma_addr_t out_phys)
>> +{
>> +	struct image_converter *cvt = ctx->cvt;
>> +	struct ipu_ic_priv *priv = cvt->ic->priv;
>> +	struct image_converter_run *run;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
>> +	if (!run)
>> +		return ERR_PTR(-ENOMEM);
> What is the reasoning behind making the image_converter_run opaque to
> the user? If you let the user provide it to ipu_image_convert_run, it
> wouldn't have to be allocated/freed with each frame.

Good idea, done!

> Most of this calculation of tile geometry and conversion queue handling
> code is not really low level IC hardware access. I'd like the code that
> doesn't have to access ipu_ic internals directly to be moved into a
> separate source file. I'd suggest ipu-ic-queue.c, or
> ipu-image-convert.c.

Done, I created ipu-image-convert.c.

>> +
>> +struct image_converter_ctx;
>> +struct image_converter_run;
>> +
> Add an ipu_ prefix to those.

Done.

I will be pushing a new patch-set shortly with these changes.

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-15  1:45       ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-09-15  1:45 UTC (permalink / raw)
  To: Philipp Zabel, Steve Longerbeam
  Cc: plagnioj, tomi.valkeinen, dri-devel, linux-kernel, linux-fbdev

Hi Philipp,


On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> Hi Steve,
>
> Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
>> This patch implements complete image conversion support to ipu-ic,
>> with tiling to support scaling to and from images up to 4096x4096.
>> Image rotation is also supported.
>>
>> The internal API is subsystem agnostic (no V4L2 dependency except
>> for the use of V4L2 fourcc pixel formats).
>>
>> Callers prepare for image conversion by calling
>> ipu_image_convert_prepare(), which initializes the parameters of
>> the conversion.
> ... and possibly allocates intermediate buffers for rotation support.
> This should be documented somewhere, with a node that v4l2 users should
> be doing this during REQBUFS.

I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.


>>   The caller passes in the ipu_ic task to use for
>> the conversion, the input and output image formats, a rotation mode,
>> and a completion callback and completion context pointer:
>>
>> struct image_converter_ctx *
>> ipu_image_convert_prepare(struct ipu_ic *ic,
>>                            struct ipu_image *in, struct ipu_image *out,
>>                            enum ipu_rotate_mode rot_mode,
>>                            image_converter_cb_t complete,
>>                            void *complete_context);
> As I commented on the other patch, I think the image_convert functions
> should use a separate handle for the image conversion queues that sit on
> top of the ipu_ic task handles.

Here is a new prototype I came up with:

struct ipu_image_convert_ctx *
ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
               struct ipu_image *in, struct ipu_image *out,
               enum ipu_rotate_mode rot_mode,
               ipu_image_convert_cb_t complete,
               void *complete_context);

In other words, the ipu_ic handle is replaced by the IPU handle and IC task
that are requested for carrying out the conversion.

The image converter will acquire the ipu_ic handle internally, whenever 
there
are queued contexts to that IC task (which I am calling a 'struct 
ipu_image_convert_chan').
This way the IC handle can be shared by all contexts using that IC task. 
After all
contexts have been freed from the (struct 
ipu_image_convert_chan)->ctx_list queue,
the ipu_ic handle is freed.

The ipu_ic handle is acquired in get_ipu_resources() and freed in 
release_ipu_resources(),
along with all the other IPU resources that *could possibly be needed* 
in that
ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
irqs).

I should have done this from the start, instead of allowing multiple 
handles the the IC tasks.
Thanks for pointing this out.

>
>> +
>> +#define MIN_W     128
>> +#define MIN_H     128
> Where does this minimum come from?

Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().

>> +struct ic_task_channels {
>> +	int in;
>> +	int out;
>> +	int rot_in;
>> +	int rot_out;
>> +	int vdi_in_p;
>> +	int vdi_in;
>> +	int vdi_in_n;
> The vdi channels are unused.

Well, I'd prefer to keep the VDI channels. It's quite possible we
can add motion compensated deinterlacing support using the
PRP_VF task to the image converter in the future.

>> +struct image_converter_ctx {
>> +	struct image_converter *cvt;
>> +
>> +	image_converter_cb_t complete;
>> +	void *complete_context;
>> +
>> +	/* Source/destination image data and rotation mode */
>> +	struct ipu_ic_image in;
>> +	struct ipu_ic_image out;
>> +	enum ipu_rotate_mode rot_mode;
>> +
>> +	/* intermediate buffer for rotation */
>> +	struct ipu_ic_dma_buf rot_intermediate[2];
> No need to change it now, but I assume these could be per IC task
> instead of per context.

Actually no. The rotation intermediate buffers have the dimension
of a single tile, so they must remain in the context struct.

>> +static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
>> +	{
>> +		.name	= "RGB565",
> Please drop the names, keeping a list of user readable format names is
> the v4l2 core's business, not ours.

done.

>> +		.fourcc	= V4L2_PIX_FMT_RGB565,
>> +		.bpp    = 16,
> bpp is only ever used in bytes, not bits (always divided by 8).
> Why not make this bytes_per_pixel or pixel_stride = 2.

Actually bpp is used to calculate *total* tile sizes and *total* bytes
per line. For the planar 4:2:0 formats that means it must be specified
in bits.


>> +	}, {
>> +		.name	= "4:2:0 planar, YUV",
>> +		.fourcc	= V4L2_PIX_FMT_YUV420,
>> +		.bpp    = 12,
>> +		.y_depth = 8,
> y_depth is only ever used in bytes, not bits (always divided by 8).
> Why not make this bool planar instead.

sure why not, although I think y_depth makes the calculations more
explanatory, but not important. Done.

>>
>> +static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
>> +				struct ipu_ic_dma_buf *buf,
>> +				int size)
>> +{
>> +	unsigned long newlen = PAGE_ALIGN(size);
>> +
>> +	if (buf->virt) {
>> +		if (buf->len = newlen)
>> +			return 0;
>> +		ipu_ic_free_dma_buf(priv, buf);
>> +	}
> Is it necessary to support reallocation? This is currently only used by
> the prepare function, which creates a new context.

Yep, thanks for catching, removed.

>> +static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
>> +					struct ipu_ic_image *image)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < ctx->num_tiles; i++) {
>> +		struct ipu_ic_tile *tile = &image->tile[i];
>> +
>> +		tile->height = image->base.pix.height / image->num_rows;
>> +		tile->width = image->base.pix.width / image->num_cols;
> We already have talked about this, this simplified tiling will cause
> image artifacts (horizontal and vertical seams at the tile borders) when
> the bilinear upscaler source pixel step is significantly smaller than a
> whole pixel.
> This can be fixed in the future by using overlapping tiles of different
> sizes and possibly by slightly changing the scaling factors of
> individual tiles.

Right, for now I've added a FIXME note near the top.

> This looks nice, but I'd just move the rot_mode conditional below
> assignment of src_row/col and do away with the sin/cos temporary
> variables:
>
> 	/*
> 	 * before doing the transform, first we have to translate
> 	 * source row,col for an origin in the center of s_image
> 	 */
> 	src_row = src_row * 2 - (s_image->num_rows - 1);
> 	src_col = src_col * 2 - (s_image->num_cols - 1);
>
> 	/* do the rotation transform */
> 	if (ctx->rot_mode & IPU_ROT_BIT_90) {
> 		dst_col = -src_row;
> 		dst_row = src_col;
> 	} else {
> 		dst_col = src_col;
> 		dst_row = src_row;
> 	}

Done.

>> +		for (col = 0; col < image->num_cols; col++) {
>> +			y_col_off = (col * w * y_depth) >> 3;
> We know that for planar formats, y_depth can only ever be 8. No need to
> calculate this here.

Done.

> Most of the following code seems to be running under one big spinlock.
> Is this really necessary?

You're right, convert_stop(), convert_start(), and init_idmac_channel() are
only calling the ipu_ic lower level primitives. So they don't require 
the irqlock.
I did remove the "hold irqlock when calling" comment for those. However
they are called embedded in the irq handling, so it would be cumbersome
to drop the lock there only because they don't need it. We can revisit the
lock handling later if you see some room for optimization there.


>> +static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
>> +				struct list_head *q)
>> +{
>> +	struct image_converter_run *run;
>> +	int count = 0;
> Add
> 	lockdep_assert_held(&ctx->irqlock);
> for the functions that expect their caller to be holding the lock.

Done.

>> +	list_for_each_entry(run, q, list) {
>> +		if (run->ctx = ctx)
>> +			count++;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +/* hold irqlock when calling */
>> +static void ipu_ic_convert_stop(struct image_converter_run *run)
>> +{
>> +	struct image_converter_ctx *ctx = run->ctx;
>> +	struct image_converter *cvt = ctx->cvt;
>> +	struct ipu_ic_priv *priv = cvt->ic->priv;
>> +
>> +	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
>> +		__func__, ctx, run);
> Maybe add some indication which IC task this context belongs to?

Done.

>> +
>> +	if (channel = cvt->rotation_in_chan ||
>> +	    channel = cvt->rotation_out_chan) {
>> +		burst_size = 8;
>> +		ipu_cpmem_set_block_mode(channel);
>> +	} else
>> +		burst_size = (width % 16) ? 8 : 16;
> This is for later, but it might turn out to be better to accept a little
> overdraw if stride allows for it and use the larger burst size,
> especially for wide images.

Right, as long as the stride is a multiple of the burst size.


>>
>>
>>
>> +
>> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
>> +{
>> +	struct image_converter *cvt = data;
>> +	struct image_converter_ctx *ctx;
>> +	struct image_converter_run *run;
>> +	unsigned long flags;
>> +	irqreturn_t ret;
>> +
>> +	spin_lock_irqsave(&cvt->irqlock, flags);
>> +
>> +	/* get current run and its context */
>> +	run = cvt->current_run;
>> +	if (!run) {
>> +		ret = IRQ_NONE;
>> +		goto out;
>> +	}
>> +
>> +	ctx = run->ctx;
>> +
>> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
>> +		/* this is a rotation operation, just ignore */
>> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
>> +		return IRQ_HANDLED;
>> +	}
> Why enable the out_chan EOF irq at all when using the IRT mode?

Because (see above), all the IPU resources that might be needed
for any conversion context that is queued to a image conversion
channel (IC task) are acquired when the first context is queued,
including rotation resources. So by acquiring the non-rotation EOF
irq, it will get fielded even for rotation conversions, so we have to
handle it.

>>
>>
>> +/* Adjusts input/output images to IPU restrictions */
>> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
>> +			     enum ipu_rotate_mode rot_mode)
>> +{
>> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
>> +	unsigned int num_in_rows, num_in_cols;
>> +	unsigned int num_out_rows, num_out_cols;
>> +	u32 w_align, h_align;
>> +
>> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
>> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
>> +
>> +	/* set some defaults if needed */
> Is this our task at all?

ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
which should never return EINVAL but should return a supported format
when the passed format is not supported. So I added this here to return
some default pixel formats and width/heights if needed.

>> +	if (!infmt) {
>> +		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
>> +		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
>> +	}
>> +	if (!outfmt) {
>> +		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
>> +		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
>> +	}
>> +
>> +	if (!in->pix.width || !in->pix.height) {
>> +		in->pix.width = 640;
>> +		in->pix.height = 480;
>> +	}
>> +	if (!out->pix.width || !out->pix.height) {
>> +		out->pix.width = 640;
>> +		out->pix.height = 480;
>> +	}
>> +
>> +	/* image converter does not handle fields */
>> +	in->pix.field = out->pix.field = V4L2_FIELD_NONE;
> Why not? The scaler can scale alternate top/bottom fields no problem.
>
> For SEQ_TB/BT and the interleaved interlacing we'd have to adjust
> scaling factors per field and use two vertical tiles for the fields
> before this can be supported.

Right, we could do that. It would then be up to a later pipeline element
to do the deinterlacing, but at least this would scale and/or color convert
the fields.

>> +/*
>> + * Carry out a single image conversion. Only the physaddr's of the input
>> + * and output image buffers are needed. The conversion context must have
>> + * been created previously with ipu_image_convert_prepare(). Returns the
>> + * new run object.
>> + */
>> +struct image_converter_run *
>> +ipu_image_convert_run(struct image_converter_ctx *ctx,
>> +		      dma_addr_t in_phys, dma_addr_t out_phys)
>> +{
>> +	struct image_converter *cvt = ctx->cvt;
>> +	struct ipu_ic_priv *priv = cvt->ic->priv;
>> +	struct image_converter_run *run;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
>> +	if (!run)
>> +		return ERR_PTR(-ENOMEM);
> What is the reasoning behind making the image_converter_run opaque to
> the user? If you let the user provide it to ipu_image_convert_run, it
> wouldn't have to be allocated/freed with each frame.

Good idea, done!

> Most of this calculation of tile geometry and conversion queue handling
> code is not really low level IC hardware access. I'd like the code that
> doesn't have to access ipu_ic internals directly to be moved into a
> separate source file. I'd suggest ipu-ic-queue.c, or
> ipu-image-convert.c.

Done, I created ipu-image-convert.c.

>> +
>> +struct image_converter_ctx;
>> +struct image_converter_run;
>> +
> Add an ipu_ prefix to those.

Done.

I will be pushing a new patch-set shortly with these changes.

Steve



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-15  1:45       ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-09-15  1:45 UTC (permalink / raw)
  To: Philipp Zabel, Steve Longerbeam
  Cc: plagnioj, tomi.valkeinen, dri-devel, linux-kernel, linux-fbdev

Hi Philipp,


On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> Hi Steve,
>
> Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
>> This patch implements complete image conversion support to ipu-ic,
>> with tiling to support scaling to and from images up to 4096x4096.
>> Image rotation is also supported.
>>
>> The internal API is subsystem agnostic (no V4L2 dependency except
>> for the use of V4L2 fourcc pixel formats).
>>
>> Callers prepare for image conversion by calling
>> ipu_image_convert_prepare(), which initializes the parameters of
>> the conversion.
> ... and possibly allocates intermediate buffers for rotation support.
> This should be documented somewhere, with a node that v4l2 users should
> be doing this during REQBUFS.

I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.


>>   The caller passes in the ipu_ic task to use for
>> the conversion, the input and output image formats, a rotation mode,
>> and a completion callback and completion context pointer:
>>
>> struct image_converter_ctx *
>> ipu_image_convert_prepare(struct ipu_ic *ic,
>>                            struct ipu_image *in, struct ipu_image *out,
>>                            enum ipu_rotate_mode rot_mode,
>>                            image_converter_cb_t complete,
>>                            void *complete_context);
> As I commented on the other patch, I think the image_convert functions
> should use a separate handle for the image conversion queues that sit on
> top of the ipu_ic task handles.

Here is a new prototype I came up with:

struct ipu_image_convert_ctx *
ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
               struct ipu_image *in, struct ipu_image *out,
               enum ipu_rotate_mode rot_mode,
               ipu_image_convert_cb_t complete,
               void *complete_context);

In other words, the ipu_ic handle is replaced by the IPU handle and IC task
that are requested for carrying out the conversion.

The image converter will acquire the ipu_ic handle internally, whenever 
there
are queued contexts to that IC task (which I am calling a 'struct 
ipu_image_convert_chan').
This way the IC handle can be shared by all contexts using that IC task. 
After all
contexts have been freed from the (struct 
ipu_image_convert_chan)->ctx_list queue,
the ipu_ic handle is freed.

The ipu_ic handle is acquired in get_ipu_resources() and freed in 
release_ipu_resources(),
along with all the other IPU resources that *could possibly be needed* 
in that
ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
irqs).

I should have done this from the start, instead of allowing multiple 
handles the the IC tasks.
Thanks for pointing this out.

>
>> +
>> +#define MIN_W     128
>> +#define MIN_H     128
> Where does this minimum come from?

Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().

>> +struct ic_task_channels {
>> +	int in;
>> +	int out;
>> +	int rot_in;
>> +	int rot_out;
>> +	int vdi_in_p;
>> +	int vdi_in;
>> +	int vdi_in_n;
> The vdi channels are unused.

Well, I'd prefer to keep the VDI channels. It's quite possible we
can add motion compensated deinterlacing support using the
PRP_VF task to the image converter in the future.

>> +struct image_converter_ctx {
>> +	struct image_converter *cvt;
>> +
>> +	image_converter_cb_t complete;
>> +	void *complete_context;
>> +
>> +	/* Source/destination image data and rotation mode */
>> +	struct ipu_ic_image in;
>> +	struct ipu_ic_image out;
>> +	enum ipu_rotate_mode rot_mode;
>> +
>> +	/* intermediate buffer for rotation */
>> +	struct ipu_ic_dma_buf rot_intermediate[2];
> No need to change it now, but I assume these could be per IC task
> instead of per context.

Actually no. The rotation intermediate buffers have the dimension
of a single tile, so they must remain in the context struct.

>> +static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
>> +	{
>> +		.name	= "RGB565",
> Please drop the names, keeping a list of user readable format names is
> the v4l2 core's business, not ours.

done.

>> +		.fourcc	= V4L2_PIX_FMT_RGB565,
>> +		.bpp    = 16,
> bpp is only ever used in bytes, not bits (always divided by 8).
> Why not make this bytes_per_pixel or pixel_stride = 2.

Actually bpp is used to calculate *total* tile sizes and *total* bytes
per line. For the planar 4:2:0 formats that means it must be specified
in bits.


>> +	}, {
>> +		.name	= "4:2:0 planar, YUV",
>> +		.fourcc	= V4L2_PIX_FMT_YUV420,
>> +		.bpp    = 12,
>> +		.y_depth = 8,
> y_depth is only ever used in bytes, not bits (always divided by 8).
> Why not make this bool planar instead.

sure why not, although I think y_depth makes the calculations more
explanatory, but not important. Done.

>>
>> +static int ipu_ic_alloc_dma_buf(struct ipu_ic_priv *priv,
>> +				struct ipu_ic_dma_buf *buf,
>> +				int size)
>> +{
>> +	unsigned long newlen = PAGE_ALIGN(size);
>> +
>> +	if (buf->virt) {
>> +		if (buf->len == newlen)
>> +			return 0;
>> +		ipu_ic_free_dma_buf(priv, buf);
>> +	}
> Is it necessary to support reallocation? This is currently only used by
> the prepare function, which creates a new context.

Yep, thanks for catching, removed.

>> +static void ipu_ic_calc_tile_dimensions(struct image_converter_ctx *ctx,
>> +					struct ipu_ic_image *image)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < ctx->num_tiles; i++) {
>> +		struct ipu_ic_tile *tile = &image->tile[i];
>> +
>> +		tile->height = image->base.pix.height / image->num_rows;
>> +		tile->width = image->base.pix.width / image->num_cols;
> We already have talked about this, this simplified tiling will cause
> image artifacts (horizontal and vertical seams at the tile borders) when
> the bilinear upscaler source pixel step is significantly smaller than a
> whole pixel.
> This can be fixed in the future by using overlapping tiles of different
> sizes and possibly by slightly changing the scaling factors of
> individual tiles.

Right, for now I've added a FIXME note near the top.

> This looks nice, but I'd just move the rot_mode conditional below
> assignment of src_row/col and do away with the sin/cos temporary
> variables:
>
> 	/*
> 	 * before doing the transform, first we have to translate
> 	 * source row,col for an origin in the center of s_image
> 	 */
> 	src_row = src_row * 2 - (s_image->num_rows - 1);
> 	src_col = src_col * 2 - (s_image->num_cols - 1);
>
> 	/* do the rotation transform */
> 	if (ctx->rot_mode & IPU_ROT_BIT_90) {
> 		dst_col = -src_row;
> 		dst_row = src_col;
> 	} else {
> 		dst_col = src_col;
> 		dst_row = src_row;
> 	}

Done.

>> +		for (col = 0; col < image->num_cols; col++) {
>> +			y_col_off = (col * w * y_depth) >> 3;
> We know that for planar formats, y_depth can only ever be 8. No need to
> calculate this here.

Done.

> Most of the following code seems to be running under one big spinlock.
> Is this really necessary?

You're right, convert_stop(), convert_start(), and init_idmac_channel() are
only calling the ipu_ic lower level primitives. So they don't require 
the irqlock.
I did remove the "hold irqlock when calling" comment for those. However
they are called embedded in the irq handling, so it would be cumbersome
to drop the lock there only because they don't need it. We can revisit the
lock handling later if you see some room for optimization there.


>> +static int ipu_ic_get_run_count(struct image_converter_ctx *ctx,
>> +				struct list_head *q)
>> +{
>> +	struct image_converter_run *run;
>> +	int count = 0;
> Add
> 	lockdep_assert_held(&ctx->irqlock);
> for the functions that expect their caller to be holding the lock.

Done.

>> +	list_for_each_entry(run, q, list) {
>> +		if (run->ctx == ctx)
>> +			count++;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +/* hold irqlock when calling */
>> +static void ipu_ic_convert_stop(struct image_converter_run *run)
>> +{
>> +	struct image_converter_ctx *ctx = run->ctx;
>> +	struct image_converter *cvt = ctx->cvt;
>> +	struct ipu_ic_priv *priv = cvt->ic->priv;
>> +
>> +	dev_dbg(priv->ipu->dev, "%s: stopping ctx %p run %p\n",
>> +		__func__, ctx, run);
> Maybe add some indication which IC task this context belongs to?

Done.

>> +
>> +	if (channel == cvt->rotation_in_chan ||
>> +	    channel == cvt->rotation_out_chan) {
>> +		burst_size = 8;
>> +		ipu_cpmem_set_block_mode(channel);
>> +	} else
>> +		burst_size = (width % 16) ? 8 : 16;
> This is for later, but it might turn out to be better to accept a little
> overdraw if stride allows for it and use the larger burst size,
> especially for wide images.

Right, as long as the stride is a multiple of the burst size.


>>
>>
>>
>> +
>> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
>> +{
>> +	struct image_converter *cvt = data;
>> +	struct image_converter_ctx *ctx;
>> +	struct image_converter_run *run;
>> +	unsigned long flags;
>> +	irqreturn_t ret;
>> +
>> +	spin_lock_irqsave(&cvt->irqlock, flags);
>> +
>> +	/* get current run and its context */
>> +	run = cvt->current_run;
>> +	if (!run) {
>> +		ret = IRQ_NONE;
>> +		goto out;
>> +	}
>> +
>> +	ctx = run->ctx;
>> +
>> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
>> +		/* this is a rotation operation, just ignore */
>> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
>> +		return IRQ_HANDLED;
>> +	}
> Why enable the out_chan EOF irq at all when using the IRT mode?

Because (see above), all the IPU resources that might be needed
for any conversion context that is queued to a image conversion
channel (IC task) are acquired when the first context is queued,
including rotation resources. So by acquiring the non-rotation EOF
irq, it will get fielded even for rotation conversions, so we have to
handle it.

>>
>>
>> +/* Adjusts input/output images to IPU restrictions */
>> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
>> +			     enum ipu_rotate_mode rot_mode)
>> +{
>> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
>> +	unsigned int num_in_rows, num_in_cols;
>> +	unsigned int num_out_rows, num_out_cols;
>> +	u32 w_align, h_align;
>> +
>> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
>> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
>> +
>> +	/* set some defaults if needed */
> Is this our task at all?

ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
which should never return EINVAL but should return a supported format
when the passed format is not supported. So I added this here to return
some default pixel formats and width/heights if needed.

>> +	if (!infmt) {
>> +		in->pix.pixelformat = V4L2_PIX_FMT_RGB24;
>> +		infmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
>> +	}
>> +	if (!outfmt) {
>> +		out->pix.pixelformat = V4L2_PIX_FMT_RGB24;
>> +		outfmt = ipu_ic_get_format(V4L2_PIX_FMT_RGB24);
>> +	}
>> +
>> +	if (!in->pix.width || !in->pix.height) {
>> +		in->pix.width = 640;
>> +		in->pix.height = 480;
>> +	}
>> +	if (!out->pix.width || !out->pix.height) {
>> +		out->pix.width = 640;
>> +		out->pix.height = 480;
>> +	}
>> +
>> +	/* image converter does not handle fields */
>> +	in->pix.field = out->pix.field = V4L2_FIELD_NONE;
> Why not? The scaler can scale alternate top/bottom fields no problem.
>
> For SEQ_TB/BT and the interleaved interlacing we'd have to adjust
> scaling factors per field and use two vertical tiles for the fields
> before this can be supported.

Right, we could do that. It would then be up to a later pipeline element
to do the deinterlacing, but at least this would scale and/or color convert
the fields.

>> +/*
>> + * Carry out a single image conversion. Only the physaddr's of the input
>> + * and output image buffers are needed. The conversion context must have
>> + * been created previously with ipu_image_convert_prepare(). Returns the
>> + * new run object.
>> + */
>> +struct image_converter_run *
>> +ipu_image_convert_run(struct image_converter_ctx *ctx,
>> +		      dma_addr_t in_phys, dma_addr_t out_phys)
>> +{
>> +	struct image_converter *cvt = ctx->cvt;
>> +	struct ipu_ic_priv *priv = cvt->ic->priv;
>> +	struct image_converter_run *run;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	run = kzalloc(sizeof(*run), GFP_KERNEL);
>> +	if (!run)
>> +		return ERR_PTR(-ENOMEM);
> What is the reasoning behind making the image_converter_run opaque to
> the user? If you let the user provide it to ipu_image_convert_run, it
> wouldn't have to be allocated/freed with each frame.

Good idea, done!

> Most of this calculation of tile geometry and conversion queue handling
> code is not really low level IC hardware access. I'd like the code that
> doesn't have to access ipu_ic internals directly to be moved into a
> separate source file. I'd suggest ipu-ic-queue.c, or
> ipu-image-convert.c.

Done, I created ipu-image-convert.c.

>> +
>> +struct image_converter_ctx;
>> +struct image_converter_run;
>> +
> Add an ipu_ prefix to those.

Done.

I will be pushing a new patch-set shortly with these changes.

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
  2016-09-15  1:45       ` Steve Longerbeam
@ 2016-09-16 14:16         ` Philipp Zabel
  -1 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-16 14:16 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: Steve Longerbeam, plagnioj, tomi.valkeinen, dri-devel,
	linux-kernel, linux-fbdev

Hi Steve,

thanks for the update.

Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
> Hi Philipp,
> 
> 
> On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> > Hi Steve,
> >
> > Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> >> This patch implements complete image conversion support to ipu-ic,
> >> with tiling to support scaling to and from images up to 4096x4096.
> >> Image rotation is also supported.
> >>
> >> The internal API is subsystem agnostic (no V4L2 dependency except
> >> for the use of V4L2 fourcc pixel formats).
> >>
> >> Callers prepare for image conversion by calling
> >> ipu_image_convert_prepare(), which initializes the parameters of
> >> the conversion.
> > ... and possibly allocates intermediate buffers for rotation support.
> > This should be documented somewhere, with a node that v4l2 users should
> > be doing this during REQBUFS.
> 
> I added comment headers for all the image conversion prototypes.
> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
> include/video/imx-image-convert.h, but let me know if we should put
> this somewhere else and/or under Documentation/ somewhere.

I think that is the right place already. imx-image-convert.h could be
renamed to imx-ipu-image-convert.h, to make clear that this is about the
IPU image converter.

> >>   The caller passes in the ipu_ic task to use for
> >> the conversion, the input and output image formats, a rotation mode,
> >> and a completion callback and completion context pointer:
> >>
> >> struct image_converter_ctx *
> >> ipu_image_convert_prepare(struct ipu_ic *ic,
> >>                            struct ipu_image *in, struct ipu_image *out,
> >>                            enum ipu_rotate_mode rot_mode,
> >>                            image_converter_cb_t complete,
> >>                            void *complete_context);
> > As I commented on the other patch, I think the image_convert functions
> > should use a separate handle for the image conversion queues that sit on
> > top of the ipu_ic task handles.
> 
> Here is a new prototype I came up with:
> 
> struct ipu_image_convert_ctx *
> ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
>                struct ipu_image *in, struct ipu_image *out,
>                enum ipu_rotate_mode rot_mode,
>                ipu_image_convert_cb_t complete,
>                void *complete_context);
> 
> In other words, the ipu_ic handle is replaced by the IPU handle and IC task
> that are requested for carrying out the conversion.

Looks good to me for now.

> The image converter will acquire the ipu_ic handle internally, whenever 
> there
> are queued contexts to that IC task (which I am calling a 'struct 
> ipu_image_convert_chan').
> This way the IC handle can be shared by all contexts using that IC task. 
> After all
> contexts have been freed from the (struct 
> ipu_image_convert_chan)->ctx_list queue,
> the ipu_ic handle is freed.
> 
> The ipu_ic handle is acquired in get_ipu_resources() and freed in 
> release_ipu_resources(),
> along with all the other IPU resources that *could possibly be needed* 
> in that
> ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
> irqs).

Ok.

[...]
> >> +#define MIN_W     128
> >> +#define MIN_H     128
> > Where does this minimum come from?
> 
> Nowhere really :) This is just some sane minimums, to pass
> to clamp_align() when aligning input/output width/height in
> ipu_image_convert_adjust().

Let's use hardware minimum in the low level code. Sane defaults are for
the V4L2 API. Would that be 8x2 pixels per input tile?

> >> +struct ic_task_channels {
> >> +	int in;
> >> +	int out;
> >> +	int rot_in;
> >> +	int rot_out;
> >> +	int vdi_in_p;
> >> +	int vdi_in;
> >> +	int vdi_in_n;
> > The vdi channels are unused.
> 
> Well, I'd prefer to keep the VDI channels. It's quite possible we
> can add motion compensated deinterlacing support using the
> PRP_VF task to the image converter in the future.

Indeed.

> >> +struct image_converter_ctx {
> >> +	struct image_converter *cvt;
> >> +
> >> +	image_converter_cb_t complete;
> >> +	void *complete_context;
> >> +
> >> +	/* Source/destination image data and rotation mode */
> >> +	struct ipu_ic_image in;
> >> +	struct ipu_ic_image out;
> >> +	enum ipu_rotate_mode rot_mode;
> >> +
> >> +	/* intermediate buffer for rotation */
> >> +	struct ipu_ic_dma_buf rot_intermediate[2];
> > No need to change it now, but I assume these could be per IC task
> > instead of per context.
> 
> Actually no. The rotation intermediate buffers have the dimension
> of a single tile, so they must remain in the context struct.

I see. The per task intermediate buffer would have to be the maximum
size, so this would only ever make sense when rotating multiple large
RGB streams simultaneously. I think we can reasonably ignore this use
case.

[...]
> >> +		.fourcc	= V4L2_PIX_FMT_RGB565,
> >> +		.bpp    = 16,
> > bpp is only ever used in bytes, not bits (always divided by 8).
> > Why not make this bytes_per_pixel or pixel_stride = 2.
> 
> Actually bpp is used to calculate *total* tile sizes and *total* bytes
> per line. For the planar 4:2:0 formats that means it must be specified
> in bits.

Ok for total size of chroma subsampled planar formats.

[...]
> > Most of the following code seems to be running under one big spinlock.
> > Is this really necessary?
> 
> You're right, convert_stop(), convert_start(), and init_idmac_channel() are
> only calling the ipu_ic lower level primitives. So they don't require 
> the irqlock.
> I did remove the "hold irqlock when calling" comment for those. However
> they are called embedded in the irq handling, so it would be cumbersome
> to drop the lock there only because they don't need it. We can revisit the
> lock handling later if you see some room for optimization there.

Alright, let's call it future performance optimization potential.

> >> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
> >> +{
[...]
> >> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> >> +		/* this is a rotation operation, just ignore */
> >> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> >> +		return IRQ_HANDLED;
> >> +	}
> > Why enable the out_chan EOF irq at all when using the IRT mode?
> 
> Because (see above), all the IPU resources that might be needed
> for any conversion context that is queued to a image conversion
> channel (IC task) are acquired when the first context is queued,
> including rotation resources. So by acquiring the non-rotation EOF
> irq, it will get fielded even for rotation conversions, so we have to
> handle it.

There is nothing wrong with acquiring the irq. It could still be
disabled while it is not needed.

> >> +/* Adjusts input/output images to IPU restrictions */
> >> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> >> +			     enum ipu_rotate_mode rot_mode)
> >> +{
> >> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
> >> +	unsigned int num_in_rows, num_in_cols;
> >> +	unsigned int num_out_rows, num_out_cols;
> >> +	u32 w_align, h_align;
> >> +
> >> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
> >> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
> >> +
> >> +	/* set some defaults if needed */
> > Is this our task at all?
> 
> ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
> which should never return EINVAL but should return a supported format
> when the passed format is not supported. So I added this here to return
> some default pixel formats and width/heights if needed.

I'd prefer to move this into the mem2mem driver try_format, then.

The remaining issues are minor and can be fixed later.
I'll apply this as is.

regards
Philipp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-16 14:16         ` Philipp Zabel
  0 siblings, 0 replies; 32+ messages in thread
From: Philipp Zabel @ 2016-09-16 14:16 UTC (permalink / raw)
  To: Steve Longerbeam
  Cc: Steve Longerbeam, plagnioj, tomi.valkeinen, dri-devel,
	linux-kernel, linux-fbdev

Hi Steve,

thanks for the update.

Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
> Hi Philipp,
> 
> 
> On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> > Hi Steve,
> >
> > Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> >> This patch implements complete image conversion support to ipu-ic,
> >> with tiling to support scaling to and from images up to 4096x4096.
> >> Image rotation is also supported.
> >>
> >> The internal API is subsystem agnostic (no V4L2 dependency except
> >> for the use of V4L2 fourcc pixel formats).
> >>
> >> Callers prepare for image conversion by calling
> >> ipu_image_convert_prepare(), which initializes the parameters of
> >> the conversion.
> > ... and possibly allocates intermediate buffers for rotation support.
> > This should be documented somewhere, with a node that v4l2 users should
> > be doing this during REQBUFS.
> 
> I added comment headers for all the image conversion prototypes.
> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
> include/video/imx-image-convert.h, but let me know if we should put
> this somewhere else and/or under Documentation/ somewhere.

I think that is the right place already. imx-image-convert.h could be
renamed to imx-ipu-image-convert.h, to make clear that this is about the
IPU image converter.

> >>   The caller passes in the ipu_ic task to use for
> >> the conversion, the input and output image formats, a rotation mode,
> >> and a completion callback and completion context pointer:
> >>
> >> struct image_converter_ctx *
> >> ipu_image_convert_prepare(struct ipu_ic *ic,
> >>                            struct ipu_image *in, struct ipu_image *out,
> >>                            enum ipu_rotate_mode rot_mode,
> >>                            image_converter_cb_t complete,
> >>                            void *complete_context);
> > As I commented on the other patch, I think the image_convert functions
> > should use a separate handle for the image conversion queues that sit on
> > top of the ipu_ic task handles.
> 
> Here is a new prototype I came up with:
> 
> struct ipu_image_convert_ctx *
> ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
>                struct ipu_image *in, struct ipu_image *out,
>                enum ipu_rotate_mode rot_mode,
>                ipu_image_convert_cb_t complete,
>                void *complete_context);
> 
> In other words, the ipu_ic handle is replaced by the IPU handle and IC task
> that are requested for carrying out the conversion.

Looks good to me for now.

> The image converter will acquire the ipu_ic handle internally, whenever 
> there
> are queued contexts to that IC task (which I am calling a 'struct 
> ipu_image_convert_chan').
> This way the IC handle can be shared by all contexts using that IC task. 
> After all
> contexts have been freed from the (struct 
> ipu_image_convert_chan)->ctx_list queue,
> the ipu_ic handle is freed.
> 
> The ipu_ic handle is acquired in get_ipu_resources() and freed in 
> release_ipu_resources(),
> along with all the other IPU resources that *could possibly be needed* 
> in that
> ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
> irqs).

Ok.

[...]
> >> +#define MIN_W     128
> >> +#define MIN_H     128
> > Where does this minimum come from?
> 
> Nowhere really :) This is just some sane minimums, to pass
> to clamp_align() when aligning input/output width/height in
> ipu_image_convert_adjust().

Let's use hardware minimum in the low level code. Sane defaults are for
the V4L2 API. Would that be 8x2 pixels per input tile?

> >> +struct ic_task_channels {
> >> +	int in;
> >> +	int out;
> >> +	int rot_in;
> >> +	int rot_out;
> >> +	int vdi_in_p;
> >> +	int vdi_in;
> >> +	int vdi_in_n;
> > The vdi channels are unused.
> 
> Well, I'd prefer to keep the VDI channels. It's quite possible we
> can add motion compensated deinterlacing support using the
> PRP_VF task to the image converter in the future.

Indeed.

> >> +struct image_converter_ctx {
> >> +	struct image_converter *cvt;
> >> +
> >> +	image_converter_cb_t complete;
> >> +	void *complete_context;
> >> +
> >> +	/* Source/destination image data and rotation mode */
> >> +	struct ipu_ic_image in;
> >> +	struct ipu_ic_image out;
> >> +	enum ipu_rotate_mode rot_mode;
> >> +
> >> +	/* intermediate buffer for rotation */
> >> +	struct ipu_ic_dma_buf rot_intermediate[2];
> > No need to change it now, but I assume these could be per IC task
> > instead of per context.
> 
> Actually no. The rotation intermediate buffers have the dimension
> of a single tile, so they must remain in the context struct.

I see. The per task intermediate buffer would have to be the maximum
size, so this would only ever make sense when rotating multiple large
RGB streams simultaneously. I think we can reasonably ignore this use
case.

[...]
> >> +		.fourcc	= V4L2_PIX_FMT_RGB565,
> >> +		.bpp    = 16,
> > bpp is only ever used in bytes, not bits (always divided by 8).
> > Why not make this bytes_per_pixel or pixel_stride = 2.
> 
> Actually bpp is used to calculate *total* tile sizes and *total* bytes
> per line. For the planar 4:2:0 formats that means it must be specified
> in bits.

Ok for total size of chroma subsampled planar formats.

[...]
> > Most of the following code seems to be running under one big spinlock.
> > Is this really necessary?
> 
> You're right, convert_stop(), convert_start(), and init_idmac_channel() are
> only calling the ipu_ic lower level primitives. So they don't require 
> the irqlock.
> I did remove the "hold irqlock when calling" comment for those. However
> they are called embedded in the irq handling, so it would be cumbersome
> to drop the lock there only because they don't need it. We can revisit the
> lock handling later if you see some room for optimization there.

Alright, let's call it future performance optimization potential.

> >> +static irqreturn_t ipu_ic_norotate_irq(int irq, void *data)
> >> +{
[...]
> >> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
> >> +		/* this is a rotation operation, just ignore */
> >> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
> >> +		return IRQ_HANDLED;
> >> +	}
> > Why enable the out_chan EOF irq at all when using the IRT mode?
> 
> Because (see above), all the IPU resources that might be needed
> for any conversion context that is queued to a image conversion
> channel (IC task) are acquired when the first context is queued,
> including rotation resources. So by acquiring the non-rotation EOF
> irq, it will get fielded even for rotation conversions, so we have to
> handle it.

There is nothing wrong with acquiring the irq. It could still be
disabled while it is not needed.

> >> +/* Adjusts input/output images to IPU restrictions */
> >> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
> >> +			     enum ipu_rotate_mode rot_mode)
> >> +{
> >> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
> >> +	unsigned int num_in_rows, num_in_cols;
> >> +	unsigned int num_out_rows, num_out_cols;
> >> +	u32 w_align, h_align;
> >> +
> >> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
> >> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
> >> +
> >> +	/* set some defaults if needed */
> > Is this our task at all?
> 
> ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
> which should never return EINVAL but should return a supported format
> when the passed format is not supported. So I added this here to return
> some default pixel formats and width/heights if needed.

I'd prefer to move this into the mem2mem driver try_format, then.

The remaining issues are minor and can be fixed later.
I'll apply this as is.

regards
Philipp


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
  2016-09-16 14:16         ` Philipp Zabel
  (?)
@ 2016-09-17 18:46           ` Steve Longerbeam
  -1 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-09-17 18:46 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Steve Longerbeam, plagnioj, tomi.valkeinen, dri-devel,
	linux-kernel, linux-fbdev



On 09/16/2016 07:16 AM, Philipp Zabel wrote:
> Hi Steve,
>
> thanks for the update.
>
> Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
>> I added comment headers for all the image conversion prototypes.
>> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
>> include/video/imx-image-convert.h, but let me know if we should put
>> this somewhere else and/or under Documentation/ somewhere.
> I think that is the right place already. imx-image-convert.h could be
> renamed to imx-ipu-image-convert.h, to make clear that this is about the
> IPU image converter.

Ok, I'll send another update with the name change in the next
version (v7).
>
>>>> +#define MIN_W     128
>>>> +#define MIN_H     128
>>> Where does this minimum come from?
>> Nowhere really :) This is just some sane minimums, to pass
>> to clamp_align() when aligning input/output width/height in
>> ipu_image_convert_adjust().
> Let's use hardware minimum in the low level code. Sane defaults are for
> the V4L2 API. Would that be 8x2 pixels per input tile?

I searched the imx6 reference manual, I can't find any mention
of width/height minimums for the IC. So I suppose 8x2 would be fine,
or maybe 16x8, to account for planar and IRT conversions.

>
>>>> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
>>>> +		/* this is a rotation operation, just ignore */
>>>> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
>>>> +		return IRQ_HANDLED;
>>>> +	}
>>> Why enable the out_chan EOF irq at all when using the IRT mode?
>> Because (see above), all the IPU resources that might be needed
>> for any conversion context that is queued to a image conversion
>> channel (IC task) are acquired when the first context is queued,
>> including rotation resources. So by acquiring the non-rotation EOF
>> irq, it will get fielded even for rotation conversions, so we have to
>> handle it.
> There is nothing wrong with acquiring the irq. It could still be
> disabled while it is not needed.

It would be difficult to disable the irq. Remember the irq handlers must
field all EOF interrupts in an ipu_image_convert_chan (IC task). If one
context in that channel disables the irq, it will break other runnings
contexts in that channel that are using it.

>
>>>> +/* Adjusts input/output images to IPU restrictions */
>>>> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
>>>> +			     enum ipu_rotate_mode rot_mode)
>>>> +{
>>>> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
>>>> +	unsigned int num_in_rows, num_in_cols;
>>>> +	unsigned int num_out_rows, num_out_cols;
>>>> +	u32 w_align, h_align;
>>>> +
>>>> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
>>>> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
>>>> +
>>>> +	/* set some defaults if needed */
>>> Is this our task at all?
>> ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
>> which should never return EINVAL but should return a supported format
>> when the passed format is not supported. So I added this here to return
>> some default pixel formats and width/heights if needed.
> I'd prefer to move this into the mem2mem driver try_format, then.

We could move the 0 width/height checks to v4l2, but the pixel
format defaults should probably remain in ipu-image-convert, since
it knows what formats it supports converting to/from.

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-17 18:46           ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-09-17 18:46 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Steve Longerbeam, plagnioj, tomi.valkeinen, dri-devel,
	linux-kernel, linux-fbdev



On 09/16/2016 07:16 AM, Philipp Zabel wrote:
> Hi Steve,
>
> thanks for the update.
>
> Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
>> I added comment headers for all the image conversion prototypes.
>> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
>> include/video/imx-image-convert.h, but let me know if we should put
>> this somewhere else and/or under Documentation/ somewhere.
> I think that is the right place already. imx-image-convert.h could be
> renamed to imx-ipu-image-convert.h, to make clear that this is about the
> IPU image converter.

Ok, I'll send another update with the name change in the next
version (v7).
>
>>>> +#define MIN_W     128
>>>> +#define MIN_H     128
>>> Where does this minimum come from?
>> Nowhere really :) This is just some sane minimums, to pass
>> to clamp_align() when aligning input/output width/height in
>> ipu_image_convert_adjust().
> Let's use hardware minimum in the low level code. Sane defaults are for
> the V4L2 API. Would that be 8x2 pixels per input tile?

I searched the imx6 reference manual, I can't find any mention
of width/height minimums for the IC. So I suppose 8x2 would be fine,
or maybe 16x8, to account for planar and IRT conversions.

>
>>>> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
>>>> +		/* this is a rotation operation, just ignore */
>>>> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
>>>> +		return IRQ_HANDLED;
>>>> +	}
>>> Why enable the out_chan EOF irq at all when using the IRT mode?
>> Because (see above), all the IPU resources that might be needed
>> for any conversion context that is queued to a image conversion
>> channel (IC task) are acquired when the first context is queued,
>> including rotation resources. So by acquiring the non-rotation EOF
>> irq, it will get fielded even for rotation conversions, so we have to
>> handle it.
> There is nothing wrong with acquiring the irq. It could still be
> disabled while it is not needed.

It would be difficult to disable the irq. Remember the irq handlers must
field all EOF interrupts in an ipu_image_convert_chan (IC task). If one
context in that channel disables the irq, it will break other runnings
contexts in that channel that are using it.

>
>>>> +/* Adjusts input/output images to IPU restrictions */
>>>> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
>>>> +			     enum ipu_rotate_mode rot_mode)
>>>> +{
>>>> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
>>>> +	unsigned int num_in_rows, num_in_cols;
>>>> +	unsigned int num_out_rows, num_out_cols;
>>>> +	u32 w_align, h_align;
>>>> +
>>>> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
>>>> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
>>>> +
>>>> +	/* set some defaults if needed */
>>> Is this our task at all?
>> ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
>> which should never return EINVAL but should return a supported format
>> when the passed format is not supported. So I added this here to return
>> some default pixel formats and width/heights if needed.
> I'd prefer to move this into the mem2mem driver try_format, then.

We could move the 0 width/height checks to v4l2, but the pixel
format defaults should probably remain in ipu-image-convert, since
it knows what formats it supports converting to/from.

Steve


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling
@ 2016-09-17 18:46           ` Steve Longerbeam
  0 siblings, 0 replies; 32+ messages in thread
From: Steve Longerbeam @ 2016-09-17 18:46 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Steve Longerbeam, plagnioj, tomi.valkeinen, dri-devel,
	linux-kernel, linux-fbdev



On 09/16/2016 07:16 AM, Philipp Zabel wrote:
> Hi Steve,
>
> thanks for the update.
>
> Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
>> I added comment headers for all the image conversion prototypes.
>> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
>> include/video/imx-image-convert.h, but let me know if we should put
>> this somewhere else and/or under Documentation/ somewhere.
> I think that is the right place already. imx-image-convert.h could be
> renamed to imx-ipu-image-convert.h, to make clear that this is about the
> IPU image converter.

Ok, I'll send another update with the name change in the next
version (v7).
>
>>>> +#define MIN_W     128
>>>> +#define MIN_H     128
>>> Where does this minimum come from?
>> Nowhere really :) This is just some sane minimums, to pass
>> to clamp_align() when aligning input/output width/height in
>> ipu_image_convert_adjust().
> Let's use hardware minimum in the low level code. Sane defaults are for
> the V4L2 API. Would that be 8x2 pixels per input tile?

I searched the imx6 reference manual, I can't find any mention
of width/height minimums for the IC. So I suppose 8x2 would be fine,
or maybe 16x8, to account for planar and IRT conversions.

>
>>>> +	if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
>>>> +		/* this is a rotation operation, just ignore */
>>>> +		spin_unlock_irqrestore(&cvt->irqlock, flags);
>>>> +		return IRQ_HANDLED;
>>>> +	}
>>> Why enable the out_chan EOF irq at all when using the IRT mode?
>> Because (see above), all the IPU resources that might be needed
>> for any conversion context that is queued to a image conversion
>> channel (IC task) are acquired when the first context is queued,
>> including rotation resources. So by acquiring the non-rotation EOF
>> irq, it will get fielded even for rotation conversions, so we have to
>> handle it.
> There is nothing wrong with acquiring the irq. It could still be
> disabled while it is not needed.

It would be difficult to disable the irq. Remember the irq handlers must
field all EOF interrupts in an ipu_image_convert_chan (IC task). If one
context in that channel disables the irq, it will break other runnings
contexts in that channel that are using it.

>
>>>> +/* Adjusts input/output images to IPU restrictions */
>>>> +int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
>>>> +			     enum ipu_rotate_mode rot_mode)
>>>> +{
>>>> +	const struct ipu_ic_pixfmt *infmt, *outfmt;
>>>> +	unsigned int num_in_rows, num_in_cols;
>>>> +	unsigned int num_out_rows, num_out_cols;
>>>> +	u32 w_align, h_align;
>>>> +
>>>> +	infmt = ipu_ic_get_format(in->pix.pixelformat);
>>>> +	outfmt = ipu_ic_get_format(out->pix.pixelformat);
>>>> +
>>>> +	/* set some defaults if needed */
>>> Is this our task at all?
>> ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
>> which should never return EINVAL but should return a supported format
>> when the passed format is not supported. So I added this here to return
>> some default pixel formats and width/heights if needed.
> I'd prefer to move this into the mem2mem driver try_format, then.

We could move the 0 width/height checks to v4l2, but the pixel
format defaults should probably remain in ipu-image-convert, since
it knows what formats it supports converting to/from.

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-09-17 18:46 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-18  0:50 [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4 Steve Longerbeam
2016-08-18  0:50 ` Steve Longerbeam
2016-08-18  0:50 ` [PATCH v4 1/4] gpu: ipu-v3: Add Video Deinterlacer unit Steve Longerbeam
2016-08-18  0:50   ` Steve Longerbeam
2016-08-18  0:50 ` [PATCH v4 2/4] gpu: ipu-v3: Add FSU channel linking support Steve Longerbeam
2016-08-18  0:50   ` Steve Longerbeam
2016-08-18  0:50 ` [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling Steve Longerbeam
2016-08-18  0:50   ` Steve Longerbeam
2016-09-06  9:26   ` Philipp Zabel
2016-09-06  9:26     ` Philipp Zabel
2016-09-06  9:26     ` Philipp Zabel
2016-09-15  1:45     ` Steve Longerbeam
2016-09-15  1:45       ` Steve Longerbeam
2016-09-15  1:45       ` Steve Longerbeam
2016-09-16 14:16       ` Philipp Zabel
2016-09-16 14:16         ` Philipp Zabel
2016-09-17 18:46         ` Steve Longerbeam
2016-09-17 18:46           ` Steve Longerbeam
2016-09-17 18:46           ` Steve Longerbeam
2016-08-18  0:50 ` [PATCH v4 4/4] gpu: ipu-ic: allow multiple handles to ic Steve Longerbeam
2016-08-18  0:50   ` Steve Longerbeam
2016-09-06  9:26   ` Philipp Zabel
2016-09-06  9:26     ` Philipp Zabel
2016-09-06  9:26     ` Philipp Zabel
2016-08-25 14:17 ` [PATCH v4 0/4] IPUv3 prep for i.MX5/6 v4l2 staging drivers, v4 Tim Harvey
2016-08-25 14:17   ` Tim Harvey
2016-09-05 14:41   ` Fabio Estevam
2016-09-05 14:41     ` Fabio Estevam
2016-09-05 14:41     ` Fabio Estevam
2016-09-06  9:26     ` Philipp Zabel
2016-09-06  9:26       ` Philipp Zabel
2016-09-06  9:26       ` Philipp Zabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.