* [PATCH v2 0/3]  [media] add IPU3 CIO2 CSI2 driver
@ 2017-06-07  1:34 Yong Zhi
  2017-06-07  1:34 ` [PATCH v2 1/3] [media] videodev2.h, v4l2-ioctl: add IPU3 raw10 color format Yong Zhi
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Yong Zhi @ 2017-06-07  1:34 UTC (permalink / raw)
  To: linux-media, sakari.ailus
  Cc: jian.xu.zheng, tfiga, rajmohan.mani, tuukka.toivonen, hverkuil,
	hyungwoo.yang, Yong Zhi

This patch adds the driver for the CIO2 device found in some of the
Skylake and Kaby Lake SoCs. The CIO2 consists of four D-PHY receivers.

The CIO2 driver exposes V4L2, V4L2 sub-device and Media controller
interfaces to user space.

===========
= history =
===========
version 2:
- remove all explicit DMA flush operations
- change dma_free_noncoherent() to dma_free_coherent()
- remove cio2_hw_mipi_lanes()
- replace v4l2_g_ext_ctrls() with v4l2_ctrl_g_ctrl()
  in cio2_csi2_calc_timing().
- use ffs() to iterate the port_status in cio2_irq()
- add static inline file_to_cio2_queue() function
- comment dma_wmb(), cio2_rx_timing() and a few other places
- use ktime_get_ns() for vb2_buf.timestamp in cio2_buffer_done()
- use of SET_RUNTIME_PM_OPS() macro for cio2_pm_ops
- use BIT() macro for bit definitions
- remove unused macros such as CIO2_QUEUE_WIDTH() in ipu3-cio2.h
- move the MODULE_AUTHOR() to the end of the file
- change file path to drivers/media/pci/intel/ipu3

version 1:
- Initial submission

Yong Zhi (3):
  [media] videodev2.h, v4l2-ioctl: add IPU3 raw10 color format
  [media] doc-rst: add IPU3 raw10 bayer pixel format definitions
  [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver

 Documentation/media/uapi/v4l/pixfmt-rgb.rst        |    1 +
 .../media/uapi/v4l/pixfmt-srggb10-ipu3.rst         |   62 +
 drivers/media/pci/Kconfig                          |    2 +
 drivers/media/pci/Makefile                         |    3 +-
 drivers/media/pci/intel/Makefile                   |    5 +
 drivers/media/pci/intel/ipu3/Kconfig               |   17 +
 drivers/media/pci/intel/ipu3/Makefile              |    1 +
 drivers/media/pci/intel/ipu3/ipu3-cio2.c           | 1788 ++++++++++++++++++++
 drivers/media/pci/intel/ipu3/ipu3-cio2.h           |  424 +++++
 drivers/media/v4l2-core/v4l2-ioctl.c               |    4 +
 include/uapi/linux/videodev2.h                     |    5 +
 11 files changed, 2311 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/media/uapi/v4l/pixfmt-srggb10-ipu3.rst
 create mode 100644 drivers/media/pci/intel/Makefile
 create mode 100644 drivers/media/pci/intel/ipu3/Kconfig
 create mode 100644 drivers/media/pci/intel/ipu3/Makefile
 create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.c
 create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.h

-- 
2.7.4


* [PATCH v2 1/3] [media] videodev2.h, v4l2-ioctl: add IPU3 raw10 color format
  2017-06-07  1:34 [PATCH v2 0/3] [media] add IPU3 CIO2 CSI2 driver Yong Zhi
@ 2017-06-07  1:34 ` Yong Zhi
  2017-06-07  1:34 ` [PATCH v2 2/3] [media] doc-rst: add IPU3 raw10 bayer pixel format definitions Yong Zhi
  2017-06-07  1:34 ` [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver Yong Zhi
  2 siblings, 0 replies; 17+ messages in thread
From: Yong Zhi @ 2017-06-07  1:34 UTC (permalink / raw)
  To: linux-media, sakari.ailus
  Cc: jian.xu.zheng, tfiga, rajmohan.mani, tuukka.toivonen, hverkuil,
	hyungwoo.yang, Yong Zhi

Add IPU3 specific formats:

	V4L2_PIX_FMT_IPU3_SBGGR10
	V4L2_PIX_FMT_IPU3_SGBRG10
	V4L2_PIX_FMT_IPU3_SGRBG10
	V4L2_PIX_FMT_IPU3_SRGGB10

Signed-off-by: Yong Zhi <yong.zhi@intel.com>
---
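Illustration only, not part of the patch: the four new codes use the
usual v4l2_fourcc() packing, with the first character in the least
significant byte. The sketch below re-implements that packing in plain
user-space C. The helper name ipu3_fourcc() is hypothetical and exists
only for this note.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone copy of the v4l2_fourcc() packing:
 * character 'a' lands in bits 7..0, 'd' in bits 31..24.
 */
static inline uint32_t ipu3_fourcc(char a, char b, char c, char d)
{
	return (uint32_t)(uint8_t)a |
	       ((uint32_t)(uint8_t)b << 8) |
	       ((uint32_t)(uint8_t)c << 16) |
	       ((uint32_t)(uint8_t)d << 24);
}
```

Note that 'ip3g' (GBRG) and 'ip3G' (GRBG) differ only in the case of
the last character, so the two codes differ only in the top byte.
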
 drivers/media/v4l2-core/v4l2-ioctl.c | 4 ++++
 include/uapi/linux/videodev2.h       | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index e5a2187..fb1387f 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1202,6 +1202,10 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
 	case V4L2_PIX_FMT_SGBRG10P:	descr = "10-bit Bayer GBGB/RGRG Packed"; break;
 	case V4L2_PIX_FMT_SGRBG10P:	descr = "10-bit Bayer GRGR/BGBG Packed"; break;
 	case V4L2_PIX_FMT_SRGGB10P:	descr = "10-bit Bayer RGRG/GBGB Packed"; break;
+	case V4L2_PIX_FMT_IPU3_SBGGR10: descr = "10-bit Bayer BGGR IPU3 Packed"; break;
+	case V4L2_PIX_FMT_IPU3_SGBRG10: descr = "10-bit Bayer GBRG IPU3 Packed"; break;
+	case V4L2_PIX_FMT_IPU3_SGRBG10: descr = "10-bit Bayer GRBG IPU3 Packed"; break;
+	case V4L2_PIX_FMT_IPU3_SRGGB10: descr = "10-bit Bayer RGGB IPU3 Packed"; break;
 	case V4L2_PIX_FMT_SBGGR10ALAW8:	descr = "8-bit Bayer BGBG/GRGR (A-law)"; break;
 	case V4L2_PIX_FMT_SGBRG10ALAW8:	descr = "8-bit Bayer GBGB/RGRG (A-law)"; break;
 	case V4L2_PIX_FMT_SGRBG10ALAW8:	descr = "8-bit Bayer GRGR/BGBG (A-law)"; break;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 2b8feb8..7bfa6ad 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -663,6 +663,11 @@ struct v4l2_pix_format {
 #define V4L2_PIX_FMT_MT21C    v4l2_fourcc('M', 'T', '2', '1') /* Mediatek compressed block mode  */
 #define V4L2_PIX_FMT_INZI     v4l2_fourcc('I', 'N', 'Z', 'I') /* Intel Planar Greyscale 10-bit and Depth 16-bit */
 
+#define V4L2_PIX_FMT_IPU3_SBGGR10	v4l2_fourcc('i', 'p', '3', 'b') /* IPU3 packed 10-bit BGGR bayer */
+#define V4L2_PIX_FMT_IPU3_SGBRG10	v4l2_fourcc('i', 'p', '3', 'g') /* IPU3 packed 10-bit GBRG bayer */
+#define V4L2_PIX_FMT_IPU3_SGRBG10	v4l2_fourcc('i', 'p', '3', 'G') /* IPU3 packed 10-bit GRBG bayer */
+#define V4L2_PIX_FMT_IPU3_SRGGB10	v4l2_fourcc('i', 'p', '3', 'r') /* IPU3 packed 10-bit RGGB bayer */
+
 /* SDR formats - used only for Software Defined Radio devices */
 #define V4L2_SDR_FMT_CU8          v4l2_fourcc('C', 'U', '0', '8') /* IQ u8 */
 #define V4L2_SDR_FMT_CU16LE       v4l2_fourcc('C', 'U', '1', '6') /* IQ u16le */
-- 
2.7.4


* [PATCH v2 2/3] [media] doc-rst: add IPU3 raw10 bayer pixel format definitions
  2017-06-07  1:34 [PATCH v2 0/3] [media] add IPU3 CIO2 CSI2 driver Yong Zhi
  2017-06-07  1:34 ` [PATCH v2 1/3] [media] videodev2.h, v4l2-ioctl: add IPU3 raw10 color format Yong Zhi
@ 2017-06-07  1:34 ` Yong Zhi
  2017-06-07 17:55   ` Alan Cox
  2017-06-07  1:34 ` [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver Yong Zhi
  2 siblings, 1 reply; 17+ messages in thread
From: Yong Zhi @ 2017-06-07  1:34 UTC (permalink / raw)
  To: linux-media, sakari.ailus
  Cc: jian.xu.zheng, tfiga, rajmohan.mani, tuukka.toivonen, hverkuil,
	hyungwoo.yang, Yong Zhi

The formats added by this patch are:

    V4L2_PIX_FMT_IPU3_SBGGR10
    V4L2_PIX_FMT_IPU3_SGBRG10
    V4L2_PIX_FMT_IPU3_SGRBG10
    V4L2_PIX_FMT_IPU3_SRGGB10

Signed-off-by: Yong Zhi <yong.zhi@intel.com>
---
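Illustration only, not part of the patch: a minimal user-space sketch
of how one 32-byte block could be unpacked into 25 samples. It assumes
the layout documented in this patch, where sample n occupies bits
10n..10n+9 of the little-endian block. The function and macro names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define IPU3_PIXELS_PER_BLOCK	25	/* samples per packed block */
#define IPU3_BYTES_PER_BLOCK	32	/* bytes per packed block */

/* Unpack one 32-byte block into 25 right-aligned 10-bit samples. */
static void ipu3_unpack_block(const uint8_t src[IPU3_BYTES_PER_BLOCK],
			      uint16_t dst[IPU3_PIXELS_PER_BLOCK])
{
	size_t n;

	for (n = 0; n < IPU3_PIXELS_PER_BLOCK; n++) {
		size_t bit = n * 10;		/* first bit of sample n */
		size_t byte = bit / 8;		/* byte holding that bit */
		unsigned int shift = bit % 8;	/* 0, 2, 4 or 6 */
		/* Two adjacent bytes always cover the whole sample. */
		uint32_t word = src[byte] | ((uint32_t)src[byte + 1] << 8);

		dst[n] = (word >> shift) & 0x3ff;
	}
}
```

The last sample starts at bit 240, so it uses byte 30 and the two low
bits of byte 31. The remaining six bits of byte 31 are the padding
mentioned in the description.
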
 Documentation/media/uapi/v4l/pixfmt-rgb.rst        |  1 +
 .../media/uapi/v4l/pixfmt-srggb10-ipu3.rst         | 62 ++++++++++++++++++++++
 2 files changed, 63 insertions(+)
 create mode 100644 Documentation/media/uapi/v4l/pixfmt-srggb10-ipu3.rst

diff --git a/Documentation/media/uapi/v4l/pixfmt-rgb.rst b/Documentation/media/uapi/v4l/pixfmt-rgb.rst
index b0f3513..6900d5c 100644
--- a/Documentation/media/uapi/v4l/pixfmt-rgb.rst
+++ b/Documentation/media/uapi/v4l/pixfmt-rgb.rst
@@ -16,5 +16,6 @@ RGB Formats
     pixfmt-srggb10p
     pixfmt-srggb10alaw8
     pixfmt-srggb10dpcm8
+    pixfmt-srggb10-ipu3
     pixfmt-srggb12
     pixfmt-srggb16
diff --git a/Documentation/media/uapi/v4l/pixfmt-srggb10-ipu3.rst b/Documentation/media/uapi/v4l/pixfmt-srggb10-ipu3.rst
new file mode 100644
index 0000000..618e24a
--- /dev/null
+++ b/Documentation/media/uapi/v4l/pixfmt-srggb10-ipu3.rst
@@ -0,0 +1,62 @@
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _V4L2_PIX_FMT_IPU3_SBGGR10:
+.. _V4L2_PIX_FMT_IPU3_SGBRG10:
+.. _V4L2_PIX_FMT_IPU3_SGRBG10:
+.. _V4L2_PIX_FMT_IPU3_SRGGB10:
+
+**********************************************************************************************************************************************
+V4L2_PIX_FMT_IPU3_SBGGR10 ('ip3b'), V4L2_PIX_FMT_IPU3_SGBRG10 ('ip3g'), V4L2_PIX_FMT_IPU3_SGRBG10 ('ip3G'), V4L2_PIX_FMT_IPU3_SRGGB10 ('ip3r')
+**********************************************************************************************************************************************
+
+10-bit Bayer formats
+
+Description
+===========
+
+These four pixel formats are used by the Intel IPU3 driver. They are raw
+sRGB / Bayer formats with 10 bits per sample. Every 25 pixels are packed
+into 32 bytes, leaving the 6 most significant bits of the last byte as
+padding. The format is little endian.
+
+In other respects this format is similar to :ref:`V4L2-PIX-FMT-SRGGB10`.
+
+**Byte Order.**
+Each cell is one byte.
+
+.. raw:: latex
+
+    \newline\newline\begin{adjustbox}{width=\columnwidth}
+
+.. tabularcolumns:: |p{1.3cm}|p{1.0cm}|p{10.9cm}|p{10.9cm}|p{10.9cm}|p{1.0cm}|
+
+.. flat-table::
+
+    * - start + 0:
+      - B\ :sub:`00low`
+      - G\ :sub:`01low` \ (bits 7--2) B\ :sub:`00high`\ (bits 1--0)
+      - B\ :sub:`02low` \ (bits 7--4) G\ :sub:`01high`\ (bits 3--0)
+      - G\ :sub:`03low` \ (bits 7--6) B\ :sub:`02high`\ (bits 5--0)
+      - G\ :sub:`03high`
+    * - start + 5:
+      - G\ :sub:`10low`
+      - R\ :sub:`11low` \ (bits 7--2) G\ :sub:`10high`\ (bits 1--0)
+      - G\ :sub:`12low` \ (bits 7--4) R\ :sub:`11high`\ (bits 3--0)
+      - R\ :sub:`13low` \ (bits 7--6) G\ :sub:`12high`\ (bits 5--0)
+      - R\ :sub:`13high`
+    * - start + 10:
+      - B\ :sub:`20low`
+      - G\ :sub:`21low` \ (bits 7--2) B\ :sub:`20high`\ (bits 1--0)
+      - B\ :sub:`22low` \ (bits 7--4) G\ :sub:`21high`\ (bits 3--0)
+      - G\ :sub:`23low` \ (bits 7--6) B\ :sub:`22high`\ (bits 5--0)
+      - G\ :sub:`23high`
+    * - start + 15:
+      - G\ :sub:`30low`
+      - R\ :sub:`31low` \ (bits 7--2) G\ :sub:`30high`\ (bits 1--0)
+      - G\ :sub:`32low` \ (bits 7--4) R\ :sub:`31high`\ (bits 3--0)
+      - R\ :sub:`33low` \ (bits 7--6) G\ :sub:`32high`\ (bits 5--0)
+      - R\ :sub:`33high`
+
+.. raw:: latex
+
+    \end{adjustbox}\newline\newline
-- 
2.7.4


* [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-07  1:34 [PATCH v2 0/3] [media] add IPU3 CIO2 CSI2 driver Yong Zhi
  2017-06-07  1:34 ` [PATCH v2 1/3] [media] videodev2.h, v4l2-ioctl: add IPU3 raw10 color format Yong Zhi
  2017-06-07  1:34 ` [PATCH v2 2/3] [media] doc-rst: add IPU3 raw10 bayer pixel format definitions Yong Zhi
@ 2017-06-07  1:34 ` Yong Zhi
  2017-06-12  9:59   ` Tomasz Figa
  2 siblings, 1 reply; 17+ messages in thread
From: Yong Zhi @ 2017-06-07  1:34 UTC (permalink / raw)
  To: linux-media, sakari.ailus
  Cc: jian.xu.zheng, tfiga, rajmohan.mani, tuukka.toivonen, hverkuil,
	hyungwoo.yang, Yong Zhi

This patch adds the CIO2 CSI-2 device driver for
Intel's IPU3 camera subsystem.

Signed-off-by: Yong Zhi <yong.zhi@intel.com>
---
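Illustration only, not part of the patch: the line stride follows from
the packing of 50 pixels into 64 bytes, rounded up to a whole 64-byte
unit. The sketch below repeats the arithmetic of cio2_bytesperline() in
plain C, with the kernel's DIV_ROUND_UP() written out. The name
ipu3_bytesperline() is hypothetical.

```c
#include <stdint.h>

/* Stride in bytes for a line of 'width' pixels: 64 bytes per 50 pixels. */
static inline uint32_t ipu3_bytesperline(unsigned int width)
{
	return (width + 50 - 1) / 50 * 64;
}
```

For example, a 1920-pixel line needs 39 units of 50 pixels, which gives
a stride of 2496 bytes. A 51-pixel line already needs two units, or 128
bytes.
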
 drivers/media/pci/Kconfig                |    2 +
 drivers/media/pci/Makefile               |    3 +-
 drivers/media/pci/intel/Makefile         |    5 +
 drivers/media/pci/intel/ipu3/Kconfig     |   17 +
 drivers/media/pci/intel/ipu3/Makefile    |    1 +
 drivers/media/pci/intel/ipu3/ipu3-cio2.c | 1788 ++++++++++++++++++++++++++++++
 drivers/media/pci/intel/ipu3/ipu3-cio2.h |  424 +++++++
 7 files changed, 2239 insertions(+), 1 deletion(-)
 create mode 100644 drivers/media/pci/intel/Makefile
 create mode 100644 drivers/media/pci/intel/ipu3/Kconfig
 create mode 100644 drivers/media/pci/intel/ipu3/Makefile
 create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.c
 create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.h
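
Illustration only, not part of the patch: a user-space sketch of the
fixed-point arithmetic in cio2_rx_timing(), using the A and B minimum
values from the table in ipu3-cio2.c. It assumes a counter accuracy of
1/16 ns. Unlike the driver, it keeps the intermediate value in 64 bits
and does not truncate the frequency to 32 bits. The name
rx_timing_sketch() is hypothetical.

```c
#include <stdint.h>

/*
 * register value = 16 * (a + b * UI), with a in ns and
 * UI = 1 / (2 * freq) expressed in ns. Both 500000000 and freq are
 * shifted right by 8 to keep the intermediate product small.
 * Returns -1 if the shifted frequency is not positive.
 */
static int rx_timing_sketch(int32_t a, int32_t b, int64_t freq)
{
	const int32_t accinv = 16;	/* 1 / 0.0625 */
	const unsigned int ds = 8;	/* divide shift */
	int64_t r;

	freq >>= ds;
	if (freq <= 0)
		return -1;

	r = (int64_t)accinv * b * (500000000 >> ds);
	r /= freq;
	r += accinv * a;

	return (int)r;
}
```

With a 400 MHz link frequency, UI is 1.25 ns. The clock lane settle
count (A = 95, B = -8) is then 16 * (95 - 10) = 1360, and the data lane
settle count (A = 85, B = -2) is 16 * (85 - 2.5) = 1320.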

diff --git a/drivers/media/pci/Kconfig b/drivers/media/pci/Kconfig
index da28e68..5932e22 100644
--- a/drivers/media/pci/Kconfig
+++ b/drivers/media/pci/Kconfig
@@ -54,5 +54,7 @@ source "drivers/media/pci/smipcie/Kconfig"
 source "drivers/media/pci/netup_unidvb/Kconfig"
 endif
 
+source "drivers/media/pci/intel/ipu3/Kconfig"
+
 endif #MEDIA_PCI_SUPPORT
 endif #PCI
diff --git a/drivers/media/pci/Makefile b/drivers/media/pci/Makefile
index a7e8af0..d8f9843 100644
--- a/drivers/media/pci/Makefile
+++ b/drivers/media/pci/Makefile
@@ -13,7 +13,8 @@ obj-y        +=	ttpci/		\
 		ddbridge/	\
 		saa7146/	\
 		smipcie/	\
-		netup_unidvb/
+		netup_unidvb/	\
+		intel/
 
 obj-$(CONFIG_VIDEO_IVTV) += ivtv/
 obj-$(CONFIG_VIDEO_ZORAN) += zoran/
diff --git a/drivers/media/pci/intel/Makefile b/drivers/media/pci/intel/Makefile
new file mode 100644
index 0000000..745c8b2
--- /dev/null
+++ b/drivers/media/pci/intel/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the IPU3 cio2 and ImGU drivers
+#
+
+obj-y	+= ipu3/
diff --git a/drivers/media/pci/intel/ipu3/Kconfig b/drivers/media/pci/intel/ipu3/Kconfig
new file mode 100644
index 0000000..2a895d6
--- /dev/null
+++ b/drivers/media/pci/intel/ipu3/Kconfig
@@ -0,0 +1,17 @@
+config VIDEO_IPU3_CIO2
+	tristate "Intel ipu3-cio2 driver"
+	depends on VIDEO_V4L2 && PCI
+	depends on MEDIA_CONTROLLER
+	depends on HAS_DMA
+	depends on ACPI
+	select V4L2_FWNODE
+	select VIDEOBUF2_DMA_SG
+
+	---help---
+	This is the Intel IPU3 CIO2 CSI-2 receiver unit, found in Intel
+	Skylake and Kaby Lake SoCs and used for capturing images and
+	video from a camera sensor.
+
+	Say Y or M here if you have a Skylake/Kaby Lake SoC with MIPI CSI-2
+	connected camera.
+	The module will be called ipu3-cio2.
diff --git a/drivers/media/pci/intel/ipu3/Makefile b/drivers/media/pci/intel/ipu3/Makefile
new file mode 100644
index 0000000..20186e3
--- /dev/null
+++ b/drivers/media/pci/intel/ipu3/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_VIDEO_IPU3_CIO2) += ipu3-cio2.o
diff --git a/drivers/media/pci/intel/ipu3/ipu3-cio2.c b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
new file mode 100644
index 0000000..69c47fc
--- /dev/null
+++ b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
@@ -0,0 +1,1788 @@
+/*
+ * Copyright (c) 2017 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Based partially on Intel IPU4 driver written by
+ *  Sakari Ailus <sakari.ailus@linux.intel.com>
+ *  Samu Onkalo <samu.onkalo@intel.com>
+ *  Jouni Högander <jouni.hogander@intel.com>
+ *  Jouni Ukkonen <jouni.ukkonen@intel.com>
+ *  Antti Laakso <antti.laakso@intel.com>
+ * et al.
+ *
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/property.h>
+#include <linux/vmalloc.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-fwnode.h>
+#include <media/v4l2-ioctl.h>
+#include <media/videobuf2-dma-sg.h>
+
+#include "ipu3-cio2.h"
+
+/*
+ * These are raw formats used in Intel's third generation of
+ * Image Processing Unit known as IPU3.
+ * 10bit raw bayer packed, 32 bytes for every 25 pixels,
+ * the 6 most significant bits of the last byte are unused.
+ */
+static const u32 cio2_csi2_fmts[] = {
+	V4L2_PIX_FMT_IPU3_SRGGB10,
+	V4L2_PIX_FMT_IPU3_SGBRG10,
+	V4L2_PIX_FMT_IPU3_SGRBG10,
+	V4L2_PIX_FMT_IPU3_SBGGR10,
+};
+
+static inline u32 cio2_bytesperline(const unsigned int width)
+{
+	/*
+	 * 64 bytes for every 50 pixels, the line length
+	 * in bytes is multiple of 64 (line end alignment).
+	 */
+	return DIV_ROUND_UP(width, 50) * 64;
+}
+
+/**************** FBPT operations ****************/
+
+static void cio2_fbpt_exit_dummy(struct cio2_device *cio2)
+{
+	if (cio2->dummy_lop) {
+		dma_free_coherent(&cio2->pci_dev->dev, PAGE_SIZE,
+				cio2->dummy_lop, cio2->dummy_lop_bus_addr);
+		cio2->dummy_lop = NULL;
+	}
+	if (cio2->dummy_page) {
+		dma_free_coherent(&cio2->pci_dev->dev, PAGE_SIZE,
+				cio2->dummy_page, cio2->dummy_page_bus_addr);
+		cio2->dummy_page = NULL;
+	}
+}
+
+static int cio2_fbpt_init_dummy(struct cio2_device *cio2)
+{
+	unsigned int i;
+
+	cio2->dummy_page = dma_alloc_coherent(&cio2->pci_dev->dev, PAGE_SIZE,
+					&cio2->dummy_page_bus_addr, GFP_KERNEL);
+	cio2->dummy_lop = dma_alloc_coherent(&cio2->pci_dev->dev, PAGE_SIZE,
+					&cio2->dummy_lop_bus_addr, GFP_KERNEL);
+	if (!cio2->dummy_page || !cio2->dummy_lop) {
+		cio2_fbpt_exit_dummy(cio2);
+		return -ENOMEM;
+	}
+	/*
+	 * List of Pointers(LOP) contains 1024x32b pointers to 4KB page each
+	 * Initialize each entry to dummy_page bus base address.
+	 */
+	for (i = 0; i < PAGE_SIZE / sizeof(*cio2->dummy_lop); i++)
+		cio2->dummy_lop[i] = cio2->dummy_page_bus_addr >> PAGE_SHIFT;
+
+	return 0;
+}
+
+static void cio2_fbpt_entry_enable(struct cio2_device *cio2,
+				   struct cio2_fbpt_entry entry[CIO2_MAX_LOPS])
+{
+	/*
+	 * The CPU first initializes some fields in fbpt, then sets
+	 * the VALID bit, this barrier is to ensure that the DMA(device)
+	 * does not see the VALID bit enabled before other fields are
+	 * initialized; otherwise it could lead to havoc.
+	 */
+	dma_wmb();
+
+	/*
+	 * Request interrupts for start and completion
+	 * Valid bit is applicable only to 1st entry
+	 */
+	entry[0].first_entry.ctrl = CIO2_FBPT_CTRL_VALID |
+		CIO2_FBPT_CTRL_IOC | CIO2_FBPT_CTRL_IOS;
+}
+
+/* Initialize fbpt entries to point to dummy frame */
+static void cio2_fbpt_entry_init_dummy(struct cio2_device *cio2,
+				       struct cio2_fbpt_entry
+				       entry[CIO2_MAX_LOPS])
+{
+	unsigned int i;
+
+	entry[0].first_entry.first_page_offset = 0;
+	entry[1].second_entry.num_of_pages =
+		PAGE_SIZE / sizeof(u32) * CIO2_MAX_LOPS;
+	entry[1].second_entry.last_page_available_bytes = PAGE_SIZE - 1;
+
+	for (i = 0; i < CIO2_MAX_LOPS; i++)
+		entry[i].lop_page_addr = cio2->dummy_lop_bus_addr >> PAGE_SHIFT;
+
+	cio2_fbpt_entry_enable(cio2, entry);
+}
+
+/* Initialize fbpt entries to point to a given buffer */
+static void cio2_fbpt_entry_init_buf(struct cio2_device *cio2,
+				     struct cio2_buffer *b,
+				     struct cio2_fbpt_entry
+				     entry[CIO2_MAX_LOPS])
+{
+	struct vb2_buffer *vb = &b->vbb.vb2_buf;
+	unsigned int length = vb->planes[0].length;
+	dma_addr_t lop_bus_addr = b->lop_bus_addr;
+	int remaining;
+
+	entry[0].first_entry.first_page_offset =
+		offset_in_page(vb2_plane_vaddr(vb, 0));
+	remaining = length + entry[0].first_entry.first_page_offset;
+	entry[1].second_entry.num_of_pages = DIV_ROUND_UP(remaining, PAGE_SIZE);
+	/*
+	 * last_page_available_bytes has the offset of the last byte in the
+	 * last page which is still accessible by DMA. DMA cannot access
+	 * beyond this point. Valid range for this is from 0 to 4095.
+	 * 0 indicates 1st byte in the page is DMA accessible.
+	 * 4095 (PAGE_SIZE - 1) means every single byte in the last page
+	 * is available for DMA transfer.
+	 */
+	entry[1].second_entry.last_page_available_bytes =
+			(remaining & ~PAGE_MASK) ?
+				(remaining & ~PAGE_MASK) - 1 : PAGE_SIZE - 1;
+	/* Fill FBPT */
+	remaining = length;
+	while (remaining > 0) {
+		entry->lop_page_addr = lop_bus_addr >> PAGE_SHIFT;
+		lop_bus_addr += PAGE_SIZE;
+		remaining -= PAGE_SIZE / sizeof(u32) * PAGE_SIZE;
+		entry++;
+	}
+
+	/*
+	 * The first not meaningful FBPT entry should point to a valid LOP
+	 */
+	entry->lop_page_addr = cio2->dummy_lop_bus_addr >> PAGE_SHIFT;
+
+	cio2_fbpt_entry_enable(cio2, entry);
+}
+
+static int cio2_fbpt_init(struct cio2_device *cio2, struct cio2_queue *q)
+{
+	struct device *dev = &cio2->pci_dev->dev;
+
+	q->fbpt = dma_alloc_coherent(dev, CIO2_FBPT_SIZE,
+			&q->fbpt_bus_addr, GFP_KERNEL);
+	if (!q->fbpt)
+		return -ENOMEM;
+
+	memset(q->fbpt, 0, CIO2_FBPT_SIZE);
+
+	return 0;
+}
+
+static void cio2_fbpt_exit(struct cio2_queue *q, struct device *dev)
+{
+	dma_free_coherent(dev, CIO2_FBPT_SIZE, q->fbpt, q->fbpt_bus_addr);
+}
+
+/**************** CSI2 hardware setup ****************/
+
+/*
+ * This should come from sensor driver. No
+ * driver interface nor requirement yet.
+ */
+static u8 sensor_vc;	/* Virtual channel */
+
+/*
+ * The CSI2 receiver has several parameters affecting
+ * the receiver timings. These depend on the MIPI bus frequency
+ * F in Hz (sensor transmitter rate) as follows:
+ *     register value = (A/1e9 + B * UI) / COUNT_ACC
+ * where
+ *      UI = 1 / (2 * F) in seconds
+ *      COUNT_ACC = counter accuracy in seconds
+ *      For IPU3 COUNT_ACC = 0.0625 ns
+ *
+ * A and B are coefficients from the table below,
+ * depending whether the register minimum or maximum value is
+ * calculated.
+ *                                     Minimum     Maximum
+ * Clock lane                          A     B     A     B
+ * reg_rx_csi_dly_cnt_termen_clane     0     0    38     0
+ * reg_rx_csi_dly_cnt_settle_clane    95    -8   300   -16
+ * Data lanes
+ * reg_rx_csi_dly_cnt_termen_dlane0    0     0    35     4
+ * reg_rx_csi_dly_cnt_settle_dlane0   85    -2   145    -6
+ * reg_rx_csi_dly_cnt_termen_dlane1    0     0    35     4
+ * reg_rx_csi_dly_cnt_settle_dlane1   85    -2   145    -6
+ * reg_rx_csi_dly_cnt_termen_dlane2    0     0    35     4
+ * reg_rx_csi_dly_cnt_settle_dlane2   85    -2   145    -6
+ * reg_rx_csi_dly_cnt_termen_dlane3    0     0    35     4
+ * reg_rx_csi_dly_cnt_settle_dlane3   85    -2   145    -6
+ *
+ * We use the minimum values of both A and B.
+ */
+static int cio2_rx_timing(s32 a, s32 b, s64 freq)
+{
+	int r;
+	const u32 accinv = 16;
+	const u32 ds = 8; /* divide shift */
+
+	freq = (s32)freq >> ds;
+	if (WARN_ON(freq <= 0))
+		return -EINVAL;
+
+	/* b could be 0, -2 or -8, so r < 500000000 */
+	r = accinv * b * (500000000 >> ds);
+	r /= freq;
+	/* max value of a is 95 */
+	r += accinv * a;
+
+	return r;
+}
+
+/* Computation for the Delay value for Termination enable of Clock lane HS Rx */
+static int cio2_csi2_calc_timing(struct cio2_device *cio2, struct cio2_queue *q,
+			    struct cio2_csi2_timing *timing)
+{
+	struct device *dev = &cio2->pci_dev->dev;
+	struct v4l2_querymenu qm = {.id = V4L2_CID_LINK_FREQ, };
+	struct v4l2_ctrl *link_freq = NULL;
+	s64 freq;
+	int r;
+
+	if (q->sensor)
+		link_freq = v4l2_ctrl_find(q->sensor->ctrl_handler,
+						V4L2_CID_LINK_FREQ);
+	if (!link_freq) {
+		dev_err(dev, "failed to find LINK_FREQ\n");
+		return -EPIPE;
+	}
+
+	qm.index = v4l2_ctrl_g_ctrl(link_freq);
+	r = v4l2_querymenu(q->sensor->ctrl_handler, &qm);
+	if (r) {
+		dev_err(dev, "failed to get menu item\n");
+		return r;
+	}
+
+	if (!qm.value)
+		return -EINVAL;
+	freq = qm.value;
+
+	dev_info(dev, "link freq is %lld\n", qm.value);
+
+	timing->clk_termen = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_TERMEN_CLANE_A,
+				CIO2_CSIRX_DLY_CNT_TERMEN_CLANE_B, freq);
+	/* test freq/div_shift > 0 */
+	if (timing->clk_termen < 0)
+		return -EINVAL;
+
+	timing->clk_settle = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_SETTLE_CLANE_A,
+				CIO2_CSIRX_DLY_CNT_SETTLE_CLANE_B, freq);
+	timing->dat_termen = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_TERMEN_DLANE_A,
+				CIO2_CSIRX_DLY_CNT_TERMEN_DLANE_B, freq);
+	timing->dat_settle = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_SETTLE_DLANE_A,
+				CIO2_CSIRX_DLY_CNT_SETTLE_DLANE_B, freq);
+
+	dev_dbg(dev, "freq ct value is %d\n", timing->clk_termen);
+	dev_dbg(dev, "freq cs value is %d\n", timing->clk_settle);
+	dev_dbg(dev, "freq dt value is %d\n", timing->dat_termen);
+	dev_dbg(dev, "freq ds value is %d\n", timing->dat_settle);
+
+	return 0;
+}
+
+static int cio2_hw_mbus_to_mipicode(__u32 code)
+{
+	static const struct {
+		u32 mbuscode;
+		u8 mipicode;
+	} mbus2mipi[] = {
+		{ MEDIA_BUS_FMT_SBGGR10_1X10, 0x2b },
+		{ MEDIA_BUS_FMT_SGBRG10_1X10, 0x2b },
+		{ MEDIA_BUS_FMT_SGRBG10_1X10, 0x2b },
+		{ MEDIA_BUS_FMT_SRGGB10_1X10, 0x2b },
+	};
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(mbus2mipi); i++)
+		if (mbus2mipi[i].mbuscode == code)
+			return mbus2mipi[i].mipicode;
+
+	return -EINVAL;
+}
+
+static int cio2_hw_init(struct cio2_device *cio2, struct cio2_queue *q)
+{
+	static const int NUM_VCS = 4;
+	static const int SID;	/* Stream id */
+	static const int ENTRY;
+	static const int FBPT_WIDTH = DIV_ROUND_UP(CIO2_MAX_LOPS,
+					CIO2_FBPT_SUBENTRY_UNIT);
+	const u32 num_buffers1 = CIO2_MAX_BUFFERS - 1;
+	void __iomem *const base = cio2->base;
+	u8 mipicode, lanes, csi2bus = q->csi2.port;
+	struct cio2_csi2_timing timing;
+	int i, r;
+
+	/* TODO: add support for virtual channels */
+	sensor_vc = 0;
+	mipicode = r = cio2_hw_mbus_to_mipicode(
+			q->subdev_fmt.code);
+	if (r < 0)
+		return r;
+
+	lanes = r = q->csi2.num_of_lanes;
+	if (r < 0)
+		return r;
+
+	writel(CIO2_PBM_WMCTRL1_MIN_2CK |
+	       CIO2_PBM_WMCTRL1_MID1_2CK |
+	       CIO2_PBM_WMCTRL1_MID2_2CK, base + CIO2_REG_PBM_WMCTRL1);
+	writel(CIO2_PBM_WMCTRL2_HWM_2CK << CIO2_PBM_WMCTRL2_HWM_2CK_SHIFT |
+	       CIO2_PBM_WMCTRL2_LWM_2CK << CIO2_PBM_WMCTRL2_LWM_2CK_SHIFT |
+	       CIO2_PBM_WMCTRL2_OBFFWM_2CK <<
+	       CIO2_PBM_WMCTRL2_OBFFWM_2CK_SHIFT |
+	       CIO2_PBM_WMCTRL2_TRANSDYN << CIO2_PBM_WMCTRL2_TRANSDYN_SHIFT |
+	       CIO2_PBM_WMCTRL2_OBFF_MEM_EN, base + CIO2_REG_PBM_WMCTRL2);
+	writel(CIO2_PBM_ARB_CTRL_LANES_DIV << CIO2_PBM_ARB_CTRL_LANES_DIV |
+	       CIO2_PBM_ARB_CTRL_LE_EN |
+	       CIO2_PBM_ARB_CTRL_PLL_POST_SHTDN <<
+	       CIO2_PBM_ARB_CTRL_PLL_POST_SHTDN_SHIFT |
+	       CIO2_PBM_ARB_CTRL_PLL_AHD_WK_UP <<
+	       CIO2_PBM_ARB_CTRL_PLL_AHD_WK_UP_SHIFT,
+	       base + CIO2_REG_PBM_ARB_CTRL);
+	writel(CIO2_CSIRX_STATUS_DLANE_HS_MASK,
+	       q->csi_rx_base + CIO2_REG_CSIRX_STATUS_DLANE_HS);
+	writel(CIO2_CSIRX_STATUS_DLANE_LP_MASK,
+	       q->csi_rx_base + CIO2_REG_CSIRX_STATUS_DLANE_LP);
+
+	writel(CIO2_FB_HPLL_FREQ, base + CIO2_REG_FB_HPLL_FREQ);
+	writel(CIO2_ISCLK_RATIO, base + CIO2_REG_ISCLK_RATIO);
+
+	/* Configure MIPI backend */
+	for (i = 0; i < NUM_VCS; i++)
+		writel(1, q->csi_rx_base + CIO2_REG_MIPIBE_SP_LUT_ENTRY(i));
+
+	/* There are 16 short packet LUT entries */
+	for (i = 0; i < 16; i++)
+		writel(CIO2_MIPIBE_LP_LUT_ENTRY_DISREGARD,
+		       q->csi_rx_base + CIO2_REG_MIPIBE_LP_LUT_ENTRY(i));
+	writel(CIO2_MIPIBE_GLOBAL_LUT_DISREGARD,
+	       q->csi_rx_base + CIO2_REG_MIPIBE_GLOBAL_LUT_DISREGARD);
+
+	writel(CIO2_INT_EN_EXT_IE_MASK, base + CIO2_REG_INT_EN_EXT_IE);
+	writel(CIO2_IRQCTRL_MASK, q->csi_rx_base + CIO2_REG_IRQCTRL_MASK);
+	writel(CIO2_IRQCTRL_MASK, q->csi_rx_base + CIO2_REG_IRQCTRL_ENABLE);
+	writel(0, q->csi_rx_base + CIO2_REG_IRQCTRL_EDGE);
+	writel(0, q->csi_rx_base + CIO2_REG_IRQCTRL_LEVEL_NOT_PULSE);
+	writel(CIO2_INT_EN_EXT_OE_MASK, base + CIO2_REG_INT_EN_EXT_OE);
+
+	writel(CIO2_INT_IOC(CIO2_DMA_CHAN), base + CIO2_REG_INT_EN);
+
+	writel((CIO2_PXM_PXF_FMT_CFG_BPP_10 | CIO2_PXM_PXF_FMT_CFG_PCK_64B)
+	       << CIO2_PXM_PXF_FMT_CFG_SID0_SHIFT,
+	       base + CIO2_REG_PXM_PXF_FMT_CFG0(csi2bus));
+	writel(SID << CIO2_MIPIBE_LP_LUT_ENTRY_SID_SHIFT |
+	       sensor_vc << CIO2_MIPIBE_LP_LUT_ENTRY_VC_SHIFT |
+	       mipicode << CIO2_MIPIBE_LP_LUT_ENTRY_FORMAT_TYPE_SHIFT,
+	       q->csi_rx_base + CIO2_REG_MIPIBE_LP_LUT_ENTRY(ENTRY));
+	writel(0, q->csi_rx_base + CIO2_REG_MIPIBE_COMP_FORMAT(sensor_vc));
+	writel(0, q->csi_rx_base + CIO2_REG_MIPIBE_FORCE_RAW8);
+	writel(0, base + CIO2_REG_PXM_SID2BID0(csi2bus));
+
+	r = cio2_csi2_calc_timing(cio2, q, &timing);
+	if (r) {
+		/* Use default values */
+		for (i = -1; i < lanes; i++) {
+			writel(0x4, q->csi_rx_base +
+				CIO2_REG_CSIRX_DLY_CNT_TERMEN(i));
+			writel(0x570, q->csi_rx_base +
+				CIO2_REG_CSIRX_DLY_CNT_SETTLE(i));
+		}
+	} else {
+		i = CIO2_CSIRX_DLY_CNT_CLANE_IDX;
+		writel(timing.clk_termen, q->csi_rx_base +
+			CIO2_REG_CSIRX_DLY_CNT_TERMEN(i));
+		writel(timing.clk_settle, q->csi_rx_base +
+			CIO2_REG_CSIRX_DLY_CNT_SETTLE(i));
+
+		for (i = 0; i < lanes; i++) {
+			writel(timing.dat_termen, q->csi_rx_base +
+				CIO2_REG_CSIRX_DLY_CNT_TERMEN(i));
+			writel(timing.dat_settle, q->csi_rx_base +
+				CIO2_REG_CSIRX_DLY_CNT_SETTLE(i));
+		}
+	}
+
+	writel(lanes, q->csi_rx_base + CIO2_REG_CSIRX_NOF_ENABLED_LANES);
+	writel(CIO2_CGC_PRIM_TGE |
+	       CIO2_CGC_SIDE_TGE |
+	       CIO2_CGC_XOSC_TGE |
+	       CIO2_CGC_D3I3_TGE |
+	       CIO2_CGC_CSI2_INTERFRAME_TGE |
+	       CIO2_CGC_CSI2_PORT_DCGE |
+	       CIO2_CGC_SIDE_DCGE |
+	       CIO2_CGC_PRIM_DCGE |
+	       CIO2_CGC_ROSC_DCGE |
+	       CIO2_CGC_XOSC_DCGE |
+	       CIO2_CGC_CLKGATE_HOLDOFF << CIO2_CGC_CLKGATE_HOLDOFF_SHIFT |
+	       CIO2_CGC_CSI_CLKGATE_HOLDOFF
+	       << CIO2_CGC_CSI_CLKGATE_HOLDOFF_SHIFT, base + CIO2_REG_CGC);
+	writel(CIO2_LTRVAL0_VAL << CIO2_LTRVAL02_VAL_SHIFT |
+	       CIO2_LTRVAL0_SCALE << CIO2_LTRVAL02_SCALE_SHIFT |
+	       CIO2_LTRVAL1_VAL << CIO2_LTRVAL13_VAL_SHIFT |
+	       CIO2_LTRVAL1_SCALE << CIO2_LTRVAL13_SCALE_SHIFT,
+	       base + CIO2_REG_LTRVAL01);
+	writel(CIO2_LTRVAL2_VAL << CIO2_LTRVAL02_VAL_SHIFT |
+	       CIO2_LTRVAL2_SCALE << CIO2_LTRVAL02_SCALE_SHIFT |
+	       CIO2_LTRVAL3_VAL << CIO2_LTRVAL13_VAL_SHIFT |
+	       CIO2_LTRVAL3_SCALE << CIO2_LTRVAL13_SCALE_SHIFT,
+	       base + CIO2_REG_LTRVAL23);
+
+	for (i = 0; i < CIO2_NUM_DMA_CHAN; i++) {
+		writel(0, base + CIO2_REG_CDMABA(i));
+		writel(0, base + CIO2_REG_CDMAC0(i));
+		writel(0, base + CIO2_REG_CDMAC1(i));
+	}
+
+	/* Enable DMA */
+	writel(q->fbpt_bus_addr >> PAGE_SHIFT,
+	       base + CIO2_REG_CDMABA(CIO2_DMA_CHAN));
+
+	writel(num_buffers1 << CIO2_CDMAC0_FBPT_LEN_SHIFT |
+	       FBPT_WIDTH << CIO2_CDMAC0_FBPT_WIDTH_SHIFT |
+	       CIO2_CDMAC0_DMA_INTR_ON_FE |
+	       CIO2_CDMAC0_FBPT_UPDATE_FIFO_FULL |
+	       CIO2_CDMAC0_DMA_EN |
+	       CIO2_CDMAC0_DMA_INTR_ON_FS |
+	       CIO2_CDMAC0_DMA_HALTED, base + CIO2_REG_CDMAC0(CIO2_DMA_CHAN));
+
+	writel(1 << CIO2_CDMAC1_LINENUMUPDATE_SHIFT,
+	       base + CIO2_REG_CDMAC1(CIO2_DMA_CHAN));
+
+	writel(0, base + CIO2_REG_PBM_FOPN_ABORT);
+
+	writel(CIO2_PXM_FRF_CFG_CRC_TH << CIO2_PXM_FRF_CFG_CRC_TH_SHIFT |
+	       CIO2_PXM_FRF_CFG_MSK_ECC_DPHY_NR |
+	       CIO2_PXM_FRF_CFG_MSK_ECC_RE |
+	       CIO2_PXM_FRF_CFG_MSK_ECC_DPHY_NE,
+	       base + CIO2_REG_PXM_FRF_CFG(q->csi2.port));
+
+	/* Clear interrupts */
+	writel(CIO2_IRQCTRL_MASK, q->csi_rx_base + CIO2_REG_IRQCTRL_CLEAR);
+	writel(~0, base + CIO2_REG_INT_STS_EXT_OE);
+	writel(~0, base + CIO2_REG_INT_STS_EXT_IE);
+	writel(~0, base + CIO2_REG_INT_STS);
+
+	/* Enable devices, starting from the last device in the pipe */
+	writel(1, q->csi_rx_base + CIO2_REG_MIPIBE_ENABLE);
+	writel(1, q->csi_rx_base + CIO2_REG_CSIRX_ENABLE);
+
+	return 0;
+}
+
+static void cio2_hw_exit(struct cio2_device *cio2, struct cio2_queue *q)
+{
+	void __iomem *base = cio2->base;
+	unsigned int i, maxloops = 1000;
+
+	/* Disable CSI receiver and MIPI backend devices */
+	writel(0, q->csi_rx_base + CIO2_REG_CSIRX_ENABLE);
+	writel(0, q->csi_rx_base + CIO2_REG_MIPIBE_ENABLE);
+
+	/* Halt DMA */
+	writel(0, base + CIO2_REG_CDMAC0(CIO2_DMA_CHAN));
+	do {
+		if (readl(base + CIO2_REG_CDMAC0(CIO2_DMA_CHAN)) &
+		    CIO2_CDMAC0_DMA_HALTED)
+			break;
+		usleep_range(1000, 2000);
+	} while (--maxloops);
+	if (!maxloops)
+		dev_err(&cio2->pci_dev->dev,
+			"DMA %i can not be halted\n", CIO2_DMA_CHAN);
+
+	for (i = 0; i < CIO2_NUM_PORTS; i++) {
+		writel(readl(base + CIO2_REG_PXM_FRF_CFG(i)) |
+		       CIO2_PXM_FRF_CFG_ABORT, base + CIO2_REG_PXM_FRF_CFG(i));
+		writel(readl(base + CIO2_REG_PBM_FOPN_ABORT) |
+		       CIO2_PBM_FOPN_ABORT(i), base + CIO2_REG_PBM_FOPN_ABORT);
+	}
+}
+
+static void cio2_buffer_done(struct cio2_device *cio2, unsigned int dma_chan)
+{
+	struct device *dev = &cio2->pci_dev->dev;
+	struct cio2_queue *q = cio2->cur_queue;
+	int buffers_found = 0;
+
+	if (dma_chan >= CIO2_QUEUES) {
+		dev_err(dev, "bad DMA channel %i\n", dma_chan);
+		return;
+	}
+
+	/* Find out which buffer(s) are ready */
+	do {
+		struct cio2_fbpt_entry *const entry =
+			&q->fbpt[q->bufs_first * CIO2_MAX_LOPS];
+		struct cio2_buffer *b;
+
+		if (entry->first_entry.ctrl & CIO2_FBPT_CTRL_VALID)
+			break;
+
+		b = q->bufs[q->bufs_first];
+		if (b) {
+			u64 ns = ktime_get_ns();
+			int bytes = entry[1].second_entry.num_of_bytes;
+
+			q->bufs[q->bufs_first] = NULL;
+			atomic_dec(&q->bufs_queued);
+			dev_dbg(&cio2->pci_dev->dev,
+				"buffer %i done\n", b->vbb.vb2_buf.index);
+
+			/* Fill in the vb2 buffer fields and mark it done */
+			vb2_set_plane_payload(&b->vbb.vb2_buf, 0, bytes);
+			b->vbb.vb2_buf.timestamp = ns;
+			b->vbb.flags = V4L2_BUF_FLAG_DONE;
+			b->vbb.field = V4L2_FIELD_NONE;
+			memset(&b->vbb.timecode, 0, sizeof(b->vbb.timecode));
+			b->vbb.sequence = entry[0].first_entry.frame_num;
+			vb2_buffer_done(&b->vbb.vb2_buf, VB2_BUF_STATE_DONE);
+		}
+		cio2_fbpt_entry_init_dummy(cio2, entry);
+		q->bufs_first = (q->bufs_first + 1) % CIO2_MAX_BUFFERS;
+		buffers_found++;
+	} while (1);
+
+	if (buffers_found == 0)
+		dev_warn(&cio2->pci_dev->dev,
+			 "no ready buffers found on DMA channel %i\n",
+			 dma_chan);
+}
+
+static void cio2_queue_event_sof(struct cio2_device *cio2, struct cio2_queue *q)
+{
+	struct v4l2_event event = {
+		.type = V4L2_EVENT_FRAME_SYNC,
+		.u.frame_sync.frame_sequence =
+			atomic_inc_return(&q->frame_sequence) - 1,
+	};
+
+	v4l2_event_queue(q->subdev.devnode, &event);
+}
+
+static const char *const cio2_irq_errs[] = {
+	"single packet header error corrected",
+	"multiple packet header errors detected",
+	"payload checksum (CRC) error",
+	"fifo overflow",
+	"reserved short packet data type detected",
+	"reserved long packet data type detected",
+	"incomplete long packet detected",
+	"frame sync error",
+	"line sync error",
+	"DPHY start of transmission error",
+	"DPHY synchronization error",
+	"escape mode error",
+	"escape mode trigger event",
+	"escape mode ultra-low power state for data lane(s)",
+	"escape mode ultra-low power state exit for clock lane",
+	"inter-frame short packet discarded",
+	"inter-frame long packet discarded",
+	"non-matching Long Packet stalled",
+};
+
+static const char *const cio2_port_errs[] = {
+	"ECC recoverable",
+	"DPHY not recoverable",
+	"ECC not recoverable",
+	"CRC error",
+	"INTERFRAMEDATA",
+	"PKT2SHORT",
+	"PKT2LONG",
+};
+
+static irqreturn_t cio2_irq(int irq, void *cio2_ptr)
+{
+	struct cio2_device *cio2 = cio2_ptr;
+	void __iomem *const base = cio2->base;
+	struct device *dev = &cio2->pci_dev->dev;
+	u32 int_status, int_clear;
+
+	int_clear = int_status = readl(base + CIO2_REG_INT_STS);
+	if (!int_status)
+		return IRQ_NONE;
+
+	if (int_status & CIO2_INT_IOOE) {
+		/*
+		 * Interrupt on Output Error:
+		 * 1) SRAM is full and FS received, or
+		 * 2) An invalid bit detected by DMA.
+		 */
+		u32 oe_status, oe_clear;
+
+		oe_clear = oe_status = readl(base + CIO2_REG_INT_STS_EXT_OE);
+
+		if (oe_status & CIO2_INT_EXT_OE_DMAOE_MASK) {
+			dev_err(dev, "DMA output error: 0x%x\n",
+				(oe_status & CIO2_INT_EXT_OE_DMAOE_MASK)
+				>> CIO2_INT_EXT_OE_DMAOE_SHIFT);
+			oe_status &= ~CIO2_INT_EXT_OE_DMAOE_MASK;
+		}
+		if (oe_status & CIO2_INT_EXT_OE_OES_MASK) {
+			dev_err(dev, "DMA output error on CSI2 buses: 0x%x\n",
+				(oe_status & CIO2_INT_EXT_OE_OES_MASK)
+				>> CIO2_INT_EXT_OE_OES_SHIFT);
+			oe_status &= ~CIO2_INT_EXT_OE_OES_MASK;
+		}
+		writel(oe_clear, base + CIO2_REG_INT_STS_EXT_OE);
+		if (oe_status)
+			dev_warn(dev, "unknown interrupt 0x%x on OE\n",
+				 oe_status);
+		int_status &= ~CIO2_INT_IOOE;
+	}
+
+	if (int_status & CIO2_INT_IOC_MASK) {
+		/* DMA IO done -- frame ready */
+		u32 clr = 0;
+		unsigned int d;
+
+		for (d = 0; d < CIO2_NUM_DMA_CHAN; d++)
+			if (int_status & CIO2_INT_IOC(d)) {
+				clr |= CIO2_INT_IOC(d);
+				dev_dbg(dev, "DMA %i done\n", d);
+				cio2_buffer_done(cio2, d);
+			}
+		int_status &= ~clr;
+	}
+
+	if (int_status & CIO2_INT_IOS_IOLN_MASK) {
+		/* DMA IO started or reached the specified line */
+		u32 clr = 0;
+		unsigned int d;
+
+		for (d = 0; d < CIO2_NUM_DMA_CHAN; d++)
+			if (int_status & CIO2_INT_IOS_IOLN(d)) {
+				clr |= CIO2_INT_IOS_IOLN(d);
+				if (d == CIO2_DMA_CHAN)
+					cio2_queue_event_sof(cio2,
+							     cio2->cur_queue);
+				dev_dbg(dev,
+					"DMA %i started or reached line\n", d);
+			}
+		int_status &= ~clr;
+	}
+
+	if (int_status & (CIO2_INT_IOIE | CIO2_INT_IOIRQ)) {
+		/* CSI2 receiver (error) interrupt */
+		u32 ie_status, ie_clear;
+		unsigned int port;
+
+		ie_clear = ie_status = readl(base + CIO2_REG_INT_STS_EXT_IE);
+
+		for (port = 0; port < CIO2_NUM_PORTS; port++) {
+			/* Only the low bits have a cio2_port_errs[] entry */
+			u32 port_status = (ie_status >> (port * 8)) &
+				(BIT(ARRAY_SIZE(cio2_port_errs)) - 1);
+			void __iomem *const csi_rx_base =
+						base + CIO2_REG_PIPE_BASE(port);
+			unsigned int i;
+
+			while (port_status) {
+				i = ffs(port_status) - 1;
+				dev_err(dev, "port %i error %s\n",
+					port, cio2_port_errs[i]);
+				ie_status &= ~BIT(port * 8 + i);
+				port_status &= ~BIT(i);
+			}
+
+			if (ie_status & CIO2_INT_EXT_IE_IRQ(port)) {
+				u32 csi2_status, csi2_clear;
+
+				csi2_clear = csi2_status = readl(csi_rx_base +
+						CIO2_REG_IRQCTRL_STATUS);
+				for (i = 0; i < ARRAY_SIZE(cio2_irq_errs);
+				     i++) {
+					if (csi2_status & (1 << i)) {
+						dev_err(dev,
+							"CSI-2 receiver port %i: %s\n",
+							port, cio2_irq_errs[i]);
+						csi2_status &= ~(1 << i);
+					}
+				}
+
+				writel(csi2_clear,
+				       csi_rx_base + CIO2_REG_IRQCTRL_CLEAR);
+				if (csi2_status)
+					dev_warn(dev,
+						 "unknown CSI2 error 0x%x on port %i\n",
+						 csi2_status, port);
+
+				ie_status &= ~CIO2_INT_EXT_IE_IRQ(port);
+			}
+		}
+
+		writel(ie_clear, base + CIO2_REG_INT_STS_EXT_IE);
+		if (ie_status)
+			dev_warn(dev, "unknown interrupt 0x%x on IE\n",
+				 ie_status);
+
+		int_status &= ~(CIO2_INT_IOIE | CIO2_INT_IOIRQ);
+	}
+
+	writel(int_clear, base + CIO2_REG_INT_STS);
+	if (int_status)
+		dev_warn(dev, "unknown interrupt 0x%x on INT\n", int_status);
+
+	return IRQ_HANDLED;
+}
+
+/**************** Videobuf2 interface ****************/
+
+static void cio2_vb2_return_all_buffers(struct cio2_queue *q,
+					enum vb2_buffer_state state)
+{
+	unsigned int i;
+
+	for (i = 0; i < CIO2_MAX_BUFFERS; i++) {
+		if (q->bufs[i]) {
+			atomic_dec(&q->bufs_queued);
+			vb2_buffer_done(&q->bufs[i]->vbb.vb2_buf, state);
+			q->bufs[i] = NULL;
+		}
+	}
+}
+
+static int cio2_vb2_queue_setup(struct vb2_queue *vq,
+				unsigned int *num_buffers,
+				unsigned int *num_planes,
+				unsigned int sizes[],
+				struct device *alloc_devs[])
+{
+	struct cio2_device *cio2 = vb2_get_drv_priv(vq);
+	struct cio2_queue *q = container_of(vq, struct cio2_queue, vbq);
+	u32 width = q->subdev_fmt.width;
+	u32 height = q->subdev_fmt.height;
+	u32 pixelformat = q->pixelformat;
+	unsigned int i, szimage;
+	int r = 0;
+
+	for (i = 0; i < ARRAY_SIZE(cio2_csi2_fmts); i++) {
+		if (pixelformat == cio2_csi2_fmts[i])
+			break;
+	}
+
+	/* Fall back to SRGGB10 instead of returning an error */
+	if (i >= ARRAY_SIZE(cio2_csi2_fmts))
+		pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
+
+	alloc_devs[0] = &cio2->pci_dev->dev;
+	szimage = cio2_bytesperline(width) * height;
+
+	if (*num_planes) {
+		/*
+		 * Only a single plane is supported
+		 */
+		if (*num_planes != 1 || sizes[0] < szimage)
+			return -EINVAL;
+	}
+
+	*num_planes = 1;
+	sizes[0] = szimage;
+
+	*num_buffers = clamp_val(*num_buffers, 1, CIO2_MAX_BUFFERS);
+
+	/* Initialize buffer queue */
+	for (i = 0; i < CIO2_MAX_BUFFERS; i++) {
+		q->bufs[i] = NULL;
+		cio2_fbpt_entry_init_dummy(cio2, &q->fbpt[i * CIO2_MAX_LOPS]);
+	}
+	atomic_set(&q->bufs_queued, 0);
+	q->bufs_first = 0;
+	q->bufs_next = 0;
+
+	return r;
+}
+
+/* Called after each buffer is allocated */
+static int cio2_vb2_buf_init(struct vb2_buffer *vb)
+{
+	struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
+	struct device *dev = &cio2->pci_dev->dev;
+	struct cio2_buffer *b =
+		container_of(vb, struct cio2_buffer, vbb.vb2_buf);
+	unsigned int length = vb->planes[0].length;
+	int lops = DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + 1,
+				PAGE_SIZE / sizeof(u32));
+	u32 *lop;
+	struct sg_table *sg;
+	struct sg_page_iter sg_iter;
+
+	if (lops <= 0 || lops > CIO2_MAX_LOPS) {
+		dev_err(dev, "%s: bad buffer size (%i)\n", __func__, length);
+		return -ENOSPC;		/* Should never happen */
+	}
+
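+	/* Allocate LOP table */
+	b->lop = lop = dma_alloc_coherent(dev, lops * PAGE_SIZE,
+					  &b->lop_bus_addr, GFP_KERNEL);
+	if (!lop)
+		return -ENOMEM;
+
+	/* Fill LOP */
+	sg = vb2_dma_sg_plane_desc(vb, 0);
+	if (!sg) {
+		/* Do not leak the LOP table allocated above */
+		dma_free_coherent(dev, lops * PAGE_SIZE, b->lop,
+				  b->lop_bus_addr);
+		return -EFAULT;
+	}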
+
+	for_each_sg_page(sg->sgl, &sg_iter, sg->nents, 0)
+		*lop++ = sg_page_iter_dma_address(&sg_iter) >> PAGE_SHIFT;
+	*lop++ = cio2->dummy_page_bus_addr >> PAGE_SHIFT;
+
+	return 0;
+}
+
+/* Transfer buffer ownership to cio2 */
+static void cio2_vb2_buf_queue(struct vb2_buffer *vb)
+{
+	struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
+	struct cio2_queue *q =
+		container_of(vb->vb2_queue, struct cio2_queue, vbq);
+	struct cio2_buffer *b =
+		container_of(vb, struct cio2_buffer, vbb.vb2_buf);
+	struct cio2_fbpt_entry *entry;
+	unsigned int next = q->bufs_next;
+	int bufs_queued = atomic_inc_return(&q->bufs_queued);
+
+	if (vb2_start_streaming_called(&q->vbq)) {
+		u32 fbpt_rp =
+			(readl(cio2->base + CIO2_REG_CDMARI(CIO2_DMA_CHAN))
+			 >> CIO2_CDMARI_FBPT_RP_SHIFT)
+			& CIO2_CDMARI_FBPT_RP_MASK;
+
+		/*
+		 * fbpt_rp is the fbpt entry that the dma is currently working
+		 * on, but since it could jump to next entry at any time,
+		 * assume that we might already be there.
+		 */
+		fbpt_rp = (fbpt_rp + 1) % CIO2_MAX_BUFFERS;
+
+		if (bufs_queued <= 1)
+			next = fbpt_rp + 1;	/* Buffers were drained */
+		else if (fbpt_rp == next)
+			next++;
+		next %= CIO2_MAX_BUFFERS;
+	}
+
+	while (q->bufs[next]) {
+		/*
+		 * If the entry is in use, move on to the next one. We cannot
+		 * break out here even if all entries are filled; keep looking
+		 * until one becomes free, otherwise it will crash.
+		 */
+		dev_dbg(&cio2->pci_dev->dev,
+			"entry %i was already full!\n", next);
+		next = (next + 1) % CIO2_MAX_BUFFERS;
+	}
+
+	q->bufs[next] = b;
+	entry = &q->fbpt[next * CIO2_MAX_LOPS];
+	cio2_fbpt_entry_init_buf(cio2, b, entry);
+	q->bufs_next = (next + 1) % CIO2_MAX_BUFFERS;
+}
+
+/* Called when each buffer is freed */
+static void cio2_vb2_buf_cleanup(struct vb2_buffer *vb)
+{
+	struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
+	struct cio2_buffer *b =
+		container_of(vb, struct cio2_buffer, vbb.vb2_buf);
+	unsigned int length = vb->planes[0].length;
+	/* Must match the LOP count computed in cio2_vb2_buf_init() */
+	int lops = DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + 1,
+				PAGE_SIZE / sizeof(u32));
+
+	/* Free LOP table */
+	dma_free_coherent(&cio2->pci_dev->dev, lops * PAGE_SIZE,
+				b->lop, b->lop_bus_addr);
+}
+
+static int cio2_set_power(struct vb2_queue *vq, int enable)
+{
+	struct cio2_device *cio2 = vb2_get_drv_priv(vq);
+	struct device *dev = &cio2->pci_dev->dev;
+	int ret = 0;
+
+	if (enable) {
+		ret = pm_runtime_get_sync(dev);
+		if (ret < 0) {
+			dev_info(&cio2->pci_dev->dev,
+				"failed to get power %d\n", ret);
+			pm_runtime_put(dev);
+		}
+	} else {
+		ret = pm_runtime_put(dev);
+	}
+
+	/* return 0 if power is active */
+	return (ret >= 0) ? 0 : ret;
+}
+
+static int cio2_vb2_start_streaming(struct vb2_queue *vq, unsigned int count)
+{
+	struct cio2_queue *q = container_of(vq, struct cio2_queue, vbq);
+	struct cio2_device *cio2 = vb2_get_drv_priv(vq);
+	int r;
+
+	cio2->cur_queue = q;
+	atomic_set(&q->frame_sequence, 0);
+
+	r = cio2_set_power(vq, 1);
+	if (r) {
+		dev_info(&cio2->pci_dev->dev, "failed to set power\n");
+		return r;
+	}
+
+	r = media_pipeline_start(&q->vdev.entity, &q->pipe);
+	if (r)
+		goto fail_pipeline;
+
+	r = cio2_hw_init(cio2, q);
+	if (r)
+		goto fail_hw;
+
+	/* Start streaming on CSI2 receiver */
+	r = v4l2_subdev_call(&q->subdev, video, s_stream, 1);
+	if (r && r != -ENOIOCTLCMD)
+		goto fail_csi2_subdev;
+
+	/* Start streaming on sensor */
+	r = v4l2_subdev_call(q->sensor, video, s_stream, 1);
+	if (r)
+		goto fail_sensor_subdev;
+
+	return 0;
+
+fail_sensor_subdev:
+	v4l2_subdev_call(&q->subdev, video, s_stream, 0);
+fail_csi2_subdev:
+	cio2_hw_exit(cio2, q);
+fail_hw:
+	media_pipeline_stop(&q->vdev.entity);
+fail_pipeline:
+	dev_dbg(&cio2->pci_dev->dev, "failed to start streaming (%d)\n", r);
+	cio2_vb2_return_all_buffers(q, VB2_BUF_STATE_QUEUED);
+	/* Drop the runtime PM reference taken at the start */
+	cio2_set_power(vq, 0);
+
+	return r;
+}
+
+static void cio2_vb2_stop_streaming(struct vb2_queue *vq)
+{
+	struct cio2_queue *q = container_of(vq, struct cio2_queue, vbq);
+	struct cio2_device *cio2 = vb2_get_drv_priv(vq);
+	int r;
+
+	if (v4l2_subdev_call(q->sensor, video, s_stream, 0))
+		dev_err(&cio2->pci_dev->dev,
+			"failed to stop sensor streaming\n");
+
+	r = v4l2_subdev_call(&q->subdev, video, s_stream, 0);
+	if (r && r != -ENOIOCTLCMD)
+		dev_err(&cio2->pci_dev->dev, "failed to stop CSI2 streaming\n");
+
+	cio2_hw_exit(cio2, q);
+	cio2_vb2_return_all_buffers(q, VB2_BUF_STATE_ERROR);
+	media_pipeline_stop(&q->vdev.entity);
+	cio2_set_power(vq, 0);
+}
+
+static const struct vb2_ops cio2_vb2_ops = {
+	.buf_init = cio2_vb2_buf_init,
+	.buf_queue = cio2_vb2_buf_queue,
+	.buf_cleanup = cio2_vb2_buf_cleanup,
+	.queue_setup = cio2_vb2_queue_setup,
+	.start_streaming = cio2_vb2_start_streaming,
+	.stop_streaming = cio2_vb2_stop_streaming,
+	.wait_prepare = vb2_ops_wait_prepare,
+	.wait_finish = vb2_ops_wait_finish,
+};
+
+/**************** V4L2 interface ****************/
+
+static int cio2_v4l2_querycap(struct file *file, void *fh,
+			      struct v4l2_capability *cap)
+{
+	struct cio2_device *cio2 = video_drvdata(file);
+
+	strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
+	strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
+	snprintf(cap->bus_info, sizeof(cap->bus_info),
+		 "PCI:%s", pci_name(cio2->pci_dev));
+	cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+
+	return 0;
+}
+
+static int cio2_v4l2_enum_fmt(struct file *file, void *fh,
+			      struct v4l2_fmtdesc *f)
+{
+	if (f->index >= ARRAY_SIZE(cio2_csi2_fmts))
+		return -EINVAL;
+
+	f->pixelformat = cio2_csi2_fmts[f->index];
+
+	return 0;
+}
+
+/* Always propagate the format forward from the CIO2 subdev */
+static int cio2_v4l2_g_fmt(struct file *file, void *fh, struct v4l2_format *f)
+{
+	struct cio2_queue *q = file_to_cio2_queue(file);
+
+	memset(&f->fmt, 0, sizeof(f->fmt));
+
+	f->fmt.pix.width = q->subdev_fmt.width;
+	f->fmt.pix.height = q->subdev_fmt.height;
+	f->fmt.pix.pixelformat = q->pixelformat;
+	f->fmt.pix.field = V4L2_FIELD_NONE;
+	f->fmt.pix.bytesperline = cio2_bytesperline(f->fmt.pix.width);
+	f->fmt.pix.sizeimage = f->fmt.pix.bytesperline * f->fmt.pix.height;
+	f->fmt.pix.colorspace = V4L2_COLORSPACE_RAW;
+
+	return 0;
+}
+
+static int cio2_v4l2_try_fmt(struct file *file, void *fh, struct v4l2_format *f)
+{
+	u32 pixelformat = f->fmt.pix.pixelformat;
+	unsigned int i;
+
+	cio2_v4l2_g_fmt(file, fh, f);
+
+	for (i = 0; i < ARRAY_SIZE(cio2_csi2_fmts); i++) {
+		if (pixelformat == cio2_csi2_fmts[i])
+			break;
+	}
+
+	/* Use SRGGB10 as default if not found */
+	if (i >= ARRAY_SIZE(cio2_csi2_fmts))
+		pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
+
+	f->fmt.pix.pixelformat = pixelformat;
+	f->fmt.pix.bytesperline = cio2_bytesperline(f->fmt.pix.width);
+	f->fmt.pix.sizeimage = f->fmt.pix.bytesperline * f->fmt.pix.height;
+
+	return 0;
+}
+
+static int cio2_v4l2_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
+{
+	struct cio2_queue *q = file_to_cio2_queue(file);
+
+	cio2_v4l2_try_fmt(file, fh, f);
+	q->pixelformat = f->fmt.pix.pixelformat;
+
+	return 0;
+}
+
+static const struct v4l2_file_operations cio2_v4l2_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = video_ioctl2,
+	.open = v4l2_fh_open,
+	.release = vb2_fop_release,
+	.poll = vb2_fop_poll,
+	.mmap = vb2_fop_mmap,
+};
+
+static const struct v4l2_ioctl_ops cio2_v4l2_ioctl_ops = {
+	.vidioc_querycap = cio2_v4l2_querycap,
+	.vidioc_enum_fmt_vid_cap = cio2_v4l2_enum_fmt,
+	.vidioc_g_fmt_vid_cap = cio2_v4l2_g_fmt,
+	.vidioc_s_fmt_vid_cap = cio2_v4l2_s_fmt,
+	.vidioc_try_fmt_vid_cap = cio2_v4l2_try_fmt,
+	.vidioc_reqbufs = vb2_ioctl_reqbufs,
+	.vidioc_create_bufs = vb2_ioctl_create_bufs,
+	.vidioc_prepare_buf = vb2_ioctl_prepare_buf,
+	.vidioc_querybuf = vb2_ioctl_querybuf,
+	.vidioc_qbuf = vb2_ioctl_qbuf,
+	.vidioc_dqbuf = vb2_ioctl_dqbuf,
+	.vidioc_streamon = vb2_ioctl_streamon,
+	.vidioc_streamoff = vb2_ioctl_streamoff,
+	.vidioc_expbuf = vb2_ioctl_expbuf,
+};
+
+static int cio2_subdev_subscribe_event(struct v4l2_subdev *sd,
+				       struct v4l2_fh *fh,
+				       struct v4l2_event_subscription *sub)
+{
+	if (sub->type != V4L2_EVENT_FRAME_SYNC)
+		return -EINVAL;
+
+	/* Line number. For now only zero is accepted. */
+	if (sub->id != 0)
+		return -EINVAL;
+
+	return v4l2_event_subscribe(fh, sub, 0, NULL);
+}
+
+/*
+ * cio2_subdev_get_fmt - Handle get format by pads subdev method
+ * @sd : pointer to v4l2 subdev structure
+ * @cfg: V4L2 subdev pad config
+ * @fmt: pointer to v4l2 subdev format structure
+ * return -EINVAL or zero on success
+ */
+static int cio2_subdev_get_fmt(struct v4l2_subdev *sd,
+			       struct v4l2_subdev_pad_config *cfg,
+			       struct v4l2_subdev_format *fmt)
+{
+	struct cio2_queue *q = container_of(sd, struct cio2_queue, subdev);
+
+	if (fmt->which == V4L2_SUBDEV_FORMAT_TRY)
+		fmt->format = *v4l2_subdev_get_try_format(sd, cfg, fmt->pad);
+	else	/* Retrieve the current format */
+		fmt->format = q->subdev_fmt;
+
+	return 0;
+}
+
+/*
+ * cio2_subdev_set_fmt - Handle set format by pads subdev method
+ * @sd : pointer to v4l2 subdev structure
+ * @cfg: V4L2 subdev pad config
+ * @fmt: pointer to v4l2 subdev format structure
+ * return -EINVAL or zero on success
+ */
+static int cio2_subdev_set_fmt(struct v4l2_subdev *sd,
+			       struct v4l2_subdev_pad_config *cfg,
+			       struct v4l2_subdev_format *fmt)
+{
+	struct cio2_queue *q = container_of(sd, struct cio2_queue, subdev);
+
+	/*
+	 * Only allow setting sink pad format;
+	 * source always propagates from sink
+	 */
+	if (fmt->pad == CIO2_PAD_SOURCE)
+		return cio2_subdev_get_fmt(sd, cfg, fmt);
+
+	if (fmt->which == V4L2_SUBDEV_FORMAT_TRY)
+		*v4l2_subdev_get_try_format(sd, cfg, fmt->pad) = fmt->format;
+	else {
+		/* It's the sink, allow changing frame size */
+		q->subdev_fmt.width = fmt->format.width;
+		q->subdev_fmt.height = fmt->format.height;
+		q->subdev_fmt.code = fmt->format.code;
+		fmt->format = q->subdev_fmt;
+	}
+
+	return 0;
+}
+
+static int cio2_subdev_enum_mbus_code(struct v4l2_subdev *sd,
+				      struct v4l2_subdev_pad_config *cfg,
+				      struct v4l2_subdev_mbus_code_enum *code)
+{
+	static const u32 codes[] = {
+		MEDIA_BUS_FMT_SRGGB10_1X10,
+		MEDIA_BUS_FMT_SBGGR10_1X10,
+		MEDIA_BUS_FMT_SGBRG10_1X10,
+		MEDIA_BUS_FMT_SGRBG10_1X10,
+	};
+
+	if (code->index >= ARRAY_SIZE(codes))
+		return -EINVAL;
+
+	code->code = codes[code->index];
+
+	return 0;
+}
+
+static const struct v4l2_subdev_core_ops cio2_subdev_core_ops = {
+	.subscribe_event = cio2_subdev_subscribe_event,
+	.unsubscribe_event = v4l2_event_subdev_unsubscribe,
+};
+
+static const struct v4l2_subdev_video_ops cio2_subdev_video_ops = {};
+
+static const struct v4l2_subdev_pad_ops cio2_subdev_pad_ops = {
+	.link_validate = v4l2_subdev_link_validate_default,
+	.get_fmt = cio2_subdev_get_fmt,
+	.set_fmt = cio2_subdev_set_fmt,
+	.enum_mbus_code = cio2_subdev_enum_mbus_code,
+};
+
+static const struct v4l2_subdev_ops cio2_subdev_ops = {
+	.core = &cio2_subdev_core_ops,
+	.video = &cio2_subdev_video_ops,
+	.pad = &cio2_subdev_pad_ops,
+};
+
+/******* V4L2 sub-device asynchronous registration callbacks *******/
+
+static struct cio2_queue *cio2_find_queue_by_sensor_node(struct cio2_queue *q,
+						struct fwnode_handle *fwnode)
+{
+	int i;
+
+	for (i = 0; i < CIO2_QUEUES; i++) {
+		if (q[i].sensor && q[i].sensor->fwnode == fwnode)
+			return &q[i];
+	}
+
+	return NULL;
+}
+
+/* The .bound() notifier callback when a match is found */
+static int cio2_notifier_bound(struct v4l2_async_notifier *notifier,
+			       struct v4l2_subdev *sd,
+			       struct v4l2_async_subdev *asd)
+{
+	struct cio2_device *cio2 = container_of(notifier,
+					struct cio2_device, notifier);
+	struct sensor_async_subdev *s_asd = container_of(asd,
+					struct sensor_async_subdev, asd);
+	struct cio2_queue *q;
+	struct device *dev;
+	int i;
+
+	dev = &cio2->pci_dev->dev;
+
+	/* Find first free slot for the subdev */
+	for (i = 0; i < CIO2_QUEUES; i++)
+		if (!cio2->queue[i].sensor)
+			break;
+
+	if (i >= CIO2_QUEUES) {
+		dev_err(dev, "too many subdevs\n");
+		return -ENOSPC;
+	}
+	q = &cio2->queue[i];
+
+	q->csi2.port = s_asd->vfwn_endpt.base.port;
+	q->csi2.num_of_lanes = s_asd->vfwn_endpt.bus.mipi_csi2.num_data_lanes;
+	q->sensor = sd;
+	q->csi_rx_base = cio2->base + CIO2_REG_PIPE_BASE(q->csi2.port);
+
+	return 0;
+}
+
+/* The .unbind callback */
+static void cio2_notifier_unbind(struct v4l2_async_notifier *notifier,
+				 struct v4l2_subdev *sd,
+				 struct v4l2_async_subdev *asd)
+{
+	struct cio2_device *cio2 = container_of(notifier,
+						struct cio2_device, notifier);
+	unsigned int i;
+
+	/* Note: sd may point to freed memory here. Do not dereference it. */
+	for (i = 0; i < CIO2_QUEUES; i++) {
+		if (cio2->queue[i].sensor == sd) {
+			cio2->queue[i].sensor = NULL;
+			return;
+		}
+	}
+}
+
+/* .complete() is called after all subdevices have been located */
+static int cio2_notifier_complete(struct v4l2_async_notifier *notifier)
+{
+	struct cio2_device *cio2 = container_of(notifier, struct cio2_device,
+						notifier);
+	struct sensor_async_subdev *s_asd;
+	struct fwnode_handle *fwn_remote, *fwn_endpt, *fwn_remote_endpt;
+	struct cio2_queue *q;
+	struct fwnode_endpoint remote_endpt;
+	int i, ret;
+
+	for (i = 0; i < notifier->num_subdevs; i++) {
+		s_asd = container_of(cio2->notifier.subdevs[i],
+					struct sensor_async_subdev,
+					asd);
+
+		fwn_remote = s_asd->asd.match.fwnode.fwn;
+		fwn_endpt = (struct fwnode_handle *)
+					s_asd->vfwn_endpt.base.local_fwnode;
+		fwn_remote_endpt = fwnode_graph_get_remote_endpoint(fwn_endpt);
+		if (!fwn_remote_endpt) {
+			dev_err(&cio2->pci_dev->dev,
+				"failed to get remote endpt\n");
+			return -ENOENT;
+		}
+
+		ret = fwnode_graph_parse_endpoint(fwn_remote_endpt,
+							&remote_endpt);
+		if (ret) {
+			dev_err(&cio2->pci_dev->dev,
+				"failed to parse remote endpt %d\n", ret);
+			return ret;
+		}
+
+		q = cio2_find_queue_by_sensor_node(cio2->queue, fwn_remote);
+		if (!q) {
+			dev_err(&cio2->pci_dev->dev,
+				"failed to find cio2 queue\n");
+			return -ENXIO;
+		}
+
+		ret = media_create_pad_link(
+				&q->sensor->entity, remote_endpt.id,
+				&q->subdev.entity, s_asd->vfwn_endpt.base.id,
+				0);
+		if (ret) {
+			dev_err(&cio2->pci_dev->dev,
+				"failed to create link for %s\n",
+				q->sensor->name);
+			return ret;
+		}
+	}
+
+	return v4l2_device_register_subdev_nodes(&cio2->v4l2_dev);
+}
+
+static int cio2_notifier_init(struct cio2_device *cio2)
+{
+	struct device *dev;
+	struct fwnode_handle *dev_fwn, *fwn, *fwn_remote;
+	struct v4l2_async_subdev *asd;
+	struct sensor_async_subdev *s_asd;
+	int ret, endpt_i;
+
+	dev = &cio2->pci_dev->dev;
+	dev_fwn = dev_fwnode(dev);
+
+	asd = devm_kzalloc(dev, sizeof(asd) * CIO2_QUEUES, GFP_KERNEL);
+	if (!asd)
+		return -ENOMEM;
+
+	cio2->notifier.subdevs = (struct v4l2_async_subdev **)asd;
+	cio2->notifier.num_subdevs = 0;
+	cio2->notifier.bound = cio2_notifier_bound;
+	cio2->notifier.unbind = cio2_notifier_unbind;
+	cio2->notifier.complete = cio2_notifier_complete;
+
+	fwn = NULL;
+	endpt_i = 0;
+	while (endpt_i < CIO2_QUEUES &&
+			(fwn = fwnode_graph_get_next_endpoint(dev_fwn, fwn))) {
+		s_asd = devm_kzalloc(dev, sizeof(*s_asd), GFP_KERNEL);
+		if (!s_asd)
+			return -ENOMEM;
+
+		fwn_remote = fwnode_graph_get_remote_port_parent(fwn);
+		if (!fwn_remote) {
+			dev_err(dev, "bad remote port parent\n");
+			return -ENOENT;
+		}
+
+		ret = v4l2_fwnode_endpoint_parse(fwn, &s_asd->vfwn_endpt);
+		if (ret) {
+			dev_err(dev, "endpoint parsing error : %d\n", ret);
+			return ret;
+		}
+
+		if (s_asd->vfwn_endpt.bus_type != V4L2_MBUS_CSI2) {
+			dev_warn(dev, "endpoint bus type error\n");
+			devm_kfree(dev, s_asd);
+			continue;
+		}
+
+		s_asd->asd.match.fwnode.fwn = fwn_remote;
+		s_asd->asd.match_type = V4L2_ASYNC_MATCH_FWNODE;
+
+		cio2->notifier.subdevs[endpt_i++] = &s_asd->asd;
+	}
+
+	if (!endpt_i)
+		return 0;	/* No endpoint */
+
+	cio2->notifier.num_subdevs = endpt_i;
+	ret = v4l2_async_notifier_register(&cio2->v4l2_dev, &cio2->notifier);
+	if (ret) {
+		cio2->notifier.num_subdevs = 0;
+		dev_err(dev, "failed to register async notifier : %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void cio2_notifier_exit(struct cio2_device *cio2)
+{
+	if (cio2->notifier.num_subdevs > 0)
+		v4l2_async_notifier_unregister(&cio2->notifier);
+}
+
+/**************** Queue initialization ****************/
+static const struct media_entity_operations cio2_media_ops = {
+	.link_validate = v4l2_subdev_link_validate,
+};
+
+int cio2_queue_init(struct cio2_device *cio2, struct cio2_queue *q)
+{
+	static const u32 default_width = 1936;
+	static const u32 default_height = 1096;
+	static const u32 default_mbusfmt = MEDIA_BUS_FMT_SRGGB10_1X10;
+
+	struct video_device *vdev = &q->vdev;
+	struct vb2_queue *vbq = &q->vbq;
+	struct v4l2_subdev *subdev = &q->subdev;
+	struct v4l2_mbus_framefmt *fmt;
+	int r;
+
+	/* Initialize miscellaneous variables */
+	mutex_init(&q->lock);
+
+	/* Initialize formats to default values */
+	fmt = &q->subdev_fmt;
+	fmt->width = default_width;
+	fmt->height = default_height;
+	fmt->code = default_mbusfmt;
+	fmt->field = V4L2_FIELD_NONE;
+	fmt->colorspace = V4L2_COLORSPACE_RAW;
+	fmt->ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
+	fmt->quantization = V4L2_QUANTIZATION_DEFAULT;
+	fmt->xfer_func = V4L2_XFER_FUNC_DEFAULT;
+
+	q->pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
+
+	/* Initialize fbpt */
+	r = cio2_fbpt_init(cio2, q);
+	if (r)
+		goto fail_fbpt;
+
+	/* Initialize media entities */
+	r = media_entity_pads_init(&subdev->entity, CIO2_PADS, q->subdev_pads);
+	if (r) {
+		dev_err(&cio2->pci_dev->dev,
+			"failed to initialize subdev media entity (%d)\n", r);
+		goto fail_subdev_media_entity;
+	}
+	q->subdev_pads[CIO2_PAD_SINK].flags = MEDIA_PAD_FL_SINK |
+		MEDIA_PAD_FL_MUST_CONNECT;
+	q->subdev_pads[CIO2_PAD_SOURCE].flags = MEDIA_PAD_FL_SOURCE;
+	subdev->entity.ops = &cio2_media_ops;
+	r = media_entity_pads_init(&vdev->entity, 1, &q->vdev_pad);
+	if (r) {
+		dev_err(&cio2->pci_dev->dev,
+			"failed to initialize videodev media entity (%d)\n", r);
+		goto fail_vdev_media_entity;
+	}
+	q->vdev_pad.flags = MEDIA_PAD_FL_SINK | MEDIA_PAD_FL_MUST_CONNECT;
+	vdev->entity.ops = &cio2_media_ops;
+
+	/* Initialize subdev */
+	v4l2_subdev_init(subdev, &cio2_subdev_ops);
+	subdev->flags = V4L2_SUBDEV_FL_HAS_DEVNODE | V4L2_SUBDEV_FL_HAS_EVENTS;
+	subdev->owner = THIS_MODULE;
+	snprintf(subdev->name, sizeof(subdev->name),
+		 CIO2_ENTITY_NAME ":%li", q - cio2->queue);
+	v4l2_set_subdevdata(subdev, cio2);
+	r = v4l2_device_register_subdev(&cio2->v4l2_dev, subdev);
+	if (r) {
+		dev_err(&cio2->pci_dev->dev,
+			"failed to initialize subdev (%d)\n", r);
+		goto fail_subdev;
+	}
+
+	/* Initialize vbq */
+	vbq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	vbq->io_modes = VB2_USERPTR | VB2_MMAP;
+	vbq->ops = &cio2_vb2_ops;
+	vbq->mem_ops = &vb2_dma_sg_memops;
+	vbq->buf_struct_size = sizeof(struct cio2_buffer);
+	vbq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+	vbq->min_buffers_needed = 1;
+	vbq->drv_priv = cio2;
+	vbq->lock = &q->lock;
+	r = vb2_queue_init(vbq);
+	if (r) {
+		dev_err(&cio2->pci_dev->dev,
+			"failed to initialize videobuf2 queue (%d)\n", r);
+		goto fail_vbq;
+	}
+
+	/* Initialize vdev */
+	snprintf(vdev->name, sizeof(vdev->name),
+		 "%s:%li", CIO2_NAME, q - cio2->queue);
+	vdev->release = video_device_release_empty;
+	vdev->fops = &cio2_v4l2_fops;
+	vdev->ioctl_ops = &cio2_v4l2_ioctl_ops;
+	vdev->lock = &cio2->lock;
+	vdev->v4l2_dev = &cio2->v4l2_dev;
+	vdev->queue = &q->vbq;
+	video_set_drvdata(vdev, cio2);
+	r = video_register_device(vdev, VFL_TYPE_GRABBER, -1);
+	if (r) {
+		dev_err(&cio2->pci_dev->dev,
+			"failed to register video device (%d)\n", r);
+		goto fail_vdev;
+	}
+
+	/* Create link from CIO2 subdev to output node */
+	r = media_create_pad_link(
+		&subdev->entity, CIO2_PAD_SOURCE, &vdev->entity, 0,
+		MEDIA_LNK_FL_ENABLED | MEDIA_LNK_FL_IMMUTABLE);
+	if (r)
+		goto fail_link;
+
+	return 0;
+
+fail_link:
+	video_unregister_device(&q->vdev);
+fail_vdev:
+	vb2_queue_release(vbq);
+fail_vbq:
+	v4l2_device_unregister_subdev(subdev);
+fail_subdev:
+	media_entity_cleanup(&vdev->entity);
+fail_vdev_media_entity:
+	media_entity_cleanup(&subdev->entity);
+fail_subdev_media_entity:
+	cio2_fbpt_exit(q, &cio2->pci_dev->dev);
+fail_fbpt:
+	mutex_destroy(&q->lock);
+
+	return r;
+}
+
+static void cio2_queue_exit(struct cio2_device *cio2, struct cio2_queue *q)
+{
+	video_unregister_device(&q->vdev);
+	vb2_queue_release(&q->vbq);
+	v4l2_device_unregister_subdev(&q->subdev);
+	media_entity_cleanup(&q->vdev.entity);
+	media_entity_cleanup(&q->subdev.entity);
+	cio2_fbpt_exit(q, &cio2->pci_dev->dev);
+	mutex_destroy(&q->lock);
+}
+
+/**************** PCI interface ****************/
+
+static int cio2_pci_config_setup(struct pci_dev *dev)
+{
+	u16 pci_command;
+	int r = pci_enable_msi(dev);
+
+	if (r) {
+		dev_err(&dev->dev, "failed to enable MSI (%d)\n", r);
+		return r;
+	}
+
+	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
+	pci_command |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
+		PCI_COMMAND_INTX_DISABLE;
+	pci_write_config_word(dev, PCI_COMMAND, pci_command);
+
+	return 0;
+}
+
+static int cio2_pci_probe(struct pci_dev *pci_dev,
+			  const struct pci_device_id *id)
+{
+	struct cio2_device *cio2;
+	phys_addr_t phys;
+	void __iomem *const *iomap;
+	int i = -1, r = -ENODEV;
+
+	cio2 = devm_kzalloc(&pci_dev->dev, sizeof(*cio2), GFP_KERNEL);
+	if (!cio2)
+		return -ENOMEM;
+	cio2->pci_dev = pci_dev;
+
+	r = pcim_enable_device(pci_dev);
+	if (r) {
+		dev_err(&pci_dev->dev, "failed to enable device (%d)\n", r);
+		return r;
+	}
+
+	dev_info(&pci_dev->dev, "device 0x%x (rev: 0x%x)\n",
+		 pci_dev->device, pci_dev->revision);
+
+	phys = pci_resource_start(pci_dev, CIO2_PCI_BAR);
+
+	r = pcim_iomap_regions(pci_dev, 1 << CIO2_PCI_BAR, pci_name(pci_dev));
+	if (r) {
+		dev_err(&pci_dev->dev, "failed to remap I/O memory (%d)\n", r);
+		return -ENODEV;
+	}
+
+	iomap = pcim_iomap_table(pci_dev);
+	if (!iomap) {
+		dev_err(&pci_dev->dev, "failed to iomap table\n");
+		return -ENODEV;
+	}
+
+	cio2->base = iomap[CIO2_PCI_BAR];
+
+	pci_set_drvdata(pci_dev, cio2);
+
+	pci_set_master(pci_dev);
+
+	r = pci_set_dma_mask(pci_dev, CIO2_DMA_MASK);
+	if (r) {
+		dev_err(&pci_dev->dev, "failed to set DMA mask (%d)\n", r);
+		return -ENODEV;
+	}
+
+	r = cio2_pci_config_setup(pci_dev);
+	if (r)
+		return -ENODEV;
+
+	mutex_init(&cio2->lock);
+
+	cio2->media_dev.dev = &cio2->pci_dev->dev;
+	strlcpy(cio2->media_dev.model, CIO2_DEVICE_NAME,
+		sizeof(cio2->media_dev.model));
+	snprintf(cio2->media_dev.bus_info, sizeof(cio2->media_dev.bus_info),
+		 "PCI:%s", pci_name(cio2->pci_dev));
+	cio2->media_dev.driver_version = KERNEL_VERSION(4, 11, 0);
+	cio2->media_dev.hw_revision = 0;
+
+	media_device_init(&cio2->media_dev);
+	r = media_device_register(&cio2->media_dev);
+	if (r < 0)
+		goto fail_mutex_destroy;
+
+	cio2->v4l2_dev.mdev = &cio2->media_dev;
+	r = v4l2_device_register(&pci_dev->dev, &cio2->v4l2_dev);
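+	if (r) {
+		dev_err(&pci_dev->dev,
+			"failed to register V4L2 device (%d)\n", r);
+		/* The media device was registered above; undo that here */
+		media_device_unregister(&cio2->media_dev);
+		media_device_cleanup(&cio2->media_dev);
+		goto fail_mutex_destroy;
+	}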
+
+	for (i = 0; i < CIO2_QUEUES; i++) {
+		r = cio2_queue_init(cio2, &cio2->queue[i]);
+		if (r)
+			goto fail;
+	}
+
+	r = cio2_fbpt_init_dummy(cio2);
+	if (r)
+		goto fail;
+
+	/* Register notifier for the subdevices we care about */
+	r = cio2_notifier_init(cio2);
+	if (r)
+		goto fail;
+
+	r = devm_request_irq(&pci_dev->dev, pci_dev->irq, cio2_irq,
+			     IRQF_SHARED, CIO2_NAME, cio2);
+	if (r) {
+		dev_err(&pci_dev->dev, "failed to request IRQ (%d)\n", r);
+		goto fail;
+	}
+
+	pm_runtime_put_noidle(&pci_dev->dev);
+	pm_runtime_allow(&pci_dev->dev);
+
+	return 0;
+
+fail:
+	cio2_notifier_exit(cio2);
+	cio2_fbpt_exit_dummy(cio2);
+	while (i-- > 0)
+		cio2_queue_exit(cio2, &cio2->queue[i]);
+	v4l2_device_unregister(&cio2->v4l2_dev);
+	media_device_unregister(&cio2->media_dev);
+	media_device_cleanup(&cio2->media_dev);
+fail_mutex_destroy:
+	mutex_destroy(&cio2->lock);
+
+	return r;
+}
+
+static void cio2_pci_remove(struct pci_dev *pci_dev)
+{
+	struct cio2_device *cio2 = pci_get_drvdata(pci_dev);
+	unsigned int i;
+
+	cio2_notifier_exit(cio2);
+	cio2_fbpt_exit_dummy(cio2);
+	for (i = 0; i < CIO2_QUEUES; i++)
+		cio2_queue_exit(cio2, &cio2->queue[i]);
+	v4l2_device_unregister(&cio2->v4l2_dev);
+	media_device_unregister(&cio2->media_dev);
+	media_device_cleanup(&cio2->media_dev);
+	mutex_destroy(&cio2->lock);
+}
+
+static int cio2_runtime_suspend(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct cio2_device *cio2 = pci_get_drvdata(pci_dev);
+	void __iomem *const base = cio2->base;
+	u16 pm;
+
+	writel(CIO2_D0I3C_I3, base + CIO2_REG_D0I3C);
+	dev_dbg(dev, "cio2 runtime suspend.\n");
+
+	pci_read_config_word(pci_dev, pci_dev->pm_cap + CIO2_PMCSR_OFFSET,
+				&pm);
+	pm = (pm >> CIO2_PMCSR_D0D3_SHIFT) << CIO2_PMCSR_D0D3_SHIFT;
+	pm |= CIO2_PMCSR_D3;
+	pci_write_config_word(pci_dev, pci_dev->pm_cap + CIO2_PMCSR_OFFSET,
+				pm);
+
+	return 0;
+}
+
+static int cio2_runtime_resume(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct cio2_device *cio2 = pci_get_drvdata(pci_dev);
+	void __iomem *const base = cio2->base;
+	u16 pm;
+
+	writel(CIO2_D0I3C_RR, base + CIO2_REG_D0I3C);
+	dev_dbg(dev, "cio2 runtime resume.\n");
+
+	pci_read_config_word(pci_dev, pci_dev->pm_cap + CIO2_PMCSR_OFFSET,
+				&pm);
+	pm = (pm >> CIO2_PMCSR_D0D3_SHIFT) << CIO2_PMCSR_D0D3_SHIFT;
+	pci_write_config_word(pci_dev, pci_dev->pm_cap + CIO2_PMCSR_OFFSET,
+				pm);
+
+	return 0;
+}
+
+static const struct dev_pm_ops cio2_pm_ops = {
+	SET_RUNTIME_PM_OPS(&cio2_runtime_suspend,
+		&cio2_runtime_resume, NULL)
+};
+
+
+static const struct pci_device_id cio2_pci_id_table[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, CIO2_PCI_ID) },
+	{ 0 }
+};
+
+MODULE_DEVICE_TABLE(pci, cio2_pci_id_table);
+
+static struct pci_driver cio2_pci_driver = {
+	.name = CIO2_NAME,
+	.id_table = cio2_pci_id_table,
+	.probe = cio2_pci_probe,
+	.remove = cio2_pci_remove,
+	.driver = {
+		.pm = &cio2_pm_ops,
+	},
+};
+
+module_pci_driver(cio2_pci_driver);
+
+MODULE_AUTHOR("Tuukka Toivonen <tuukka.toivonen@intel.com>");
+MODULE_AUTHOR("Tianshu Qiu <tian.shu.qiu@intel.com>");
+MODULE_AUTHOR("Jian Xu Zheng <jian.xu.zheng@intel.com>");
+MODULE_AUTHOR("Yuning Pu <yuning.pu@intel.com>");
+MODULE_AUTHOR("Yong Zhi <yong.zhi@intel.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("IPU3 CIO2 driver");
diff --git a/drivers/media/pci/intel/ipu3/ipu3-cio2.h b/drivers/media/pci/intel/ipu3/ipu3-cio2.h
new file mode 100644
index 0000000..b61d2ff
--- /dev/null
+++ b/drivers/media/pci/intel/ipu3/ipu3-cio2.h
@@ -0,0 +1,424 @@
+/*
+ * Copyright (c) 2017 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __IPU3_CIO2_H
+#define __IPU3_CIO2_H
+
+#define CIO2_NAME			"ipu3-cio2"
+#define CIO2_DEVICE_NAME		"Intel IPU3 CIO2"
+#define CIO2_ENTITY_NAME		"ipu3-csi2"
+#define CIO2_PCI_ID			0x9d32
+#define CIO2_PCI_BAR			0
+#define CIO2_DMA_MASK			DMA_BIT_MASK(39)
+#define CIO2_QUEUES			2 /* 1 for each sensor */
+
+#define CIO2_MAX_LOPS			8 /* 32MB = 8xFBPT_entry */
+#define CIO2_MAX_BUFFERS		(PAGE_SIZE / 16 / CIO2_MAX_LOPS)
+
+#define CIO2_PAD_SINK			0 /* sinking data */
+#define CIO2_PAD_SOURCE			1 /* sourcing data */
+#define CIO2_PADS			2
+
+#define CIO2_NUM_DMA_CHAN		20
+#define CIO2_NUM_PORTS			4 /* DPHYs */
+
+/* Register and bit field definitions */
+#define CIO2_REG_PIPE_BASE(n)		((n) * 0x0400)	/* n = 0..3 */
+#define CIO2_REG_CSIRX_BASE		0x000
+#define CIO2_REG_MIPIBE_BASE		0x100
+#define CIO2_REG_PIXELGEN_BAS		0x200
+#define CIO2_REG_IRQCTRL_BASE		0x300
+#define CIO2_REG_GPREG_BASE		0x1000
+
+/* base register: CIO2_REG_PIPE_BASE(pipe) + CIO2_REG_CSIRX_BASE */
+#define CIO2_REG_CSIRX_ENABLE			(CIO2_REG_CSIRX_BASE + 0x0)
+#define CIO2_REG_CSIRX_NOF_ENABLED_LANES	(CIO2_REG_CSIRX_BASE + 0x4)
+#define CIO2_REG_CSIRX_SP_IF_CONFIG		(CIO2_REG_CSIRX_BASE + 0x10)
+#define CIO2_REG_CSIRX_LP_IF_CONFIG		(CIO2_REG_CSIRX_BASE + 0x14)
+#define CIO2_CSIRX_IF_CONFIG_FILTEROUT			0x00
+#define CIO2_CSIRX_IF_CONFIG_FILTEROUT_VC_INACTIVE	0x01
+#define CIO2_CSIRX_IF_CONFIG_PASS			0x02
+#define CIO2_CSIRX_IF_CONFIG_FLAG_ERROR			BIT(2)
+#define CIO2_REG_CSIRX_STATUS			(CIO2_REG_CSIRX_BASE + 0x18)
+#define CIO2_REG_CSIRX_STATUS_DLANE_HS		(CIO2_REG_CSIRX_BASE + 0x1c)
+#define CIO2_CSIRX_STATUS_DLANE_HS_MASK			0xff
+#define CIO2_REG_CSIRX_STATUS_DLANE_LP		(CIO2_REG_CSIRX_BASE + 0x20)
+#define CIO2_CSIRX_STATUS_DLANE_LP_MASK			0xffffff
+/* Termination enable and settle in 0.0625ns units, lane=0..3 or -1 for clock */
+#define CIO2_REG_CSIRX_DLY_CNT_TERMEN(lane) \
+				(CIO2_REG_CSIRX_BASE + 0x2c + 8*(lane))
+#define CIO2_REG_CSIRX_DLY_CNT_SETTLE(lane) \
+				(CIO2_REG_CSIRX_BASE + 0x30 + 8*(lane))
+/* base register: CIO2_REG_PIPE_BASE(pipe) + CIO2_REG_MIPIBE_BASE */
+#define CIO2_REG_MIPIBE_ENABLE		(CIO2_REG_MIPIBE_BASE + 0x0)
+#define CIO2_REG_MIPIBE_STATUS		(CIO2_REG_MIPIBE_BASE + 0x4)
+#define CIO2_REG_MIPIBE_COMP_FORMAT(vc) \
+				(CIO2_REG_MIPIBE_BASE + 0x8 + 0x4*(vc))
+#define CIO2_REG_MIPIBE_FORCE_RAW8	(CIO2_REG_MIPIBE_BASE + 0x20)
+#define CIO2_REG_MIPIBE_FORCE_RAW8_ENABLE		BIT(0)
+#define CIO2_REG_MIPIBE_FORCE_RAW8_USE_TYPEID		BIT(1)
+#define CIO2_REG_MIPIBE_FORCE_RAW8_TYPEID_SHIFT		2
+
+#define CIO2_REG_MIPIBE_IRQ_STATUS	(CIO2_REG_MIPIBE_BASE + 0x24)
+#define CIO2_REG_MIPIBE_IRQ_CLEAR	(CIO2_REG_MIPIBE_BASE + 0x28)
+#define CIO2_REG_MIPIBE_GLOBAL_LUT_DISREGARD (CIO2_REG_MIPIBE_BASE + 0x68)
+#define CIO2_MIPIBE_GLOBAL_LUT_DISREGARD		1
+#define CIO2_REG_MIPIBE_PKT_STALL_STATUS (CIO2_REG_MIPIBE_BASE + 0x6c)
+#define CIO2_REG_MIPIBE_PARSE_GSP_THROUGH_LP_LUT_REG_IDX \
+					(CIO2_REG_MIPIBE_BASE + 0x70)
+#define CIO2_REG_MIPIBE_SP_LUT_ENTRY(vc) \
+				       (CIO2_REG_MIPIBE_BASE + 0x74 + 4*(vc))
+#define CIO2_REG_MIPIBE_LP_LUT_ENTRY(m)	/* m = 0..15 */ \
+					(CIO2_REG_MIPIBE_BASE + 0x84 + 4*(m))
+#define CIO2_MIPIBE_LP_LUT_ENTRY_DISREGARD		1
+#define CIO2_MIPIBE_LP_LUT_ENTRY_SID_SHIFT		1
+#define CIO2_MIPIBE_LP_LUT_ENTRY_VC_SHIFT		5
+#define CIO2_MIPIBE_LP_LUT_ENTRY_FORMAT_TYPE_SHIFT	7
+
+/* base register: CIO2_REG_PIPE_BASE(pipe) + CIO2_REG_IRQCTRL_BASE */
+/* IRQ registers are 18-bit wide, see cio2_irq_error for bit definitions */
+#define CIO2_REG_IRQCTRL_EDGE		(CIO2_REG_IRQCTRL_BASE + 0x00)
+#define CIO2_REG_IRQCTRL_MASK		(CIO2_REG_IRQCTRL_BASE + 0x04)
+#define CIO2_REG_IRQCTRL_STATUS		(CIO2_REG_IRQCTRL_BASE + 0x08)
+#define CIO2_REG_IRQCTRL_CLEAR		(CIO2_REG_IRQCTRL_BASE + 0x0c)
+#define CIO2_REG_IRQCTRL_ENABLE		(CIO2_REG_IRQCTRL_BASE + 0x10)
+#define CIO2_REG_IRQCTRL_LEVEL_NOT_PULSE	(CIO2_REG_IRQCTRL_BASE + 0x14)
+
+#define CIO2_REG_GPREG_SRST		(CIO2_REG_GPREG_BASE + 0x0)
+#define CIO2_GPREG_SRST_ALL				0xffff	/* Reset all */
+#define CIO2_REG_FB_HPLL_FREQ		(CIO2_REG_GPREG_BASE + 0x08)
+#define CIO2_REG_ISCLK_RATIO		(CIO2_REG_GPREG_BASE + 0xc)
+
+#define CIO2_REG_CGC			0x1400
+#define CIO2_CGC_CSI2_TGE				BIT(0)
+#define CIO2_CGC_PRIM_TGE				BIT(1)
+#define CIO2_CGC_SIDE_TGE				BIT(2)
+#define CIO2_CGC_XOSC_TGE				BIT(3)
+#define CIO2_CGC_MPLL_SHUTDOWN_EN			BIT(4)
+#define CIO2_CGC_D3I3_TGE				BIT(5)
+#define CIO2_CGC_CSI2_INTERFRAME_TGE			BIT(6)
+#define CIO2_CGC_CSI2_PORT_DCGE				BIT(8)
+#define CIO2_CGC_CSI2_DCGE				BIT(9)
+#define CIO2_CGC_SIDE_DCGE				BIT(10)
+#define CIO2_CGC_PRIM_DCGE				BIT(11)
+#define CIO2_CGC_ROSC_DCGE				BIT(12)
+#define CIO2_CGC_XOSC_DCGE				BIT(13)
+#define CIO2_CGC_FLIS_DCGE				BIT(14)
+#define CIO2_CGC_CLKGATE_HOLDOFF_SHIFT			20
+#define CIO2_CGC_CSI_CLKGATE_HOLDOFF_SHIFT		24
+#define CIO2_REG_D0I3C			0x1408
+#define CIO2_D0I3C_I3					BIT(2)	/* Set D0I3 */
+#define CIO2_D0I3C_RR					BIT(3)	/* Restore? */
+#define CIO2_REG_SWRESET		0x140c
+#define CIO2_SWRESET_SWRESET				1
+#define CIO2_REG_SENSOR_ACTIVE		0x1410
+#define CIO2_REG_INT_STS		0x1414
+#define CIO2_REG_INT_STS_EXT_OE		0x1418
+#define CIO2_INT_EXT_OE_DMAOE_SHIFT			0
+#define CIO2_INT_EXT_OE_DMAOE_MASK			0x7ffff
+#define CIO2_INT_EXT_OE_OES_SHIFT			24
+#define CIO2_INT_EXT_OE_OES_MASK	(0xf << CIO2_INT_EXT_OE_OES_SHIFT)
+#define CIO2_REG_INT_EN			0x1420
+#define CIO2_INT_IOC(dma)	(1 << ((dma) < 4 ? (dma) : ((dma) >> 1) + 2))
+#define CIO2_INT_IOC_SHIFT				0
+#define CIO2_INT_IOC_MASK		(0x7ff << CIO2_INT_IOC_SHIFT)
+#define CIO2_INT_IOS_IOLN(dma)			(1 << (((dma) >> 1) + 12))
+#define CIO2_INT_IOS_IOLN_SHIFT				12
+#define CIO2_INT_IOS_IOLN_MASK		(0x3ff << CIO2_INT_IOS_IOLN_SHIFT)
+#define CIO2_INT_IOIE					BIT(22)
+#define CIO2_INT_IOOE					BIT(23)
+#define CIO2_INT_IOIRQ					BIT(24)
+#define CIO2_REG_INT_EN_EXT_OE		0x1424
+#define CIO2_REG_DMA_DBG		0x1448
+#define CIO2_REG_DMA_DBG_DMA_INDEX_SHIFT 0
+#define CIO2_REG_PBM_ARB_CTRL		0x1460
+#define CIO2_PBM_ARB_CTRL_LANES_DIV_SHIFT		0
+#define CIO2_PBM_ARB_CTRL_LE_EN				BIT(7)
+#define CIO2_PBM_ARB_CTRL_PLL_POST_SHTDN_SHIFT		8
+#define CIO2_PBM_ARB_CTRL_PLL_AHD_WK_UP_SHIFT		16
+#define CIO2_REG_PBM_WMCTRL1		0x1464
+#define CIO2_PBM_WMCTRL1_MIN_2CK_SHIFT			0
+#define CIO2_PBM_WMCTRL1_MID1_2CK_SHIFT			8
+#define CIO2_PBM_WMCTRL1_MID2_2CK_SHIFT			16
+#define CIO2_PBM_WMCTRL1_TS_COUNT_DISABLE		BIT(31)
+#define CIO2_REG_PBM_WMCTRL2		0x1468
+#define CIO2_PBM_WMCTRL2_HWM_2CK_SHIFT			0
+#define CIO2_PBM_WMCTRL2_LWM_2CK_SHIFT			8
+#define CIO2_PBM_WMCTRL2_OBFFWM_2CK_SHIFT		16
+#define CIO2_PBM_WMCTRL2_TRANSDYN_SHIFT			24
+#define CIO2_PBM_WMCTRL2_DYNWMEN			BIT(28)
+#define CIO2_PBM_WMCTRL2_OBFF_MEM_EN			BIT(29)
+#define CIO2_PBM_WMCTRL2_OBFF_CPU_EN			BIT(30)
+#define CIO2_PBM_WMCTRL2_DRAINNOW			BIT(31)
+#define CIO2_REG_PBM_TS_COUNT		0x146c
+#define CIO2_REG_PBM_FOPN_ABORT	0x1474	/* below n = 0..3 */
+#define CIO2_PBM_FOPN_ABORT(n)				(0x1 << 8*(n))
+#define CIO2_PBM_FOPN_FORCE_ABORT(n)			(0x2 << 8*(n))
+#define CIO2_PBM_FOPN_FRAMEOPEN(n)			(0x8 << 8*(n))
+#define CIO2_REG_LTRCTRL		0x1480
+#define CIO2_LTRCTRL_LTRDYNEN				BIT(16)
+#define CIO2_LTRCTRL_LTRSTABLETIME_SHIFT		8
+#define CIO2_LTRCTRL_LTRSTABLETIME_MASK			0xff
+#define CIO2_LTRCTRL_LTRSEL1S3				BIT(7)
+#define CIO2_LTRCTRL_LTRSEL1S2				BIT(6)
+#define CIO2_LTRCTRL_LTRSEL1S1				BIT(5)
+#define CIO2_LTRCTRL_LTRSEL1S0				BIT(4)
+#define CIO2_LTRCTRL_LTRSEL2S3				BIT(3)
+#define CIO2_LTRCTRL_LTRSEL2S2				BIT(2)
+#define CIO2_LTRCTRL_LTRSEL2S1				BIT(1)
+#define CIO2_LTRCTRL_LTRSEL2S0				BIT(0)
+#define CIO2_REG_LTRVAL23		0x1484
+#define CIO2_REG_LTRVAL01		0x1488
+#define CIO2_LTRVAL02_VAL_SHIFT				0
+#define CIO2_LTRVAL02_SCALE_SHIFT			10
+#define CIO2_LTRVAL13_VAL_SHIFT				16
+#define CIO2_LTRVAL13_SCALE_SHIFT			26
+
+#define CIO2_REG_CDMABA(n)		(0x1500 + 0x10*(n))	/* n = 0..19 */
+#define CIO2_REG_CDMARI(n)		(0x1504 + 0x10*(n))
+#define CIO2_CDMARI_FBPT_RP_SHIFT			0
+#define CIO2_CDMARI_FBPT_RP_MASK			0xff
+#define CIO2_REG_CDMAC0(n)		(0x1508 + 0x10*(n))
+#define CIO2_CDMAC0_FBPT_LEN_SHIFT			0
+#define CIO2_CDMAC0_FBPT_WIDTH_SHIFT			8
+#define CIO2_CDMAC0_FBPT_NS				BIT(25)
+#define CIO2_CDMAC0_DMA_INTR_ON_FS			BIT(26)
+#define CIO2_CDMAC0_DMA_INTR_ON_FE			BIT(27)
+#define CIO2_CDMAC0_FBPT_UPDATE_FIFO_FULL		BIT(28)
+#define CIO2_CDMAC0_FBPT_FIFO_FULL_FIX_DIS		BIT(29)
+#define CIO2_CDMAC0_DMA_EN				BIT(30)
+#define CIO2_CDMAC0_DMA_HALTED				BIT(31)
+#define CIO2_REG_CDMAC1(n)		(0x150c + 0x10*(n))
+#define CIO2_CDMAC1_LINENUMINT_SHIFT			0
+#define CIO2_CDMAC1_LINENUMUPDATE_SHIFT			16
+
+#define CIO2_REG_PXM_PXF_FMT_CFG0(n)	(0x1700 + 0x30*(n))	/* n = 0..3 */
+#define CIO2_PXM_PXF_FMT_CFG_SID0_SHIFT			0
+#define CIO2_PXM_PXF_FMT_CFG_SID1_SHIFT			16
+#define CIO2_PXM_PXF_FMT_CFG_PCK_64B			(0 << 0)
+#define CIO2_PXM_PXF_FMT_CFG_PCK_32B			(1 << 0)
+#define CIO2_PXM_PXF_FMT_CFG_BPP_08			(0 << 2)
+#define CIO2_PXM_PXF_FMT_CFG_BPP_10			(1 << 2)
+#define CIO2_PXM_PXF_FMT_CFG_BPP_12			(2 << 2)
+#define CIO2_PXM_PXF_FMT_CFG_BPP_14			(3 << 2)
+#define CIO2_PXM_PXF_FMT_CFG_SPEC_4PPC			(0 << 4)
+#define CIO2_PXM_PXF_FMT_CFG_SPEC_3PPC_RGBA		(1 << 4)
+#define CIO2_PXM_PXF_FMT_CFG_SPEC_3PPC_ARGB		(2 << 4)
+#define CIO2_PXM_PXF_FMT_CFG_SPEC_PLANAR2		(3 << 4)
+#define CIO2_PXM_PXF_FMT_CFG_SPEC_PLANAR3		(4 << 4)
+#define CIO2_PXM_PXF_FMT_CFG_SPEC_NV16			(5 << 4)
+#define CIO2_PXM_PXF_FMT_CFG_PSWAP4_1ST_AB		(1 << 7)
+#define CIO2_PXM_PXF_FMT_CFG_PSWAP4_1ST_CD		(1 << 8)
+#define CIO2_PXM_PXF_FMT_CFG_PSWAP4_2ND_AC		(1 << 9)
+#define CIO2_PXM_PXF_FMT_CFG_PSWAP4_2ND_BD		(1 << 10)
+#define CIO2_REG_INT_STS_EXT_IE		0x17e4	/* See CIO2_INT_EXT_IE_* */
+#define CIO2_REG_INT_EN_EXT_IE		0x17e8
+#define CIO2_INT_EXT_IE_ECC_RE(n)			(0x01 << (8 * (n)))
+#define CIO2_INT_EXT_IE_DPHY_NR(n)			(0x02 << (8 * (n)))
+#define CIO2_INT_EXT_IE_ECC_NR(n)			(0x04 << (8 * (n)))
+#define CIO2_INT_EXT_IE_CRCERR(n)			(0x08 << (8 * (n)))
+#define CIO2_INT_EXT_IE_INTERFRAMEDATA(n)		(0x10 << (8 * (n)))
+#define CIO2_INT_EXT_IE_PKT2SHORT(n)			(0x20 << (8 * (n)))
+#define CIO2_INT_EXT_IE_PKT2LONG(n)			(0x40 << (8 * (n)))
+#define CIO2_INT_EXT_IE_IRQ(n)				(0x80 << (8 * (n)))
+#define CIO2_REG_PXM_FRF_CFG(n)		(0x1720 + 0x30*(n))
+#define CIO2_PXM_FRF_CFG_FNSEL				BIT(0)
+#define CIO2_PXM_FRF_CFG_FN_RST				BIT(1)
+#define CIO2_PXM_FRF_CFG_ABORT				BIT(2)
+#define CIO2_PXM_FRF_CFG_CRC_TH_SHIFT			3
+#define CIO2_PXM_FRF_CFG_MSK_ECC_DPHY_NR		BIT(8)
+#define CIO2_PXM_FRF_CFG_MSK_ECC_RE			BIT(9)
+#define CIO2_PXM_FRF_CFG_MSK_ECC_DPHY_NE		BIT(10)
+#define CIO2_PXM_FRF_CFG_EVEN_ODD_MODE_SHIFT		11
+#define CIO2_PXM_FRF_CFG_MASK_CRC_THRES			BIT(13)
+#define CIO2_PXM_FRF_CFG_MASK_CSI_ACCEPT		BIT(14)
+#define CIO2_PXM_FRF_CFG_CIOHC_FS_MODE			BIT(15)
+#define CIO2_PXM_FRF_CFG_CIOHC_FRST_FRM_SHIFT		16
+#define CIO2_REG_PXM_SID2BID0(n)	(0x1724 + 0x30*(n))
+#define CIO2_FB_HPLL_FREQ		0x2
+#define CIO2_ISCLK_RATIO		0xc
+
+#define CIO2_PBM_WMCTRL1_MIN_2CK	(4 << CIO2_PBM_WMCTRL1_MIN_2CK_SHIFT)
+#define CIO2_PBM_WMCTRL1_MID1_2CK	(16 << CIO2_PBM_WMCTRL1_MID1_2CK_SHIFT)
+#define CIO2_PBM_WMCTRL1_MID2_2CK	(21 << CIO2_PBM_WMCTRL1_MID2_2CK_SHIFT)
+
+#define CIO2_PBM_WMCTRL2_HWM_2CK	53
+#define CIO2_PBM_WMCTRL2_LWM_2CK	22
+#define CIO2_PBM_WMCTRL2_OBFFWM_2CK	2
+#define CIO2_PBM_WMCTRL2_TRANSDYN	1
+
+#define CIO2_PBM_ARB_CTRL_LANES_DIV	0	/* 4-4-2-2 lanes */
+#define CIO2_PBM_ARB_CTRL_PLL_POST_SHTDN 2
+#define CIO2_PBM_ARB_CTRL_PLL_AHD_WK_UP 480
+
+#define CIO2_IRQCTRL_MASK		0x3ffff
+
+#define CIO2_INT_EN_EXT_OE_MASK		0x8f0fffff
+
+#define CIO2_CGC_CLKGATE_HOLDOFF	3
+#define CIO2_CGC_CSI_CLKGATE_HOLDOFF	5
+
+#define CIO2_LTRVAL0_VAL		500
+#define CIO2_LTRVAL0_SCALE		2	/* Value times 1024 ns */
+#define CIO2_LTRVAL1_VAL		90
+#define CIO2_LTRVAL1_SCALE		2
+#define CIO2_LTRVAL2_VAL		90
+#define CIO2_LTRVAL2_SCALE		2
+#define CIO2_LTRVAL3_VAL		90
+#define CIO2_LTRVAL3_SCALE		2
+
+#define CIO2_PXM_FRF_CFG_CRC_TH		16
+
+#define CIO2_INT_EN_EXT_IE_MASK		0xffffffff
+
+#define CIO2_DMA_CHAN			0
+
+#define CIO2_CSIRX_DLY_CNT_CLANE_IDX	-1
+
+#define CIO2_CSIRX_DLY_CNT_TERMEN_CLANE_A	0
+#define CIO2_CSIRX_DLY_CNT_TERMEN_CLANE_B	0
+#define CIO2_CSIRX_DLY_CNT_SETTLE_CLANE_A	95
+#define CIO2_CSIRX_DLY_CNT_SETTLE_CLANE_B	-8
+
+#define CIO2_CSIRX_DLY_CNT_TERMEN_DLANE_A	0
+#define CIO2_CSIRX_DLY_CNT_TERMEN_DLANE_B	0
+#define CIO2_CSIRX_DLY_CNT_SETTLE_DLANE_A	85
+#define CIO2_CSIRX_DLY_CNT_SETTLE_DLANE_B	-2
+
+#define CIO2_PMCSR_OFFSET		4
+#define CIO2_PMCSR_D0D3_SHIFT		2
+#define CIO2_PMCSR_D3			0x3
+
+struct cio2_csi2_timing {
+	s32 clk_termen;
+	s32 clk_settle;
+	s32 dat_termen;
+	s32 dat_settle;
+};
+
+struct csi2_bus_info {
+	u32 port;
+	u32 num_of_lanes;
+};
+
+struct cio2_queue {
+	/* mutex to be used by vb2_queue */
+	struct mutex lock;
+	struct media_pipeline pipe;
+	struct csi2_bus_info csi2;
+	struct v4l2_subdev *sensor;
+	void __iomem *csi_rx_base;
+
+	/* Subdev, /dev/v4l-subdevX */
+	struct v4l2_subdev subdev;
+	struct media_pad subdev_pads[CIO2_PADS];
+	struct v4l2_mbus_framefmt subdev_fmt;
+	atomic_t frame_sequence;
+
+	/* Video device, /dev/videoX */
+	struct video_device vdev;
+	struct media_pad vdev_pad;
+	u32 pixelformat;
+	struct vb2_queue vbq;
+
+	/* Buffer queue handling */
+	struct cio2_fbpt_entry *fbpt;	/* Frame buffer pointer table */
+	dma_addr_t fbpt_bus_addr;
+	struct cio2_buffer *bufs[CIO2_MAX_BUFFERS];
+	unsigned int bufs_first;	/* Index of the first used entry */
+	unsigned int bufs_next;	/* Index of the first unused entry */
+	atomic_t bufs_queued;
+};
+
+struct sensor_async_subdev {
+	struct v4l2_async_subdev asd;
+	struct v4l2_fwnode_endpoint vfwn_endpt;
+};
+
+struct cio2_device {
+	struct pci_dev *pci_dev;
+	void __iomem *base;
+	struct v4l2_device v4l2_dev;
+	struct cio2_queue queue[CIO2_QUEUES];
+	struct cio2_queue *cur_queue;
+	/* mutex to be used by video_device */
+	struct mutex lock;
+
+	struct v4l2_async_notifier notifier;
+	struct media_device media_dev;
+
+	/*
+	 * Safety net to catch DMA fetch ahead
+	 * when reaching the end of LOP
+	 */
+	void *dummy_page;
+	/* DMA handle of dummy_page */
+	dma_addr_t dummy_page_bus_addr;
+	/* single List of Pointers (LOP) page */
+	u32 *dummy_lop;
+	/* DMA handle of dummy_lop */
+	dma_addr_t dummy_lop_bus_addr;
+};
+
+struct cio2_buffer {
+	struct vb2_v4l2_buffer vbb;
+	u32 *lop;
+	dma_addr_t lop_bus_addr;
+};
+
+/**************** FBPT operations ****************/
+#define CIO2_FBPT_SIZE			(CIO2_MAX_BUFFERS * CIO2_MAX_LOPS * \
+					 sizeof(struct cio2_fbpt_entry))
+
+#define CIO2_FBPT_SUBENTRY_UNIT		4
+
+/*
+ * Frame Buffer Pointer Table (FBPT) entry.
+ * Each entry describes an output buffer and consists of
+ * several sub-entries.
+ */
+struct __packed cio2_fbpt_entry {
+	union {
+		struct __packed {
+			u32 ctrl; /* status ctrl */
+#define CIO2_FBPT_CTRL_VALID		BIT(0)
+#define CIO2_FBPT_CTRL_IOC		BIT(1)
+#define CIO2_FBPT_CTRL_IOS		BIT(2)
+#define CIO2_FBPT_CTRL_SUCCXFAIL	BIT(3)
+#define CIO2_FBPT_CTRL_CMPLCODE_SHIFT	4
+			u16 cur_line_num; /* current line # written to DDR */
+			u16 frame_num; /* updated by DMA upon FE */
+			u32 first_page_offset; /* offset for 1st page in LOP */
+		} first_entry;
+		/* Second entry per buffer */
+		struct __packed {
+			u32 timestamp;
+			u32 num_of_bytes;
+			/* the number of bytes for write on last page */
+			u16 last_page_available_bytes;
+			/* the number of pages allocated for this buf */
+			u16 num_of_pages;
+		} second_entry;
+		struct __packed {
+			u64 __reserved1;
+			u32 __reserved0;
+		} other_entries;
+	};
+	u32 lop_page_addr;	/* Points to list of pointers (LOP) table */
+};
+
+static inline struct cio2_queue *file_to_cio2_queue(struct file *file) {
+	return container_of(video_devdata(file), struct cio2_queue, vdev);
+}
+
+#endif
-- 
2.7.4

* Re: [PATCH v2 2/3] [media] doc-rst: add IPU3 raw10 bayer pixel format definitions
  2017-06-07  1:34 ` [PATCH v2 2/3] [media] doc-rst: add IPU3 raw10 bayer pixel format definitions Yong Zhi
@ 2017-06-07 17:55   ` Alan Cox
  0 siblings, 0 replies; 17+ messages in thread
From: Alan Cox @ 2017-06-07 17:55 UTC (permalink / raw)
  To: Yong Zhi
  Cc: linux-media, sakari.ailus, jian.xu.zheng, tfiga, rajmohan.mani,
	tuukka.toivonen, hverkuil, hyungwoo.yang

> +
> +10-bit Bayer formats
> +
> +Description
> +===========
> +
> +These four pixel formats are used by Intel IPU3 driver,

Are the same formats present in IPUv2, and will they ever be present in
other hardware?

If so (and I think it is so...) then it's not a good idea to encode ipu3
in the name. Something like V4L2_PIX_FMT_SBGGR10_PACKED might be better?

Alan

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-07  1:34 ` [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver Yong Zhi
@ 2017-06-12  9:59   ` Tomasz Figa
  2017-06-13  8:58     ` Tuukka Toivonen
                       ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Tomasz Figa @ 2017-06-12  9:59 UTC (permalink / raw)
  To: Yong Zhi
  Cc: linux-media, Sakari Ailus, Zheng, Jian Xu, Mani, Rajmohan,
	Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

Hi Yong,

Please see my comments inline.

On Wed, Jun 7, 2017 at 10:34 AM, Yong Zhi <yong.zhi@intel.com> wrote:
> This patch adds CIO2 CSI-2 device driver for
> Intel's IPU3 camera sub-system support.
>
> Signed-off-by: Yong Zhi <yong.zhi@intel.com>
> ---
>  drivers/media/pci/Kconfig                |    2 +
>  drivers/media/pci/Makefile               |    3 +-
>  drivers/media/pci/intel/Makefile         |    5 +
>  drivers/media/pci/intel/ipu3/Kconfig     |   17 +
>  drivers/media/pci/intel/ipu3/Makefile    |    1 +
>  drivers/media/pci/intel/ipu3/ipu3-cio2.c | 1788 ++++++++++++++++++++++++++++++
>  drivers/media/pci/intel/ipu3/ipu3-cio2.h |  424 +++++++
>  7 files changed, 2239 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/media/pci/intel/Makefile
>  create mode 100644 drivers/media/pci/intel/ipu3/Kconfig
>  create mode 100644 drivers/media/pci/intel/ipu3/Makefile
>  create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.c
>  create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.h
[snip]
> diff --git a/drivers/media/pci/intel/ipu3/Kconfig b/drivers/media/pci/intel/ipu3/Kconfig
> new file mode 100644
> index 0000000..2a895d6
> --- /dev/null
> +++ b/drivers/media/pci/intel/ipu3/Kconfig
> @@ -0,0 +1,17 @@
> +config VIDEO_IPU3_CIO2
> +       tristate "Intel ipu3-cio2 driver"
> +       depends on VIDEO_V4L2 && PCI
> +       depends on MEDIA_CONTROLLER
> +       depends on HAS_DMA
> +       depends on ACPI

I wonder if it wouldn't make sense to make this depend on X86 (||
COMPILE_TEST) as well. Are we expecting a standalone PCI(e) card with
this device in the future?
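
To make the suggestion concrete, a sketch of what the extra dependency
could look like. This is untested, and only the "depends on X86 ||
COMPILE_TEST" line is new relative to the quoted Kconfig entry:

```kconfig
config VIDEO_IPU3_CIO2
	tristate "Intel ipu3-cio2 driver"
	depends on VIDEO_V4L2 && PCI
	depends on MEDIA_CONTROLLER
	depends on HAS_DMA
	depends on ACPI
	depends on X86 || COMPILE_TEST
	select V4L2_FWNODE
	select VIDEOBUF2_DMA_SG
```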

> +       select V4L2_FWNODE
> +       select VIDEOBUF2_DMA_SG
> +
> +       ---help---
> +       This is the Intel IPU3 CIO2 CSI-2 receiver unit, found in Intel
> +       Skylake and Kaby Lake SoCs and used for capturing images and
> +       video from a camera sensor.
> +
> +       Say Y or M here if you have a Skylake/Kaby Lake SoC with MIPI CSI-2
> +       connected camera.
> +       The module will be called ipu3-cio2.
> diff --git a/drivers/media/pci/intel/ipu3/Makefile b/drivers/media/pci/intel/ipu3/Makefile
> new file mode 100644
> index 0000000..20186e3
> --- /dev/null
> +++ b/drivers/media/pci/intel/ipu3/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_VIDEO_IPU3_CIO2) += ipu3-cio2.o
> diff --git a/drivers/media/pci/intel/ipu3/ipu3-cio2.c b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
> new file mode 100644
> index 0000000..69c47fc
> --- /dev/null
> +++ b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
[snip]
> +static int cio2_fbpt_init_dummy(struct cio2_device *cio2)
> +{
> +       unsigned int i;
> +
> +       cio2->dummy_page = dma_alloc_noncoherent(&cio2->pci_dev->dev, PAGE_SIZE,
> +                                       &cio2->dummy_page_bus_addr, GFP_KERNEL);
> +       cio2->dummy_lop = dma_alloc_noncoherent(&cio2->pci_dev->dev, PAGE_SIZE,
> +                                       &cio2->dummy_lop_bus_addr, GFP_KERNEL);

Something is not right here. Why is noncoherent memory allocated here, but
freed as coherent memory in the free function above? Wasn't the
intention to just always use coherent memory throughout the driver?

> +       if (!cio2->dummy_page || !cio2->dummy_lop) {
> +               cio2_fbpt_exit_dummy(cio2);
> +               return -ENOMEM;
> +       }
> +       /*
> +        * List of Pointers(LOP) contains 1024x32b pointers to 4KB page each
> +        * Initialize each entry to dummy_page bus base address.
> +        */
> +       for (i = 0; i < PAGE_SIZE / sizeof(*cio2->dummy_lop); i++)
> +               cio2->dummy_lop[i] = cio2->dummy_page_bus_addr >> PAGE_SHIFT;
> +
> +       return 0;
> +}
[snip]
> +/* Initialize fpbt entries to point to a given buffer */
> +static void cio2_fbpt_entry_init_buf(struct cio2_device *cio2,
> +                                    struct cio2_buffer *b,
> +                                    struct cio2_fbpt_entry
> +                                    entry[CIO2_MAX_LOPS])
> +{
> +       struct vb2_buffer *vb = &b->vbb.vb2_buf;
> +       unsigned int length = vb->planes[0].length;
> +       dma_addr_t lop_bus_addr = b->lop_bus_addr;
> +       int remaining;
> +
> +       entry[0].first_entry.first_page_offset =
> +               offset_in_page(vb2_plane_vaddr(vb, 0));

nit: Even though it's technically the same value, it's kind of
logically confusing that a function for virtual addresses is used for
DMA calculations. Similarly for offset_in_page, since it refers to CPU
pages.

> +       remaining = length + entry[0].first_entry.first_page_offset;
> +       entry[1].second_entry.num_of_pages = DIV_ROUND_UP(remaining, PAGE_SIZE);
> +       /*
> +        * last_page_available_bytes has the offset of the last byte in the
> +        * last page which is still accessible by DMA. DMA cannot access
> +        * beyond this point. Valid range for this is from 0 to 4095.
> +        * 0 indicates 1st byte in the page is DMA accessible.
> +        * 4095 (PAGE_SIZE - 1) means every single byte in the last page
> +        * is available for DMA transfer.
> +        */
> +       entry[1].second_entry.last_page_available_bytes =
> +                       (remaining & ~PAGE_MASK) ?
> +                               (remaining & ~PAGE_MASK) - 1 : PAGE_SIZE - 1;

nit: Probably PAGE_SIZE is not going to change easily, but it's
referring to CPU default page size. At least for clarity, you should
have your own defines for CIO2 page sizes.
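
As a rough illustration of what I mean, here is a userspace model of the
three first/second entry fields computed from the DMA address with
device-specific page constants. The CIO2_PAGE_* names, the struct and the
helper are all made up for this example and do not exist in the patch;
the 4 KiB page size is taken from the LOP comment in the driver.

```c
#include <stdint.h>

/* Hypothetical device page constants (the patch uses PAGE_SIZE/PAGE_SHIFT). */
#define CIO2_PAGE_SHIFT		12
#define CIO2_PAGE_SIZE		(1u << CIO2_PAGE_SHIFT)
#define CIO2_PAGE_OFFSET_MASK	(CIO2_PAGE_SIZE - 1)	/* low bits set */

/* Hypothetical container for the values written to the FBPT entries. */
struct cio2_buf_layout {
	uint32_t first_page_offset;
	uint32_t num_of_pages;
	uint32_t last_page_available_bytes;
};

/* Derive the layout from the buffer's DMA address and length in bytes. */
static struct cio2_buf_layout cio2_calc_buf_layout(uint64_t dma_addr,
						   uint32_t length)
{
	struct cio2_buf_layout l;
	uint32_t span, tail;

	l.first_page_offset = (uint32_t)(dma_addr & CIO2_PAGE_OFFSET_MASK);
	/* Bytes from the start of the first page to the end of the buffer. */
	span = length + l.first_page_offset;
	l.num_of_pages = (span + CIO2_PAGE_SIZE - 1) >> CIO2_PAGE_SHIFT;
	/* Offset of the last DMA-accessible byte within the last page. */
	tail = span & CIO2_PAGE_OFFSET_MASK;
	l.last_page_available_bytes = tail ? tail - 1 : CIO2_PAGE_SIZE - 1;

	return l;
}
```

A page-aligned 4096-byte buffer gives one page with the last byte at
offset 4095; the same buffer starting 16 bytes into a page spans two
pages and ends at offset 15 of the second one.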

> +       /* Fill FBPT */
> +       remaining = length;
> +       while (remaining > 0) {
> +               entry->lop_page_addr = lop_bus_addr >> PAGE_SHIFT;
> +               lop_bus_addr += PAGE_SIZE;
> +               remaining -= PAGE_SIZE / sizeof(u32) * PAGE_SIZE;
> +               entry++;
> +       }

By any chance, doesn't the hardware provide some simple mode for
contiguous buffers? Since we have an MMU anyway, we could use
vb2_dma_contig and simplify the code significantly.
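
For reference, this is how I read the capacity of the quoted loop, as a
small userspace model. One LOP page holds 1024 32-bit page pointers, so
each FBPT entry covers 4 MiB and CIO2_MAX_LOPS (8) entries cover the
32 MiB mentioned in the header. The helper name and CIO2_PAGE_SIZE are
assumptions for the example; like the quoted loop, it counts from the
length alone.

```c
#include <stdint.h>

#define CIO2_PAGE_SIZE		4096u	/* hypothetical; the patch uses PAGE_SIZE */
#define CIO2_MAX_LOPS		8u	/* from ipu3-cio2.h */
/* Bytes addressable through one LOP page: 1024 pointers x 4 KiB = 4 MiB. */
#define CIO2_BYTES_PER_LOP \
	((uint64_t)(CIO2_PAGE_SIZE / sizeof(uint32_t)) * CIO2_PAGE_SIZE)

/*
 * Number of FBPT entries (LOP pages) needed for a buffer of 'length'
 * bytes, or -1 if it does not fit in CIO2_MAX_LOPS entries.
 */
static int cio2_lops_needed(uint64_t length)
{
	uint64_t n = (length + CIO2_BYTES_PER_LOP - 1) / CIO2_BYTES_PER_LOP;

	return n <= CIO2_MAX_LOPS ? (int)n : -1;
}
```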

> +
> +       /*
> +        * The first not meaningful FBPT entry should point to a valid LOP
> +        */
> +       entry->lop_page_addr = cio2->dummy_lop_bus_addr >> PAGE_SHIFT;
> +
> +       cio2_fbpt_entry_enable(cio2, entry);
> +}
> +
> +static int cio2_fbpt_init(struct cio2_device *cio2, struct cio2_queue *q)
> +{
> +       struct device *dev = &cio2->pci_dev->dev;
> +
> +       q->fbpt = dma_alloc_noncoherent(dev, CIO2_FBPT_SIZE,
> +                       &q->fbpt_bus_addr, GFP_KERNEL);

_coherent?

> +       if (!q->fbpt)
> +               return -ENOMEM;
> +
> +       memset(q->fbpt, 0, CIO2_FBPT_SIZE);
> +
> +       return 0;
> +}
> +
> +static void cio2_fbpt_exit(struct cio2_queue *q, struct device *dev)
> +{
> +       dma_free_coherent(dev, CIO2_FBPT_SIZE, q->fbpt, q->fbpt_bus_addr);
> +}
> +
> +/**************** CSI2 hardware setup ****************/
> +
> +/*
> + * This should come from sensor driver. No
> + * driver interface nor requirement yet.
> + */
> +static u8 sensor_vc;   /* Virtual channel */
> +
> +/*
> + * The CSI2 receiver has several parameters affecting
> + * the receiver timings. These depend on the MIPI bus frequency
> + * F in Hz (sensor transmitter rate) as follows:
> + *     register value = (A/1e9 + B * UI) / COUNT_ACC
> + * where
> + *      UI = 1 / (2 * F) in seconds
> + *      COUNT_ACC = counter accuracy in seconds
> + *      For IPU3 COUNT_ACC = 0.0625
> + *
> + * A and B are coefficients from the table below,
> + * depending whether the register minimum or maximum value is
> + * calculated.
> + *                                     Minimum     Maximum
> + * Clock lane                          A     B     A     B
> + * reg_rx_csi_dly_cnt_termen_clane     0     0    38     0
> + * reg_rx_csi_dly_cnt_settle_clane    95    -8   300   -16
> + * Data lanes
> + * reg_rx_csi_dly_cnt_termen_dlane0    0     0    35
> + * reg_rx_csi_dly_cnt_settle_dlane0   85    -2   145    -6
> + * reg_rx_csi_dly_cnt_termen_dlane1    0     0    35     4
> + * reg_rx_csi_dly_cnt_settle_dlane1   85    -2   145    -6
> + * reg_rx_csi_dly_cnt_termen_dlane2    0     0    35     4
> + * reg_rx_csi_dly_cnt_settle_dlane2   85    -2   145    -6
> + * reg_rx_csi_dly_cnt_termen_dlane3    0     0    35     4
> + * reg_rx_csi_dly_cnt_settle_dlane3   85    -2   145    -6
> + *
> + * We use the minimum values of both A and B.

Why?

> + */
> +static int cio2_rx_timing(s32 a, s32 b, s64 freq)
> +{
> +       int r;
> +       const u32 accinv = 16;
> +       const u32 ds = 8; /* divde shift */

typo: divide

> +
> +       freq = (s32)freq >> ds;

Why do we demote freq from 64 to 32 bits here?

> +       if (WARN_ON(freq <= 0))
> +               return -EINVAL;

It generally doesn't make sense for the frequency to be negative, so
maybe the argument should have been unsigned to start with? (And
32-bit if we don't expect frequencies higher than 4 GHz anyway.)

> +
> +       /* b could be 0, -2 or -8, so r < 500000000 */

Definitely. Anything <= 0 is also less than 500000000. Let's take a
look at the computation below again:

1) accinv is multiplied by b,
2) 500000000 is divided by 256 (=== shift right by 8 bits) = 1953125,
3) accinv*b is multiplied by 1953125 to form the value of r.

Now let's look at the possible maximum absolute values for particular steps:
1) 16 * -8 = -128 (signed 8 bits),
2) 1953125 (unsigned 21 bits),
3) -128 * 1953125 = -250000000 (signed 29 bits).

So I think the important thing to note in the comment is:

/* b could be 0, -2 or -8, so |accinv * b| is always less than (1 <<
ds) and thus |r| < 500000000. */
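
The bound is easy to check with a few lines of userspace C. This only
mirrors the arithmetic of the quoted line; the macro and function names
are made up for the example.

```c
#include <stdint.h>

#define RX_ACCINV	16		/* 1 / COUNT_ACC */
#define RX_DS		8		/* divide shift used in the patch */
#define RX_HALF_NS	500000000	/* constant from the quoted line */

/*
 * Numerator of the B term exactly as the patch computes it, before the
 * division by (freq >> RX_DS).
 */
static int32_t rx_timing_numerator(int32_t b)
{
	return RX_ACCINV * b * (RX_HALF_NS >> RX_DS);
}
```

With b = -8 this gives the largest magnitude, 250000000, which fits in a
signed 32-bit integer with room to spare.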

> +       r = accinv * b * (500000000 >> ds);

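On the other hand, you can lose some precision with this pre-shifting.
If you used s64 instead and did the divide shift at the end
((accinv * b * 500000000) >> ds), nothing would be truncated before the
multiplication. For the example above both orders happen to give
-250000000, because 500000000 is an exact multiple of 256; the bits that
are actually dropped are the low 8 bits of freq. (Depending on how big
freq is, it might not matter, though.)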

Also nit: What is 500000000? We have local constants defined above, I
think it could also make sense to do the same for this one. The
compiler should do constant propagation and simplify respective
calculations anyway.
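
Putting the comments on this function together, a possible shape is
sketched below as plain userspace C. It assumes an unsigned 32-bit
frequency and does the whole computation in 64 bits, so no divide shift
is needed. From the formula in the comment block, UI = 1 / (2 * F)
seconds = 500000000 / F ns, which is where I assume the constant comes
from. All names here are illustrative and not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define RX_ACCINV	16		/* 1 / COUNT_ACC, COUNT_ACC = 0.0625 ns */
#define RX_UI_NUM	500000000LL	/* UI in ns = RX_UI_NUM / freq */

/*
 * Register value = (A + B * UI) / COUNT_ACC with A and UI in ns.
 * Returns 0 and stores the value in *out, or a negative errno.
 */
static int rx_timing(int32_t a, int32_t b, uint32_t freq, int32_t *out)
{
	int64_t r;

	if (freq == 0)
		return -EINVAL;

	/* 64-bit intermediate: |16 * b * 500000000| does not fit in 32 bits. */
	r = (int64_t)RX_ACCINV * b * RX_UI_NUM / freq;
	r += (int64_t)RX_ACCINV * a;

	if (r < INT32_MIN || r > INT32_MAX)
		return -ERANGE;

	*out = (int32_t)r;
	return 0;
}
```

For a 400 MHz link frequency this gives 1360 for the clock lane settle
count (A = 95, B = -8) and 1320 for the data lane settle count (A = 85,
B = -2).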

> +       r /= freq;
> +       /* max value of a is 95 */
> +       r += accinv * a;
> +
> +       return r;
> +};
> +
> +/* Computation for the Delay value for Termination enable of Clock lane HS Rx */
> +static int cio2_csi2_calc_timing(struct cio2_device *cio2, struct cio2_queue *q,
> +                           struct cio2_csi2_timing *timing)
> +{
> +       struct device *dev = &cio2->pci_dev->dev;
> +       struct v4l2_querymenu qm = {.id = V4L2_CID_LINK_FREQ, };
> +       struct v4l2_ctrl *link_freq;
> +       s64 freq;
> +       int r;
> +
> +       if (q->sensor)
> +               link_freq = v4l2_ctrl_find(q->sensor->ctrl_handler,
> +                                               V4L2_CID_LINK_FREQ);

Is it even possible to have this function called with !q->sensor? If
yes, maybe the function should check it and fail earlier?

> +       if (!link_freq) {
> +               dev_err(dev, "failed to find LINK_FREQ\n");
> +               return -EPIPE;
> +       };
> +
> +       qm.index = v4l2_ctrl_g_ctrl(link_freq);

We will crash here or even dereference some invalid memory if (!q->sensor).
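
What I have in mind is simply checking up front, before anything is
dereferenced. A userspace model with stand-in types follows; the fake_*
structs and the helper are invented for the example and only mimic the
pointer chain, they are not the V4L2 API. The error codes are likewise
just one possible choice.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel objects; only the fields used below. */
struct fake_ctrl {
	int cur;			/* currently selected menu index */
};

struct fake_sensor {
	struct fake_ctrl *link_freq;	/* NULL if the control is missing */
};

struct fake_queue {
	struct fake_sensor *sensor;	/* NULL until a sensor is bound */
};

/* Fetch the LINK_FREQ menu index, failing early on missing objects. */
static int get_link_freq_index(const struct fake_queue *q, int *index)
{
	const struct fake_ctrl *link_freq;

	if (!q->sensor)
		return -ENODEV;

	link_freq = q->sensor->link_freq;
	if (!link_freq)
		return -EPIPE;

	*index = link_freq->cur;
	return 0;
}
```

With this ordering link_freq is never read uninitialized, which the
quoted code can do when q->sensor is NULL.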

> +       r = v4l2_querymenu(q->sensor->ctrl_handler, &qm);
> +       if (r) {
> +               dev_err(dev, "failed to get menu item\n");
> +               return r;
> +       }
> +
> +       if (!qm.value)
> +               return -EINVAL;

I think an error message would make sense here.

> +       freq = qm.value;
> +
> +       dev_info(dev, "link freq is %lld\n", qm.value);

Do we need to print this? Perhaps dev_dbg() could be more appropriate.

> +
> +       timing->clk_termen = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_TERMEN_CLANE_A,
> +                               CIO2_CSIRX_DLY_CNT_TERMEN_CLANE_B, freq);
> +       /* test freq/div_shift > 0 */
> +       if (timing->clk_termen < 0)

Either the comment should say >= 0 or the check be <= 0.

> +               return -EINVAL;

Given that freq comes from the sensor, the calculation might involve a
subtraction (B might be negative) and returned value is not allowed to
be negative, is it really okay to always use the minimum values?

> +
> +       timing->clk_settle = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_SETTLE_CLANE_A,
> +                               CIO2_CSIRX_DLY_CNT_SETTLE_CLANE_B, freq);
> +       timing->dat_termen = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_TERMEN_DLANE_A,
> +                               CIO2_CSIRX_DLY_CNT_TERMEN_DLANE_B, freq);
> +       timing->dat_settle = cio2_rx_timing(CIO2_CSIRX_DLY_CNT_SETTLE_DLANE_A,
> +                               CIO2_CSIRX_DLY_CNT_SETTLE_DLANE_B, freq);

No need to check the above for > 0?

> +
> +       dev_dbg(dev, "freq ct value is %d\n", timing->clk_termen);
> +       dev_dbg(dev, "freq cs value is %d\n", timing->clk_settle);
> +       dev_dbg(dev, "freq dt value is %d\n", timing->dat_termen);
> +       dev_dbg(dev, "freq ds value is %d\n", timing->dat_settle);
> +
> +       return 0;
> +};
[snip]
> +static int cio2_hw_init(struct cio2_device *cio2, struct cio2_queue *q)
> +{
> +       static const int NUM_VCS = 4;
> +       static const int SID;   /* Stream id */
> +       static const int ENTRY;
> +       static const int FBPT_WIDTH = DIV_ROUND_UP(CIO2_MAX_LOPS,
> +                                       CIO2_FBPT_SUBENTRY_UNIT);
> +       const u32 num_buffers1 = CIO2_MAX_BUFFERS - 1;
> +       void __iomem *const base = cio2->base;
> +       u8 mipicode, lanes, csi2bus = q->csi2.port;
> +       struct cio2_csi2_timing timing;
> +       int i, r;
> +
> +       /* TODO: add support for virtual channels */
> +       sensor_vc = 0;

Modifying a global variable here. I think it would make more sense to
make it a local variable for this function then and also define a
macro with the default VC number.

> +       mipicode = r = cio2_hw_mbus_to_mipicode(
> +                       q->subdev_fmt.code);
> +       if (r < 0)
> +               return r;
> +
> +       lanes = r = q->csi2.num_of_lanes;
> +       if (r < 0)
> +               return r;
[snip]
> +       r = cio2_csi2_calc_timing(cio2, q, &timing);
> +       if (r) {
> +               /* Use default values */

Is this really the right thing to do here? Perhaps calling
cio2_csi2_calc_timing() before starting to program the hardware and
bailing out if it fails would be a better choice?

> +               for (i = -1; i < lanes; i++) {

-1?

> +                       writel(0x4, q->csi_rx_base +
> +                               CIO2_REG_CSIRX_DLY_CNT_TERMEN(i));
> +                       writel(0x570, q->csi_rx_base +
> +                               CIO2_REG_CSIRX_DLY_CNT_SETTLE(i));
> +               }
[snip]
> +static void cio2_hw_exit(struct cio2_device *cio2, struct cio2_queue *q)
> +{
> +       void __iomem *base = cio2->base;
> +       unsigned int i, maxloops = 1000;
> +
> +       /* Disable CSI receiver and MIPI backend devices */
> +       writel(0, q->csi_rx_base + CIO2_REG_CSIRX_ENABLE);
> +       writel(0, q->csi_rx_base + CIO2_REG_MIPIBE_ENABLE);
> +
> +       /* Halt DMA */
> +       writel(0, base + CIO2_REG_CDMAC0(CIO2_DMA_CHAN));
> +       do {
> +               if (readl(base + CIO2_REG_CDMAC0(CIO2_DMA_CHAN)) &
> +                   CIO2_CDMAC0_DMA_HALTED)
> +                       break;
> +               usleep_range(1000, 2000);
> +       } while (--maxloops);
> +       if (!maxloops)
> +               dev_err(&cio2->pci_dev->dev,
> +                       "DMA %i can not be halted\n", CIO2_DMA_CHAN);

Does the code below ensure that the hardware is gracefully cut from
the bus to avoid memory corruption?

> +
> +       for (i = 0; i < CIO2_NUM_PORTS; i++) {
> +               writel(readl(base + CIO2_REG_PXM_FRF_CFG(i)) |
> +                      CIO2_PXM_FRF_CFG_ABORT, base + CIO2_REG_PXM_FRF_CFG(i));
> +               writel(readl(base + CIO2_REG_PBM_FOPN_ABORT) |
> +                      CIO2_PBM_FOPN_ABORT(i), base + CIO2_REG_PBM_FOPN_ABORT);
> +       }
> +}
> +
> +static void cio2_buffer_done(struct cio2_device *cio2, unsigned int dma_chan)
> +{
> +       struct device *dev = &cio2->pci_dev->dev;
> +       struct cio2_queue *q = cio2->cur_queue;
> +       int buffers_found = 0;
> +
> +       if (dma_chan >= CIO2_QUEUES) {
> +               dev_err(dev, "bad DMA channel %i\n", dma_chan);
> +               return;
> +       }
> +
> +       /* Find out which buffer(s) are ready */
> +       do {
> +               struct cio2_fbpt_entry *const entry =
> +                       &q->fbpt[q->bufs_first * CIO2_MAX_LOPS];
> +               struct cio2_buffer *b;
> +
> +               if (entry->first_entry.ctrl & CIO2_FBPT_CTRL_VALID)
> +                       break;
> +
> +               b = q->bufs[q->bufs_first];
> +               if (b) {
> +                       u64 ns = ktime_get_ns();
> +                       int bytes = entry[1].second_entry.num_of_bytes;
> +
> +                       q->bufs[q->bufs_first] = NULL;
> +                       atomic_dec(&q->bufs_queued);
> +                       dev_dbg(&cio2->pci_dev->dev,
> +                               "buffer %i done\n", b->vbb.vb2_buf.index);
> +
> +                       /* Fill vb2 buffer entries and tell it's ready */
> +                       vb2_set_plane_payload(&b->vbb.vb2_buf, 0, bytes);
> +                       b->vbb.vb2_buf.timestamp = ns;
> +                       b->vbb.flags = V4L2_BUF_FLAG_DONE;
> +                       b->vbb.field = V4L2_FIELD_NONE;
> +                       memset(&b->vbb.timecode, 0, sizeof(b->vbb.timecode));
> +                       b->vbb.sequence = entry[0].first_entry.frame_num;
> +                       vb2_buffer_done(&b->vbb.vb2_buf, VB2_BUF_STATE_DONE);
> +               }
> +               cio2_fbpt_entry_init_dummy(cio2, entry);
> +               q->bufs_first = (q->bufs_first + 1) % CIO2_MAX_BUFFERS;
> +               buffers_found++;

Personally, I'm a bit afraid of such potentially infinite loops in
interrupt handlers (if the CPU doesn't process the buffers faster than
the hardware produces, it would never finish spinning). Let me defer
this to other reviewers, though...
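
One possible way to bound the loop, sketched in user space: process at
most one lap of the ring per interrupt. The ring size, struct and
function names are hypothetical; the valid[] flag only mimics the role
of CIO2_FBPT_CTRL_VALID as used in the quoted code.

```c
#include <stdbool.h>

#define SKETCH_RING_SIZE 8	/* hypothetical ring size */

struct sketch_ring {
	bool valid[SKETCH_RING_SIZE];	/* true: still owned by hardware */
	unsigned int first;		/* oldest entry handed to hardware */
};

/* Complete finished entries, but never more than one lap per call. */
static unsigned int sketch_drain(struct sketch_ring *r)
{
	unsigned int done = 0;

	while (done < SKETCH_RING_SIZE && !r->valid[r->first]) {
		/* A real handler would complete the vb2 buffer here. */
		r->valid[r->first] = true;	/* re-arm as a dummy entry */
		r->first = (r->first + 1) % SKETCH_RING_SIZE;
		done++;
	}
	return done;
}
```

Anything left over after the budget is used up would simply be picked
up by the next interrupt.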

> +       } while (1);
> +
> +       if (buffers_found == 0)
> +               dev_warn(&cio2->pci_dev->dev,
> +                        "no ready buffers found on DMA channel %i\n",
> +                        dma_chan);
> +}
[snip]
> +static irqreturn_t cio2_irq(int irq, void *cio2_ptr)
> +{
> +       struct cio2_device *cio2 = cio2_ptr;
> +       void __iomem *const base = cio2->base;
> +       struct device *dev = &cio2->pci_dev->dev;
> +       u32 int_status, int_clear;
> +
> +       int_clear = int_status = readl(base + CIO2_REG_INT_STS);
> +       if (!int_status)
> +               return IRQ_NONE;

CodingStyle states clearly: "Don't put multiple assignments on a
single line either.  Kernel coding style is super simple.  Avoid
tricky expressions."

> +
> +       if (int_status & CIO2_INT_IOOE) {
> +               /* Interrupt on Output Error:

CodingStyle: Multi-line comments should start with an otherwise empty
"/*" line.

> +                * 1) SRAM is full and FS received, or
> +                * 2) An invalid bit detected by DMA.
> +                */
> +               u32 oe_status, oe_clear;
> +
> +               oe_clear = oe_status = readl(base + CIO2_REG_INT_STS_EXT_OE);

Multiple assignments on a single line.

> +
> +               if (oe_status & CIO2_INT_EXT_OE_DMAOE_MASK) {
> +                       dev_err(dev, "DMA output error: 0x%x\n",
> +                               (oe_status & CIO2_INT_EXT_OE_DMAOE_MASK)
> +                               >> CIO2_INT_EXT_OE_DMAOE_SHIFT);
> +                       oe_status &= ~CIO2_INT_EXT_OE_DMAOE_MASK;
> +               }
> +               if (oe_status & CIO2_INT_EXT_OE_OES_MASK) {
> +                       dev_err(dev, "DMA output error on CSI2 buses: 0x%x\n",
> +                               (oe_status & CIO2_INT_EXT_OE_OES_MASK)
> +                               >> CIO2_INT_EXT_OE_OES_SHIFT);
> +                       oe_status &= ~CIO2_INT_EXT_OE_OES_MASK;
> +               }
> +               writel(oe_clear, base + CIO2_REG_INT_STS_EXT_OE);
> +               if (oe_status)
> +                       dev_warn(dev, "unknown interrupt 0x%x on OE\n",
> +                                oe_status);
> +               int_status &= ~CIO2_INT_IOOE;
> +       }
> +
> +       if (int_status & CIO2_INT_IOC_MASK) {
> +               /* DMA IO done -- frame ready */
> +               u32 clr = 0;
> +               unsigned int d;
> +
> +               for (d = 0; d < CIO2_NUM_DMA_CHAN; d++)
> +                       if (int_status & CIO2_INT_IOC(d)) {
> +                               clr |= CIO2_INT_IOC(d);
> +                               dev_dbg(dev, "DMA %i done\n", d);
> +                               cio2_buffer_done(cio2, d);
> +                       }
> +               int_status &= ~clr;

Perhaps a construct using fls/ffs would be a bit smarter here. For example:

        while (int_status & CIO2_INT_IOC_MASK) {
                d = __ffs(int_status & CIO2_INT_IOC_MASK) - 1;
                int_status &= ~CIO2_INT_IOC(d);
                dev_dbg(dev, "DMA %i done\n", d);
                cio2_buffer_done(cio2, d);
        }
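
A caveat on the snippet above: as far as I know the kernel's __ffs() is
already 0-based (and undefined for 0), while ffs() is 1-based, so the
"- 1" belongs with ffs(). The bit index also has to be converted to a
channel index if the IOC bits do not start at bit 0. A user-space
sketch with POSIX ffs() follows; the SKETCH_* bit layout is made up for
illustration and is not the CIO2 register layout.

```c
#include <strings.h>	/* POSIX ffs(), 1-based like the kernel's ffs() */

#define SKETCH_IOC_SHIFT 4	/* hypothetical position of channel 0's bit */
#define SKETCH_IOC_MASK  (0xfu << SKETCH_IOC_SHIFT)
#define SKETCH_IOC(d)    (1u << (SKETCH_IOC_SHIFT + (d)))

/* Visit each pending channel once; return status with IOC bits cleared. */
static unsigned int sketch_handle_ioc(unsigned int status,
				      unsigned int *chans, unsigned int *n)
{
	*n = 0;
	while (status & SKETCH_IOC_MASK) {
		/* ffs() is 1-based, so subtract 1 to get the bit index. */
		unsigned int bit = ffs((int)(status & SKETCH_IOC_MASK)) - 1;
		unsigned int d = bit - SKETCH_IOC_SHIFT;

		status &= ~SKETCH_IOC(d);
		chans[(*n)++] = d;	/* stand-in for cio2_buffer_done() */
	}
	return status;
}
```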

> +       }
> +
> +       if (int_status & CIO2_INT_IOS_IOLN_MASK) {
> +               /* DMA IO starts or reached specified line */
> +               u32 clr = 0;
> +               unsigned int d;
> +
> +               for (d = 0; d < CIO2_NUM_DMA_CHAN; d++)
> +                       if (int_status & CIO2_INT_IOS_IOLN(d)) {
> +                               clr |= CIO2_INT_IOS_IOLN(d);
> +                               if (d == CIO2_DMA_CHAN)
> +                                       cio2_queue_event_sof(cio2,
> +                                                            cio2->cur_queue);
> +                               dev_dbg(dev,
> +                                       "DMA %i started or reached line\n", d);
> +                       }
> +               int_status &= ~clr;

Ditto.

> +       }
> +
> +       if (int_status & (CIO2_INT_IOIE | CIO2_INT_IOIRQ)) {
> +               /* CSI2 receiver (error) interrupt */
> +               u32 ie_status, ie_clear;
> +               unsigned int port;
> +
> +               ie_clear = ie_status = readl(base + CIO2_REG_INT_STS_EXT_IE);

Multiple assignments.

> +
> +               for (port = 0; port < CIO2_NUM_PORTS; port++) {

CIO2_NUM_PORTS is small and the bits are scattered through ie_status,
so I guess no __ffs() trick here.

> +                       u32 port_status = (ie_status >> (port * 8)) & 0xff;
> +                       void __iomem *const csi_rx_base =
> +                                               base + CIO2_REG_PIPE_BASE(port);
> +                       unsigned int i;
> +
> +                       while (port_status) {
> +                               i = ffs(port_status) - 1;
> +                               dev_err(dev, "port %i error %s\n",
> +                                       port, cio2_port_errs[i]);
> +                               ie_status &= ~BIT(port * 8 + i);
> +                               port_status &= ~BIT(i);
> +                       }

Yeah, exactly like this. ;)

> +
> +                       if (ie_status & CIO2_INT_EXT_IE_IRQ(port)) {
> +                               u32 csi2_status, csi2_clear;
> +
> +                               csi2_clear = csi2_status = readl(csi_rx_base +
> +                                               CIO2_REG_IRQCTRL_STATUS);
> +                               for (i = 0; i < ARRAY_SIZE(cio2_irq_errs);
> +                                    i++) {
> +                                       if (csi2_status & (1 << i)) {
> +                                               dev_err(dev,
> +                                                       "CSI-2 receiver port %i: %s\n",
> +                                                       port, cio2_irq_errs[i]);
> +                                               csi2_status &= ~(1 << i);
> +                                       }
> +                               }

__ffs() trick applies here too.

> +
> +                               writel(csi2_clear,
> +                                      csi_rx_base + CIO2_REG_IRQCTRL_CLEAR);
> +                               if (csi2_status)
> +                                       dev_warn(dev,
> +                                                "unknown CSI2 error 0x%x on port %i\n",
> +                                                csi2_status, port);
> +
> +                               ie_status &= ~CIO2_INT_EXT_IE_IRQ(port);
> +                       }
> +               }
> +
> +               writel(ie_clear, base + CIO2_REG_INT_STS_EXT_IE);
> +               if (ie_status)
> +                       dev_warn(dev, "unknown interrupt 0x%x on IE\n",
> +                                ie_status);
> +
> +               int_status &= ~(CIO2_INT_IOIE | CIO2_INT_IOIRQ);
> +       }
> +
> +       writel(int_clear, base + CIO2_REG_INT_STS);
> +       if (int_status)
> +               dev_warn(dev, "unknown interrupt 0x%x on INT\n", int_status);
> +
> +       return IRQ_HANDLED;
> +}
> +
> +/**************** Videobuf2 interface ****************/
> +
> +static void cio2_vb2_return_all_buffers(struct cio2_queue *q,
> +                                       enum vb2_buffer_state state)

Hmm, is there ever a reason to return all buffers with a state other
than _ERROR?

> +{
> +       unsigned int i;
> +
> +       for (i = 0; i < CIO2_MAX_BUFFERS; i++) {
> +               if (q->bufs[i]) {
> +                       atomic_dec(&q->bufs_queued);
> +                       vb2_buffer_done(&q->bufs[i]->vbb.vb2_buf, state);
> +               }
> +       }
> +}
> +
> +static int cio2_vb2_queue_setup(struct vb2_queue *vq,
> +                               unsigned int *num_buffers,
> +                               unsigned int *num_planes,
> +                               unsigned int sizes[],
> +                               struct device *alloc_devs[])
> +{
> +       struct cio2_device *cio2 = vb2_get_drv_priv(vq);
> +       struct cio2_queue *q = container_of(vq, struct cio2_queue, vbq);
> +       u32 width = q->subdev_fmt.width;
> +       u32 height = q->subdev_fmt.height;
> +       u32 pixelformat = q->pixelformat;
> +       unsigned int i, szimage;
> +       int r = 0;
> +
> +       for (i = 0; i < ARRAY_SIZE(cio2_csi2_fmts); i++) {
> +               if (pixelformat == cio2_csi2_fmts[i])
> +                       break;
> +       }
> +
> +       /* Use SRGGB10 instead of return err */
> +       if (i >= ARRAY_SIZE(cio2_csi2_fmts))

I think this should be impossible, since S_FMT should have already
validated (and corrected) the setting.

> +               pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
> +
> +       alloc_devs[0] = &cio2->pci_dev->dev;

Hmm, so it doesn't go through the IPU MMU in the end?

> +       szimage = cio2_bytesperline(width) * height;
> +
> +       if (*num_planes) {
> +               /*
> +                * Only single plane is supported
> +                */
> +               if (*num_planes != 1 || sizes[0] < szimage)
> +                       return -EINVAL;

S_FMT should validate this and then queue_setup should never get an
invalid value.

> +       }
> +
> +       *num_planes = 1;
> +       sizes[0] = szimage;
> +
> +       *num_buffers = clamp_val(*num_buffers, 1, CIO2_MAX_BUFFERS);
> +
> +       /* Initialize buffer queue */
> +       for (i = 0; i < CIO2_MAX_BUFFERS; i++) {
> +               q->bufs[i] = NULL;
> +               cio2_fbpt_entry_init_dummy(cio2, &q->fbpt[i * CIO2_MAX_LOPS]);
> +       }
> +       atomic_set(&q->bufs_queued, 0);
> +       q->bufs_first = 0;
> +       q->bufs_next = 0;
> +
> +       return r;
> +}
> +
> +/* Called after each buffer is allocated */
> +static int cio2_vb2_buf_init(struct vb2_buffer *vb)
> +{
> +       struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
> +       struct device *dev = &cio2->pci_dev->dev;
> +       struct cio2_buffer *b =
> +               container_of(vb, struct cio2_buffer, vbb.vb2_buf);
> +       unsigned int length = vb->planes[0].length;
> +       int lops  = DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + 1,
> +                                PAGE_SIZE / sizeof(u32));
> +       u32 *lop;
> +       struct sg_table *sg;
> +       struct sg_page_iter sg_iter;
> +
> +       if (lops <= 0 || lops > CIO2_MAX_LOPS) {
> +               dev_err(dev, "%s: bad buffer size (%i)\n", __func__, length);
> +               return -ENOSPC;         /* Should never happen */
> +       }
> +
> +       /* Allocate LOP table */
> +       b->lop = lop = dma_alloc_noncoherent(dev, lops * PAGE_SIZE,
> +                                       &b->lop_bus_addr, GFP_KERNEL);

_coherent?

> +       if (!lop)
> +               return -ENOMEM;
> +
> +       /* Fill LOP */
> +       sg = vb2_dma_sg_plane_desc(vb, 0);
> +       if (!sg)
> +               return -EFAULT;

I'd say -ENOMEM is better here. (But actually it should be impossible,
if allocation succeeded previously.)

> +
> +       for_each_sg_page(sg->sgl, &sg_iter, sg->nents, 0)
> +               *lop++ = sg_page_iter_dma_address(&sg_iter) >> PAGE_SHIFT;
> +       *lop++ = cio2->dummy_page_bus_addr >> PAGE_SHIFT;
> +
> +       return 0;
> +}
> +
> +/* Transfer buffer ownership to cio2 */
> +static void cio2_vb2_buf_queue(struct vb2_buffer *vb)
> +{
> +       struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
> +       struct cio2_queue *q =
> +               container_of(vb->vb2_queue, struct cio2_queue, vbq);
> +       struct cio2_buffer *b =
> +               container_of(vb, struct cio2_buffer, vbb.vb2_buf);
> +       struct cio2_fbpt_entry *entry;
> +       unsigned int next = q->bufs_next;
> +       int bufs_queued = atomic_inc_return(&q->bufs_queued);
> +
> +       if (vb2_start_streaming_called(&q->vbq)) {

Shouldn't it be vb2_is_streaming()? (There is not much difference,
though, except that vb2_start_streaming_called() returns true, even
before .start_streaming finished, while vb2_is_streaming() does so
only after it returns successfully.)

> +               u32 fbpt_rp =
> +                       (readl(cio2->base + CIO2_REG_CDMARI(CIO2_DMA_CHAN))
> +                        >> CIO2_CDMARI_FBPT_RP_SHIFT)
> +                       & CIO2_CDMARI_FBPT_RP_MASK;
> +
> +               /*
> +                * fbpt_rp is the fbpt entry that the dma is currently working
> +                * on, but since it could jump to next entry at any time,
> +                * assume that we might already be there.
> +                */
> +               fbpt_rp = (fbpt_rp + 1) % CIO2_MAX_BUFFERS;

Hmm, this is really racy. This code can be pre-empted and not execute
for quite a long time, depending on system load, resuming after the
hardware goes even further. Technically you could prevent this using
*_irq_save()/_irq_restore(), but I'd try to find a way that doesn't
rely on the timing, if possible.

> +
> +               if (bufs_queued <= 1)
> +                       next = fbpt_rp + 1;     /* Buffers were drained */
> +               else if (fbpt_rp == next)
> +                       next++;
> +               next %= CIO2_MAX_BUFFERS;
> +       }
> +
> +       while (q->bufs[next]) {
> +               /* If the entry is used, get the next one,
> +                * We can not break here if all are filled,
> +                * Will wait for one free, otherwise it will crash
> +                */
> +               dev_dbg(&cio2->pci_dev->dev,
> +                       "entry %i was already full!\n", next);
> +               next = (next + 1) % CIO2_MAX_BUFFERS;

A busy waiting, possibly infinite, loop. Hmm.

I think we could do something smarter here, such as sleeping on a
wait_queue, which is woken up from the interrupt handler.

Also, why do you think it will crash? I think you can just return
the buffer to vb2 with _ERROR status and bail out, if you can't queue
due to some failure.
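
As a rough illustration of the "bail out" variant, the search for a
free slot can be bounded to one pass over the ring. This is a
user-space sketch with hypothetical names and ring size; in the driver
the failure path would hand the buffer back with
vb2_buffer_done(..., VB2_BUF_STATE_ERROR).

```c
#include <stddef.h>

#define SKETCH_MAX_BUFFERS 4	/* hypothetical ring size */

/*
 * Look for a free slot starting at 'next', visiting each slot at most
 * once. Returns the slot index, or -1 if the ring is full.
 */
static int sketch_find_free_slot(void *const bufs[SKETCH_MAX_BUFFERS],
				 unsigned int next)
{
	unsigned int i;

	for (i = 0; i < SKETCH_MAX_BUFFERS; i++) {
		unsigned int slot = (next + i) % SKETCH_MAX_BUFFERS;

		if (!bufs[slot])
			return (int)slot;
	}
	return -1;	/* caller would fail the buffer with an error state */
}
```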

> +       }
> +
> +       q->bufs[next] = b;
> +       entry = &q->fbpt[next * CIO2_MAX_LOPS];
> +       cio2_fbpt_entry_init_buf(cio2, b, entry);
> +       q->bufs_next = (next + 1) % CIO2_MAX_BUFFERS;
> +}
[snip]
> +static int cio2_set_power(struct vb2_queue *vq, int enable)
> +{
> +       struct cio2_device *cio2 = vb2_get_drv_priv(vq);
> +       struct device *dev = &cio2->pci_dev->dev;
> +       int ret = 0;
> +
> +       if (enable) {
> +               ret = pm_runtime_get_sync(dev);
> +               if (ret < 0) {
> +                       dev_info(&cio2->pci_dev->dev,
> +                               "failed to get power %d\n", ret);
> +                       pm_runtime_put(dev);
> +               }
> +       } else {
> +               ret = pm_runtime_put(dev);
> +       }
> +
> +       /* return 0 if power is active */
> +       return (ret >= 0) ? 0 : ret;

nit: I think this function is unnecessary, as it only adds one more
level of indirection and also combines two completely different code
paths together, especially since it is called exactly once with
enable==1 and once with enable==0. I'd suggest just pasting respective
code in place of the call instead.

> +}
[snip]
> +static int cio2_v4l2_querycap(struct file *file, void *fh,
> +                             struct v4l2_capability *cap)
> +{
> +       struct cio2_device *cio2 = video_drvdata(file);
> +
> +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
> +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
> +       snprintf(cap->bus_info, sizeof(cap->bus_info),
> +                "PCI:%s", pci_name(cio2->pci_dev));
> +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;

Hmm, I thought single plane queue type was deprecated these days and
_MPLANE recommended for all new drivers. I'll defer this to other
reviewers, though.

> +       cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +       return 0;
> +}
[snip]
> +static int cio2_v4l2_try_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +       u32 pixelformat = f->fmt.pix.pixelformat;
> +       unsigned int i;
> +
> +       cio2_v4l2_g_fmt(file, fh, f);
> +
> +       for (i = 0; i < ARRAY_SIZE(cio2_csi2_fmts); i++) {
> +               if (pixelformat == cio2_csi2_fmts[i])
> +                       break;
> +       }
> +
> +       /* Use SRGGB10 as default if not found */
> +       if (i >= ARRAY_SIZE(cio2_csi2_fmts))
> +               pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
> +
> +       f->fmt.pix.pixelformat = pixelformat;
> +       f->fmt.pix.bytesperline = cio2_bytesperline(f->fmt.pix.width);
> +       f->fmt.pix.sizeimage = f->fmt.pix.bytesperline * f->fmt.pix.height;

Shouldn't you use f->fmt.pix_mp instead?

> +
> +       return 0;
> +}
[snip]
> +
> +       /* Initialize vbq */
> +       vbq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +       vbq->io_modes = VB2_USERPTR | VB2_MMAP;

VB2_DMABUF?

> +       vbq->ops = &cio2_vb2_ops;
> +       vbq->mem_ops = &vb2_dma_sg_memops;
> +       vbq->buf_struct_size = sizeof(struct cio2_buffer);
> +       vbq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
> +       vbq->min_buffers_needed = 1;
> +       vbq->drv_priv = cio2;
> +       vbq->lock = &q->lock;

Does the code take into account queue operations and video device
operations being asynchronous regarding each other? Given that in this
case there is always one queue per video device, maybe it would just
make sense to use the same lock for both? (This happens if you leave
vbq->lock with NULL.)

Best regards,
Tomasz

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-12  9:59   ` Tomasz Figa
@ 2017-06-13  8:58     ` Tuukka Toivonen
  2017-06-13  9:18       ` Tomasz Figa
  2017-06-16 11:48     ` Sakari Ailus
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Tuukka Toivonen @ 2017-06-13  8:58 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Yong Zhi, linux-media, Sakari Ailus, Zheng, Jian Xu, Mani,
	Rajmohan, Hans Verkuil, Yang, Hyungwoo

Hi Tomasz,

On Monday, June 12, 2017 18:59:18 Tomasz Figa wrote:
> By any chance, doesn't the hardware provide some simple mode for
> contiguous buffers? Since we have an MMU anyway, we could use
> vb2_dma_contig and simplify the code significantly.

In IPU3 the CIO2 (CSI-2 receiver) and the IMGU (image processing system) 
are entirely separate PCI devices. The MMU is only in the IMGU device; 
the CIO2 doesn't have MMU but has the FBPT (frame buffer pointer tables) 
to handle discontinuous buffers.

[...]

> 
> > +               pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
> > +
> > +       alloc_devs[0] = &cio2->pci_dev->dev;
> 
> Hmm, so it doesn't go through the IPU MMU in the end?

No, it doesn't.

> 
> > +       szimage = cio2_bytesperline(width) * height;
> > +
> > +       if (*num_planes) {
> > +               /*
> > +                * Only single plane is supported
> > +                */
> > +               if (*num_planes != 1 || sizes[0] < szimage)
> > +                       return -EINVAL;
> 
> S_FMT should validate this and then queue_setup should never get an
> invalid value.
> 
> > +       }
> > +
> > +       *num_planes = 1;
> > +       sizes[0] = szimage;
> > +
> > +       *num_buffers = clamp_val(*num_buffers, 1, CIO2_MAX_BUFFERS);
> > +
> > +       /* Initialize buffer queue */
> > +       for (i = 0; i < CIO2_MAX_BUFFERS; i++) {
> > +               q->bufs[i] = NULL;
> > +               cio2_fbpt_entry_init_dummy(cio2, &q->fbpt[i * CIO2_MAX_LOPS]);
> > +       }
> > +       atomic_set(&q->bufs_queued, 0);
> > +       q->bufs_first = 0;
> > +       q->bufs_next = 0;
> > +
> > +       return r;
> > +}
> > +
> > +/* Called after each buffer is allocated */
> > +static int cio2_vb2_buf_init(struct vb2_buffer *vb)
> > +{
> > +       struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
> > +       struct device *dev = &cio2->pci_dev->dev;
> > +       struct cio2_buffer *b =
> > +               container_of(vb, struct cio2_buffer, vbb.vb2_buf);
> > +       unsigned int length = vb->planes[0].length;
> > +       int lops  = DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + 1,
> > +                                PAGE_SIZE / sizeof(u32));
> > +       u32 *lop;
> > +       struct sg_table *sg;
> > +       struct sg_page_iter sg_iter;
> > +
> > +       if (lops <= 0 || lops > CIO2_MAX_LOPS) {
> > +               dev_err(dev, "%s: bad buffer size (%i)\n", __func__, length);
> > +               return -ENOSPC;         /* Should never happen */
> > +       }
> > +
> > +       /* Allocate LOP table */
> > +       b->lop = lop = dma_alloc_noncoherent(dev, lops * PAGE_SIZE,
> > +                                       &b->lop_bus_addr, GFP_KERNEL);
> 
> _coherent?
> 
> > +       if (!lop)
> > +               return -ENOMEM;
> > +
> > +       /* Fill LOP */
> > +       sg = vb2_dma_sg_plane_desc(vb, 0);
> > +       if (!sg)
> > +               return -EFAULT;
> 
> I'd say -ENOMEM is better here. (But actually it should be impossible,
> if allocation succeeded previously.)
> 
> > +
> > +       for_each_sg_page(sg->sgl, &sg_iter, sg->nents, 0)
> > +               *lop++ = sg_page_iter_dma_address(&sg_iter) >> PAGE_SHIFT;
> > +       *lop++ = cio2->dummy_page_bus_addr >> PAGE_SHIFT;
> > +
> > +       return 0;
> > +}
> > +
> > +/* Transfer buffer ownership to cio2 */
> > +static void cio2_vb2_buf_queue(struct vb2_buffer *vb)
> > +{
> > +       struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
> > +       struct cio2_queue *q =
> > +               container_of(vb->vb2_queue, struct cio2_queue, vbq);
> > +       struct cio2_buffer *b =
> > +               container_of(vb, struct cio2_buffer, vbb.vb2_buf);
> > +       struct cio2_fbpt_entry *entry;
> > +       unsigned int next = q->bufs_next;
> > +       int bufs_queued = atomic_inc_return(&q->bufs_queued);
> > +
> > +       if (vb2_start_streaming_called(&q->vbq)) {
> 
> Shouldn't it be vb2_is_streaming()? (There is not much difference,
> though, except that vb2_start_streaming_called() returns true, even
> before .start_streaming finished, while vb2_is_streaming() does so
> only after it returns successfully.)
> 
> > +               u32 fbpt_rp =
> > +                       (readl(cio2->base + CIO2_REG_CDMARI(CIO2_DMA_CHAN))
> > +                        >> CIO2_CDMARI_FBPT_RP_SHIFT)
> > +                       & CIO2_CDMARI_FBPT_RP_MASK;
> > +
> > +               /*
> > +                * fbpt_rp is the fbpt entry that the dma is currently working
> > +                * on, but since it could jump to next entry at any time,
> > +                * assume that we might already be there.
> > +                */
> > +               fbpt_rp = (fbpt_rp + 1) % CIO2_MAX_BUFFERS;
> 
> Hmm, this is really racy. This code can be pre-empted and not execute
> for quite a long time, depending on system load, resuming after the
> hardware goes even further. Technically you could prevent this using
> *_irq_save()/_irq_restore(), but I'd try to find a way that doesn't
> rely on the timing, if possible.

That is true, if the driver doesn't get to run for more than one frame
time. I don't think that's very common, but it should be handled.

Hmm. Actually the buffer has a VALID bit, which is set by the driver to
indicate that the HW can fill the buffer and cleared by the HW to
indicate that the buffer is filled. Probably the HW cannot actually jump
to the next buffer as suggested by the comment, because I think the
VALID bit would be clear in that case. That should be checked.

> 
> > +
> > +               if (bufs_queued <= 1)
> > +                       next = fbpt_rp + 1;     /* Buffers were drained */
> > +               else if (fbpt_rp == next)
> > +                       next++;
> > +               next %= CIO2_MAX_BUFFERS;
> > +       }
> > +
> > +       while (q->bufs[next]) {
> > +               /* If the entry is used, get the next one,
> > +                * We can not break here if all are filled,
> > +                * Will wait for one free, otherwise it will crash
> > +                */

That comment should be fixed. "otherwise it will crash" doesn't
say anything useful. Why would it crash?

> > +               dev_dbg(&cio2->pci_dev->dev,
> > +                       "entry %i was already full!\n", next);
> > +               next = (next + 1) % CIO2_MAX_BUFFERS;
> 
> A busy waiting, possibly infinite, loop. Hmm.

It's not really busy waiting. We have allocated CIO2_MAX_BUFFERS
buffers (or actually just buffer entries in the HW table) circularly for
the hardware, and the user has requested a queue of N buffers. The
driver ensures N <= CIO2_MAX_BUFFERS, which guarantees that whenever the
user queues a buffer, there is necessarily a free entry in the hardware
circular buffer list. The loop above finds the first free entry in the
circular list, which necessarily exists. In practice it should always be
the very first one, since that is the oldest one given to the hardware.

> 
> I think we could do something smarter here, such as sleeping on a
> wait_queue, which is woken up from the interrupt handler.

I think that's a bit complicated for a situation that should never be
possible.

> Also, why do you think it will crash? I think you can just do return
> the buffer to vb2 with _ERROR status and bail out, if you can't queue
> due to some failure.

Agree.

- Tuukka

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-13  8:58     ` Tuukka Toivonen
@ 2017-06-13  9:18       ` Tomasz Figa
  0 siblings, 0 replies; 17+ messages in thread
From: Tomasz Figa @ 2017-06-13  9:18 UTC (permalink / raw)
  To: Tuukka Toivonen
  Cc: Yong Zhi, linux-media, Sakari Ailus, Zheng, Jian Xu, Mani,
	Rajmohan, Hans Verkuil, Yang, Hyungwoo

Hi Tuukka,

Thanks for your replies. Please see mine inline.

On Tue, Jun 13, 2017 at 5:58 PM, Tuukka Toivonen
<tuukka.toivonen@intel.com> wrote:
> Hi Tomasz,
>
> On Monday, June 12, 2017 18:59:18 Tomasz Figa wrote:
>> By any chance, doesn't the hardware provide some simple mode for
>> contiguous buffers? Since we have an MMU anyway, we could use
>> vb2_dma_contig and simplify the code significantly.
>
> In IPU3 the CIO2 (CSI-2 receiver) and the IMGU (image processing system)
> are entirely separate PCI devices. The MMU is only in the IMGU device;
> the CIO2 doesn't have MMU but has the FBPT (frame buffer pointer tables)
> to handle discontinuous buffers.
>
> [...]
>
>>
>> > +               pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
>> > +
>> > +       alloc_devs[0] = &cio2->pci_dev->dev;
>>
>> Hmm, so it doesn't go through the IPU MMU in the end?
>
> No, it doesn't.

Aha. I was confused by the fact that the driver calls
dma_alloc_(non)coherent() with sizes likely greater than PAGE_SIZE.

So, given the above, I believe we need to fix the LOP allocation to
allocate one page at a time and stop relying on bus address
contiguity.

>> > +/* Called after each buffer is allocated */
>> > +static int cio2_vb2_buf_init(struct vb2_buffer *vb)
>> > +{
>> > +       struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue);
>> > +       struct device *dev = &cio2->pci_dev->dev;
>> > +       struct cio2_buffer *b =
>> > +               container_of(vb, struct cio2_buffer, vbb.vb2_buf);
>> > +       unsigned int length = vb->planes[0].length;
>> > +       int lops  = DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + 1,
>> > +                                PAGE_SIZE / sizeof(u32));
>> > +       u32 *lop;
>> > +       struct sg_table *sg;
>> > +       struct sg_page_iter sg_iter;
>> > +
>> > +       if (lops <= 0 || lops > CIO2_MAX_LOPS) {
>> > +               dev_err(dev, "%s: bad buffer size (%i)\n", __func__, length);
>> > +               return -ENOSPC;         /* Should never happen */
>> > +       }
>> > +
>> > +       /* Allocate LOP table */
>> > +       b->lop = lop = dma_alloc_noncoherent(dev, lops * PAGE_SIZE,
>> > +                                       &b->lop_bus_addr, GFP_KERNEL);

^^ Here is the offending allocation.
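
As a rough illustration of the page-at-a-time idea, here is a minimal
userspace sketch. It is not driver code: calloc() stands in for what
would presumably be one PAGE_SIZE-sized DMA allocation per table page,
and struct lop_tables, lop_pages_needed(), lop_alloc(), lop_free(), the
page size and the MAX_LOPS limit are all hypothetical names and values
chosen for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE	4096u	/* assumed page size */
#define MAX_LOPS	8u	/* illustrative stand-in for CIO2_MAX_LOPS */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

struct lop_tables {
	unsigned int num;		/* number of LOP pages allocated */
	uint32_t *page[MAX_LOPS];	/* every page is a separate allocation */
};

/* Same arithmetic as in the quoted patch. */
static unsigned int lop_pages_needed(unsigned int length)
{
	return DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + 1,
			    PAGE_SIZE / sizeof(uint32_t));
}

/* Release every page that was allocated so far. */
static void lop_free(struct lop_tables *t)
{
	while (t->num)
		free(t->page[--t->num]);
}

/*
 * Allocate the LOP pages one by one. Nothing here assumes that two
 * consecutive pages are adjacent in (bus) address space.
 * Returns 0 on success, -1 on a bad size or allocation failure.
 */
static int lop_alloc(struct lop_tables *t, unsigned int length)
{
	unsigned int n = lop_pages_needed(length);

	t->num = 0;
	if (n == 0 || n > MAX_LOPS)
		return -1;

	while (t->num < n) {
		uint32_t *p = calloc(1, PAGE_SIZE);

		if (!p) {
			lop_free(t);	/* unwind the partial allocation */
			return -1;
		}
		t->page[t->num++] = p;
	}

	return 0;
}
```

The point is only that each table page gets its own address, so whoever
programs the hardware has to store one address per LOP page instead of
computing them from a single base.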

>>
>> > +               u32 fbpt_rp =
>> > +                       (readl(cio2->base + CIO2_REG_CDMARI(CIO2_DMA_CHAN))
>> > +                        >> CIO2_CDMARI_FBPT_RP_SHIFT)
>> > +                       & CIO2_CDMARI_FBPT_RP_MASK;
>> > +
>> > +               /*
>> > +                * fbpt_rp is the fbpt entry that the dma is currently working
>> > +                * on, but since it could jump to next entry at any time,
>> > +                * assume that we might already be there.
>> > +                */
>> > +               fbpt_rp = (fbpt_rp + 1) % CIO2_MAX_BUFFERS;
>>
>> Hmm, this is really racy. This code can be pre-empted and not execute
>> for quite long time, depending on system load, resuming after the
>> hardware goes even further. Technically you could prevent this using
>> *_irq_save()/_irq_restore(), but I'd try to find a way that doesn't
>> rely on the timing, if possible.
>
> That is true, if the driver doesn't get executed in more than one frame
> time. I don't think that's very common, but should be handled.
>
> Hmm. Actually the buffer has VALID bit which is set by driver to indicate
> that the HW can fill the buffer and cleared by HW to indicate that the
> buffer is filled. Probably the HW can not actually jump to the next
> buffer as suggested by the comment, because I think the VALID bit
> would be clear in that case. That should be checked.

I think the problem here is that we keep all the entries valid and
only point to dummy buffers if there are no buffers queued by
userspace.

>
>>
>> > +
>> > +               if (bufs_queued <= 1)
>> > +                       next = fbpt_rp + 1;     /* Buffers were drained */
>> > +               else if (fbpt_rp == next)
>> > +                       next++;
>> > +               next %= CIO2_MAX_BUFFERS;
>> > +       }
>> > +
>> > +       while (q->bufs[next]) {
>> > +               /* If the entry is used, get the next one,
>> > +                * We can not break here if all are filled,
>> > +                * Will wait for one free, otherwise it will crash
>> > +                */
>
> That comment should be fixed. "otherwise it will crash" doesn't
> tell much useful. Why would it crash?
>
>> > +               dev_dbg(&cio2->pci_dev->dev,
>> > +                       "entry %i was already full!\n", next);
>> > +               next = (next + 1) % CIO2_MAX_BUFFERS;
>>
>> A busy waiting, possibly infinite, loop. Hmm.
>
> It's not really busy waiting. We have allocated CIO2_MAX_BUFFERS
> buffers (or actually just buffer entries in HW table) circularly for the
> hardware, and then the user has requested N buffer queue. The driver
> ensures N <= CIO2_MAX_BUFFERS and this guarantees that whenever user
> queues a buffer, there necessarily is a free buffer in the hardware
> circular buffer list. The loop above finds the first free buffer from the
> circular list, which necessarily exists. In practice it should be always
> the very first since that is the oldest one given to hardware.
>
>>
>> I think we could do something smarter here, such as sleeping on a
>> wait_queue, which is woken up from the interrupt handler.
>
> I think that's a bit complicated for situation which should be never
> possible to happen.
>
>> Also, why do you think it will crash? I think you can just do return
>> the buffer to vb2 with _ERROR status and bail out, if you can't queue
>> due to some failure.
>
> Agree.

Given your explanation, wouldn't it make sense to actually make the
loop finite, limited by the number of buffers and if (due to some
unforeseen condition, like a driver bug) there is no buffer available
until then, error out?
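
Something along these lines, as a minimal standalone sketch rather than
a drop-in change: find_free_entry() and the plain pointer array are
hypothetical stand-ins for the q->bufs[] search, and the
CIO2_MAX_BUFFERS value is made up for the example.

```c
#include <stddef.h>

#define CIO2_MAX_BUFFERS 8	/* illustrative value, not taken from the patch */

/*
 * Search the circular table for a free (NULL) entry, starting at
 * 'start'. Every entry is visited at most once, so the loop always
 * terminates. Returns the index of the free entry, or -1 if the
 * table is unexpectedly full.
 */
static int find_free_entry(void *const bufs[CIO2_MAX_BUFFERS],
			   unsigned int start)
{
	unsigned int i;

	for (i = 0; i < CIO2_MAX_BUFFERS; i++) {
		unsigned int idx = (start + i) % CIO2_MAX_BUFFERS;

		if (!bufs[idx])
			return (int)idx;
	}

	return -1;	/* should not happen; the caller errors out */
}
```

On -1 the driver could then hand the buffer back to vb2 in the error
state and bail out, as discussed above.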

Best regards,
Tomasz

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-12  9:59   ` Tomasz Figa
  2017-06-13  8:58     ` Tuukka Toivonen
@ 2017-06-16 11:48     ` Sakari Ailus
  2017-06-26 14:51     ` Sakari Ailus
  2017-10-06 19:19     ` Zhi, Yong
  3 siblings, 0 replies; 17+ messages in thread
From: Sakari Ailus @ 2017-06-16 11:48 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Yong Zhi, linux-media, Sakari Ailus, Zheng, Jian Xu, Mani,
	Rajmohan, Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

Hi Tomasz,

On Mon, Jun 12, 2017 at 06:59:18PM +0900, Tomasz Figa wrote:
> Hi Yong,
> 
> Please see my comments inline.
> 
> On Wed, Jun 7, 2017 at 10:34 AM, Yong Zhi <yong.zhi@intel.com> wrote:
> > This patch adds CIO2 CSI-2 device driver for
> > Intel's IPU3 camera sub-system support.
> >
> > Signed-off-by: Yong Zhi <yong.zhi@intel.com>
> > ---
> >  drivers/media/pci/Kconfig                |    2 +
> >  drivers/media/pci/Makefile               |    3 +-
> >  drivers/media/pci/intel/Makefile         |    5 +
> >  drivers/media/pci/intel/ipu3/Kconfig     |   17 +
> >  drivers/media/pci/intel/ipu3/Makefile    |    1 +
> >  drivers/media/pci/intel/ipu3/ipu3-cio2.c | 1788 ++++++++++++++++++++++++++++++
> >  drivers/media/pci/intel/ipu3/ipu3-cio2.h |  424 +++++++
> >  7 files changed, 2239 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/media/pci/intel/Makefile
> >  create mode 100644 drivers/media/pci/intel/ipu3/Kconfig
> >  create mode 100644 drivers/media/pci/intel/ipu3/Makefile
> >  create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.c
> >  create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.h
> [snip]
> > diff --git a/drivers/media/pci/intel/ipu3/Kconfig b/drivers/media/pci/intel/ipu3/Kconfig
> > new file mode 100644
> > index 0000000..2a895d6
> > --- /dev/null
> > +++ b/drivers/media/pci/intel/ipu3/Kconfig
> > @@ -0,0 +1,17 @@
> > +config VIDEO_IPU3_CIO2
> > +       tristate "Intel ipu3-cio2 driver"
> > +       depends on VIDEO_V4L2 && PCI
> > +       depends on MEDIA_CONTROLLER
> > +       depends on HAS_DMA
> > +       depends on ACPI
> 
> I wonder if it wouldn't make sense to make this depend on X86 (||
> COMPILE_TEST) as well. Are we expecting a standalone PCI(e) card with
> this device in the future?

All the ones I'm aware of are integrated with the CPU (or the chipset).

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-12  9:59   ` Tomasz Figa
  2017-06-13  8:58     ` Tuukka Toivonen
  2017-06-16 11:48     ` Sakari Ailus
@ 2017-06-26 14:51     ` Sakari Ailus
  2017-06-27  9:33       ` Tomasz Figa
  2017-10-06 19:19     ` Zhi, Yong
  3 siblings, 1 reply; 17+ messages in thread
From: Sakari Ailus @ 2017-06-26 14:51 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Yong Zhi, linux-media, Sakari Ailus, Zheng, Jian Xu, Mani,
	Rajmohan, Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

Hi Tomasz,

A few more comments, better late than never I guess.

On Mon, Jun 12, 2017 at 06:59:18PM +0900, Tomasz Figa wrote:
...
> > +/*
> > + * The CSI2 receiver has several parameters affecting
> > + * the receiver timings. These depend on the MIPI bus frequency
> > + * F in Hz (sensor transmitter rate) as follows:
> > + *     register value = (A/1e9 + B * UI) / COUNT_ACC
> > + * where
> > + *      UI = 1 / (2 * F) in seconds
> > + *      COUNT_ACC = counter accuracy in seconds
> > + *      For IPU3 COUNT_ACC = 0.0625
> > + *
> > + * A and B are coefficients from the table below,
> > + * depending whether the register minimum or maximum value is
> > + * calculated.
> > + *                                     Minimum     Maximum
> > + * Clock lane                          A     B     A     B
> > + * reg_rx_csi_dly_cnt_termen_clane     0     0    38     0
> > + * reg_rx_csi_dly_cnt_settle_clane    95    -8   300   -16
> > + * Data lanes
> > + * reg_rx_csi_dly_cnt_termen_dlane0    0     0    35
> > + * reg_rx_csi_dly_cnt_settle_dlane0   85    -2   145    -6
> > + * reg_rx_csi_dly_cnt_termen_dlane1    0     0    35     4
> > + * reg_rx_csi_dly_cnt_settle_dlane1   85    -2   145    -6
> > + * reg_rx_csi_dly_cnt_termen_dlane2    0     0    35     4
> > + * reg_rx_csi_dly_cnt_settle_dlane2   85    -2   145    -6
> > + * reg_rx_csi_dly_cnt_termen_dlane3    0     0    35     4
> > + * reg_rx_csi_dly_cnt_settle_dlane3   85    -2   145    -6
> > + *
> > + * We use the minimum values of both A and B.
> 
> Why?
> 
> > + */
> > +static int cio2_rx_timing(s32 a, s32 b, s64 freq)
> > +{
> > +       int r;
> > +       const u32 accinv = 16;
> > +       const u32 ds = 8; /* divde shift */
> 
> typo: divide
> 
> > +
> > +       freq = (s32)freq >> ds;
> 
> Why do we demote freq from 64 to 32 bits here?

I don't think there's any reason to. The original purpose of the check has
likely been to avoid dividing by a 64-bit number but that has been lost
here. The cast should be elsewhere...

> 
> > +       if (WARN_ON(freq <= 0))
> > +               return -EINVAL;
> 
> It generally doesn't make sense for the frequency to be negative, so
> maybe the argument should have been unsigned to start with? (And
> 32-bit if we don't expect frequencies higher than 4 GHz anyway.)

The value comes from a 64-bit integer V4L2 control so that implies the value
range of s64 as well.

> 
> > +
> > +       /* b could be 0, -2 or -8, so r < 500000000 */
> 
> Definitely. Anything <= 0 is also less than 500000000. Let's take a
> look at the computation below again:
> 
> 1) accinv is multiplied by b,
> 2) 500000000 is divided by 256 (=== shift right by 8 bits) = 1953125,
> 3) accinv*b is multiplied by 1953125 to form the value of r.
> 
> Now let's see at possible maximum absolute values for particular steps:
> 1) 16 * -8 = -128 (signed 8 bits),
> 2) 1953125 (unsigned 21 bits),
> 3) -128 * 1953125 = -249999872 (signed 29 bits).
> 
> So I think the important thing to note in the comment is:
> 
> /* b could be 0, -2 or -8, so |accinv * b| is always less than (1 <<
> ds) and thus |r| < 500000000. */
> 
> > +       r = accinv * b * (500000000 >> ds);
> 
> On the other hand, you lose some precision here. If you used s64
> instead and did the divide shift at the end ((accinv * b * 500000000)
> >> ds), for the example above you would get -250007629. (Depending on
> how big freq is, it might not matter, though.)
> 

The frequency is typically hundreds of megahertz.

> Also nit: What is 500000000? We have local constants defined above, I
> think it could also make sense to do the same for this one. The
> compiler should do constant propagation and simplify respective
> calculations anyway.

COUNT_ACC in the formula in the comment a few tens of lines above is in
nanoseconds. Performing the calculations in integer arithmetic results in
having 500000000 in the resulting formula.

So this is actually a constant related to the hardware but it does not have
a pre-determined name because it is derived from COUNT_ACC.
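
Spelled out, under the assumption that A is given in nanoseconds (hence
the A/1e9 in the comment) and that COUNT_ACC is 0.0625 ns, the
derivation would go roughly like this:

```latex
\begin{align*}
\text{value}
  &= \frac{A \cdot 10^{-9} + B \cdot \mathrm{UI}}{\mathrm{COUNT\_ACC}},
  \qquad \mathrm{UI} = \frac{1}{2F},
  \quad \mathrm{COUNT\_ACC} = 0.0625 \cdot 10^{-9}\ \mathrm{s} \\
  &= 16 \cdot 10^{9} \left( A \cdot 10^{-9} + \frac{B}{2F} \right) \\
  &= 16 A + \frac{16 \cdot B \cdot 10^{9}}{2F} \\
  &= 16 A + \frac{16 \cdot B \cdot 500000000}{F}
\end{align*}
```

Here 16 is what the code calls accinv (1 / 0.0625), and 500000000 is
1e9 / 2, with the 2 coming from UI = 1 / (2 * F).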

...

> > +static int cio2_vb2_queue_setup(struct vb2_queue *vq,
> > +                               unsigned int *num_buffers,
> > +                               unsigned int *num_planes,
> > +                               unsigned int sizes[],
> > +                               struct device *alloc_devs[])
> > +{
> > +       struct cio2_device *cio2 = vb2_get_drv_priv(vq);
> > +       struct cio2_queue *q = container_of(vq, struct cio2_queue, vbq);
> > +       u32 width = q->subdev_fmt.width;
> > +       u32 height = q->subdev_fmt.height;
> > +       u32 pixelformat = q->pixelformat;
> > +       unsigned int i, szimage;
> > +       int r = 0;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(cio2_csi2_fmts); i++) {
> > +               if (pixelformat == cio2_csi2_fmts[i])
> > +                       break;
> > +       }
> > +
> > +       /* Use SRGGB10 instead of return err */
> > +       if (i >= ARRAY_SIZE(cio2_csi2_fmts))
> 
> I think this should be impossible, since S_FMT should have already
> validated (and corrected) the setting.
> 
> > +               pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
> > +
> > +       alloc_devs[0] = &cio2->pci_dev->dev;
> 
> Hmm, so it doesn't go through the IPU MMU in the end?

No. The CSI-2 receiver isn't behind the MMU --- it's entirely separate from
the ISP.

...

> > +static int cio2_v4l2_querycap(struct file *file, void *fh,
> > +                             struct v4l2_capability *cap)
> > +{
> > +       struct cio2_device *cio2 = video_drvdata(file);
> > +
> > +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
> > +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
> > +       snprintf(cap->bus_info, sizeof(cap->bus_info),
> > +                "PCI:%s", pci_name(cio2->pci_dev));
> > +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
> 
> Hmm, I thought single plane queue type was deprecated these days and
> _MPLANE recommended for all new drivers. I'll defer this to other
> reviewers, though.

If the device supports single plane formats only, I don't see a reason to
use MPLANE buffer types.

> [snip]
> > +
> > +       /* Initialize vbq */
> > +       vbq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> > +       vbq->io_modes = VB2_USERPTR | VB2_MMAP;
> 
> VB2_DMABUF?
> 
> > +       vbq->ops = &cio2_vb2_ops;
> > +       vbq->mem_ops = &vb2_dma_sg_memops;
> > +       vbq->buf_struct_size = sizeof(struct cio2_buffer);
> > +       vbq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
> > +       vbq->min_buffers_needed = 1;
> > +       vbq->drv_priv = cio2;
> > +       vbq->lock = &q->lock;
> 
> Does the code take into account queue operations and video device
> operations being asynchronous regarding each other? Given that in this
> case there is always one queue per video device, maybe it would just
> make sense to use the same lock for both? (This happens if you leave
> vbq->lock with NULL.)

Using the same lock should be fine IMO.

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-26 14:51     ` Sakari Ailus
@ 2017-06-27  9:33       ` Tomasz Figa
  2017-06-28 13:31         ` Sakari Ailus
  0 siblings, 1 reply; 17+ messages in thread
From: Tomasz Figa @ 2017-06-27  9:33 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Yong Zhi, Linux Media Mailing List, Sakari Ailus, Zheng, Jian Xu,
	Mani, Rajmohan, Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

On Mon, Jun 26, 2017 at 11:51 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
> On Mon, Jun 12, 2017 at 06:59:18PM +0900, Tomasz Figa wrote:
>
>>
>> > +       if (WARN_ON(freq <= 0))
>> > +               return -EINVAL;
>>
>> It generally doesn't make sense for the frequency to be negative, so
>> maybe the argument should have been unsigned to start with? (And
>> 32-bit if we don't expect frequencies higher than 4 GHz anyway.)
>
> The value comes from a 64-bit integer V4L2 control so that implies the value
> range of s64 as well.

Okay, if there is no way to enforce this at control level, then I
guess we have to keep this here.

>
>>
>> > +
>> > +       /* b could be 0, -2 or -8, so r < 500000000 */
>>
>> Definitely. Anything <= 0 is also less than 500000000. Let's take a
>> look at the computation below again:
>>
>> 1) accinv is multiplied by b,
>> 2) 500000000 is divided by 256 (=== shift right by 8 bits) = 1953125,
>> 3) accinv*b is multiplied by 1953125 to form the value of r.
>>
>> Now let's see at possible maximum absolute values for particular steps:
>> 1) 16 * -8 = -128 (signed 8 bits),
>> 2) 1953125 (unsigned 21 bits),
>> 3) -128 * 1953125 = -249999872 (signed 29 bits).
>>
>> So I think the important thing to note in the comment is:
>>
>> /* b could be 0, -2 or -8, so |accinv * b| is always less than (1 <<
>> ds) and thus |r| < 500000000. */
>>
>> > +       r = accinv * b * (500000000 >> ds);
>>
>> On the other hand, you lose some precision here. If you used s64
>> instead and did the divide shift at the end ((accinv * b * 500000000)
>> >> ds), for the example above you would get -250007629. (Depending on
>> how big freq is, it might not matter, though.)
>>
>
> The frequency is typically hundreds of mega-Hertz.

I think it still would make sense to have the calculation a bit more precise.

>
>> Also nit: What is 500000000? We have local constants defined above, I
>> think it could also make sense to do the same for this one. The
>> compiler should do constant propagation and simplify respective
>> calculations anyway.
>
> COUNT_ACC in the formula in the comment a few decalines above is in
> nanoseconds. Performing the calculations in integer arithmetics results in
> having 500000000 in the resulting formula.
>
> So this is actually a constant related to the hardware but it does not have
> a pre-determined name because it is derived from COUNT_ACC.

Which, I believe, doesn't stop us from naming it.

>> > +static int cio2_v4l2_querycap(struct file *file, void *fh,
>> > +                             struct v4l2_capability *cap)
>> > +{
>> > +       struct cio2_device *cio2 = video_drvdata(file);
>> > +
>> > +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
>> > +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
>> > +       snprintf(cap->bus_info, sizeof(cap->bus_info),
>> > +                "PCI:%s", pci_name(cio2->pci_dev));
>> > +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
>>
>> Hmm, I thought single plane queue type was deprecated these days and
>> _MPLANE recommended for all new drivers. I'll defer this to other
>> reviewers, though.
>
> If the device supports single plane formats only, I don't see a reason to
> use MPLANE buffer types.

On the other hand, if a further new revision of the hardware (or
amendment of supported feature set of current hardware) actually adds
support for multiple planes, changing it to MPLANE will require
keeping a non-MPLANE variant of the code, due to userspace
compatibility concerns...

Best regards,
Tomasz

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-27  9:33       ` Tomasz Figa
@ 2017-06-28 13:31         ` Sakari Ailus
  2017-06-28 13:36           ` Tomasz Figa
       [not found]           ` <CGME20170628154447epcas5p28ba0ff617f6e640185fada0e955e24b0@epcas5p2.samsung.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Sakari Ailus @ 2017-06-28 13:31 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Yong Zhi, Linux Media Mailing List, Sakari Ailus, Zheng, Jian Xu,
	Mani, Rajmohan, Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

On Tue, Jun 27, 2017 at 06:33:13PM +0900, Tomasz Figa wrote:
> On Mon, Jun 26, 2017 at 11:51 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
> > On Mon, Jun 12, 2017 at 06:59:18PM +0900, Tomasz Figa wrote:
> >
> >>
> >> > +       if (WARN_ON(freq <= 0))
> >> > +               return -EINVAL;
> >>
> >> It generally doesn't make sense for the frequency to be negative, so
> >> maybe the argument should have been unsigned to start with? (And
> >> 32-bit if we don't expect frequencies higher than 4 GHz anyway.)
> >
> > The value comes from a 64-bit integer V4L2 control so that implies the value
> > range of s64 as well.
> 
> Okay, if there is no way to enforce this at control level, then I
> guess we have to keep this here.
> 
> >
> >>
> >> > +
> >> > +       /* b could be 0, -2 or -8, so r < 500000000 */
> >>
> >> Definitely. Anything <= 0 is also less than 500000000. Let's take a
> >> look at the computation below again:
> >>
> >> 1) accinv is multiplied by b,
> >> 2) 500000000 is divided by 256 (=== shift right by 8 bits) = 1953125,
> >> 3) accinv*b is multiplied by 1953125 to form the value of r.
> >>
> >> Now let's see at possible maximum absolute values for particular steps:
> >> 1) 16 * -8 = -128 (signed 8 bits),
> >> 2) 1953125 (unsigned 21 bits),
> >> 3) -128 * 1953125 = -249999872 (signed 29 bits).
> >>
> >> So I think the important thing to note in the comment is:
> >>
> >> /* b could be 0, -2 or -8, so |accinv * b| is always less than (1 <<
> >> ds) and thus |r| < 500000000. */
> >>
> >> > +       r = accinv * b * (500000000 >> ds);
> >>
> >> On the other hand, you lose some precision here. If you used s64
> >> instead and did the divide shift at the end ((accinv * b * 500000000)
> >> >> ds), for the example above you would get -250007629. (Depending on
> >> how big freq is, it might not matter, though.)
> >>
> >
> > The frequency is typically hundreds of mega-Hertz.
> 
> I think it still would make sense to have the calculation a bit more precise.

Then the solution is to divide by the 64-bit number, i.e. do_div(). IMO
this shouldn't be a big deal either way: the result needs to be in a value
range and this is only done once when streaming is started.

> 
> >
> >> Also nit: What is 500000000? We have local constants defined above, I
> >> think it could also make sense to do the same for this one. The
> >> compiler should do constant propagation and simplify respective
> >> calculations anyway.
> >
> > COUNT_ACC in the formula in the comment a few decalines above is in
> > nanoseconds. Performing the calculations in integer arithmetics results in
> > having 500000000 in the resulting formula.
> >
> > So this is actually a constant related to the hardware but it does not have
> > a pre-determined name because it is derived from COUNT_ACC.
> 
> Which, I believe, doesn't stop us from naming it.

No, but the value is derived from another value and used once. There's not
much value in adding a macro for it IMO.

The formula can perhaps be written more simply as:

	accinv * a + (accinv * b * (500000000 >> ds)
		      / (int32_t)(link_freq >> ds));

If you insist, how about COUNT_ACC_FACTOR, for it's derived from COUNT_ACC?
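
To make the precision question concrete, here is a small standalone
sketch of both variants. It is plain C, not the driver function:
rx_timing_shifted() and rx_timing_64() are hypothetical names, the
result is returned through a pointer so that negative values stay
distinguishable from errors, and in the kernel the 64-bit division
would presumably go through a helper such as div_s64() rather than
a plain '/'.

```c
#include <stdint.h>

#define COUNT_ACC_FACTOR	500000000	/* 1e9 / 2, see UI = 1 / (2 * F) */

/* Variant from the thread: shift both operands to stay within 32 bits. */
static int rx_timing_shifted(int32_t a, int32_t b, int64_t link_freq,
			     int32_t *val)
{
	const int32_t accinv = 16;	/* 1 / COUNT_ACC */
	const unsigned int ds = 8;	/* divide shift */
	int64_t freq;

	if (link_freq <= 0)
		return -1;

	freq = link_freq >> ds;
	if (freq == 0 || freq > INT32_MAX)
		return -1;

	/* b is a small table coefficient, so this stays within 32 bits */
	*val = accinv * a +
	       accinv * b * (COUNT_ACC_FACTOR >> ds) / (int32_t)freq;
	return 0;
}

/* Variant with a full 64-bit division and no shifting. */
static int rx_timing_64(int32_t a, int32_t b, int64_t link_freq,
			int32_t *val)
{
	const int64_t accinv = 16;

	if (link_freq <= 0)
		return -1;

	*val = (int32_t)(accinv * a +
			 accinv * b * COUNT_ACC_FACTOR / link_freq);
	return 0;
}
```

Note that 500000000 is 256 * 1953125, so shifting the constant right by
8 is exact; any rounding in the shifted variant comes from
link_freq >> ds. With a = 95 and b = -8, both variants give 1360 at
400000000 Hz, while at 400000001 Hz the shifted variant still gives 1360
and the 64-bit one gives 1361.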

> 
> >> > +static int cio2_v4l2_querycap(struct file *file, void *fh,
> >> > +                             struct v4l2_capability *cap)
> >> > +{
> >> > +       struct cio2_device *cio2 = video_drvdata(file);
> >> > +
> >> > +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
> >> > +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
> >> > +       snprintf(cap->bus_info, sizeof(cap->bus_info),
> >> > +                "PCI:%s", pci_name(cio2->pci_dev));
> >> > +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
> >>
> >> Hmm, I thought single plane queue type was deprecated these days and
> >> _MPLANE recommended for all new drivers. I'll defer this to other
> >> reviewers, though.
> >
> > If the device supports single plane formats only, I don't see a reason to
> > use MPLANE buffer types.
> 
> On the other hand, if a further new revision of the hardware (or
> amendment of supported feature set of current hardware) actually adds
> support for multiple planes, changing it to MPLANE will require
> keeping a non-MPLANE variant of the code, due to userspace
> compatibility concerns...

I think I have to correct my earlier statement --- the device supports
multi-planar formats as well. They're only useful with SoC cameras though,
not with raw Bayer cameras.

IMO VB2/V4L2 could better support conversion between single and
multi-planar buffer types so that the applications could just use any and
drivers could manage with one.

I don't have a strong opinion either way, but IMO this could be well
addressed later on by improving the framework when (or if) the support for
formats such as NV12 is added.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-28 13:31         ` Sakari Ailus
@ 2017-06-28 13:36           ` Tomasz Figa
  2017-06-28 14:55             ` Sakari Ailus
       [not found]           ` <CGME20170628154447epcas5p28ba0ff617f6e640185fada0e955e24b0@epcas5p2.samsung.com>
  1 sibling, 1 reply; 17+ messages in thread
From: Tomasz Figa @ 2017-06-28 13:36 UTC (permalink / raw)
  To: Sakari Ailus, Hans Verkuil
  Cc: Yong Zhi, Linux Media Mailing List, Sakari Ailus, Zheng, Jian Xu,
	Mani, Rajmohan, Toivonen, Tuukka, Yang, Hyungwoo

On Wed, Jun 28, 2017 at 10:31 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
> On Tue, Jun 27, 2017 at 06:33:13PM +0900, Tomasz Figa wrote:
>> On Mon, Jun 26, 2017 at 11:51 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
>> > On Mon, Jun 12, 2017 at 06:59:18PM +0900, Tomasz Figa wrote:
>> >
>> >>
>> >> > +       if (WARN_ON(freq <= 0))
>> >> > +               return -EINVAL;
>> >>
>> >> It generally doesn't make sense for the frequency to be negative, so
>> >> maybe the argument should have been unsigned to start with? (And
>> >> 32-bit if we don't expect frequencies higher than 4 GHz anyway.)
>> >
>> > The value comes from a 64-bit integer V4L2 control so that implies the value
>> > range of s64 as well.
>>
>> Okay, if there is no way to enforce this at control level, then I
>> guess we have to keep this here.
>>
>> >
>> >>
>> >> > +
>> >> > +       /* b could be 0, -2 or -8, so r < 500000000 */
>> >>
>> >> Definitely. Anything <= 0 is also less than 500000000. Let's take a
>> >> look at the computation below again:
>> >>
>> >> 1) accinv is multiplied by b,
>> >> 2) 500000000 is divided by 256 (=== shift right by 8 bits) = 1953125,
>> >> 3) accinv*b is multiplied by 1953125 to form the value of r.
>> >>
>> >> Now let's see at possible maximum absolute values for particular steps:
>> >> 1) 16 * -8 = -128 (signed 8 bits),
>> >> 2) 1953125 (unsigned 21 bits),
>> >> 3) -128 * 1953125 = -249999872 (signed 29 bits).
>> >>
>> >> So I think the important thing to note in the comment is:
>> >>
>> >> /* b could be 0, -2 or -8, so |accinv * b| is always less than (1 <<
>> >> ds) and thus |r| < 500000000. */
>> >>
>> >> > +       r = accinv * b * (500000000 >> ds);
>> >>
>> >> On the other hand, you lose some precision here. If you used s64
>> >> instead and did the divide shift at the end ((accinv * b * 500000000)
>> >> >> ds), for the example above you would get -250007629. (Depending on
>> >> how big freq is, it might not matter, though.)
>> >>
>> >
>> > The frequency is typically hundreds of mega-Hertz.
>>
>> I think it still would make sense to have the calculation a bit more precise.
>
> Then the solution is to divide by the 64-bit number, i.e. do_div(). IMO
> this shouldn't be a big deal either way: the result needs to be in a value
> range and this is only done once when streaming is started.
>
>>
>> >
>> >> Also nit: What is 500000000? We have local constants defined above, I
>> >> think it could also make sense to do the same for this one. The
>> >> compiler should do constant propagation and simplify respective
>> >> calculations anyway.
>> >
>> > COUNT_ACC in the formula in the comment a few decalines above is in
>> > nanoseconds. Performing the calculations in integer arithmetics results in
>> > having 500000000 in the resulting formula.
>> >
>> > So this is actually a constant related to the hardware but it does not have
>> > a pre-determined name because it is derived from COUNT_ACC.
>>
>> Which, I believe, doesn't stop us from naming it.
>
> No, but the value is derived from another value and used once. There's not
> much value in adding a macro for IMO.
>
> The formula can be perhaps easier written as:
>
>         accinv * a + (accinv * b * (500000000 >> ds)
>                       / (int32_t)(link_freq >> ds));
>
> If you insist, how about COUNT_ACC_FACTOR, for it's derived from COUNT_ACC?
>
>>
>> >> > +static int cio2_v4l2_querycap(struct file *file, void *fh,
>> >> > +                             struct v4l2_capability *cap)
>> >> > +{
>> >> > +       struct cio2_device *cio2 = video_drvdata(file);
>> >> > +
>> >> > +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
>> >> > +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
>> >> > +       snprintf(cap->bus_info, sizeof(cap->bus_info),
>> >> > +                "PCI:%s", pci_name(cio2->pci_dev));
>> >> > +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
>> >>
>> >> Hmm, I thought single plane queue type was deprecated these days and
>> >> _MPLANE recommended for all new drivers. I'll defer this to other
>> >> reviewers, though.
>> >
>> > If the device supports single plane formats only, I don't see a reason to
>> > use MPLANE buffer types.
>>
>> On the other hand, if a further new revision of the hardware (or
>> amendment of supported feature set of current hardware) actually adds
>> support for multiple planes, changing it to MPLANE will require
>> keeping a non-MPLANE variant of the code, due to userspace
>> compatibility concerns...
>
> I think I have to correct my earlier statement --- the device supports
> multi-planar formats as well. They're only useful with SoC cameras though,
> not with raw Bayer cameras.
>
> IMO VB2/V4L2 could better support conversion between single and
> multi-planar buffer types so that the applications could just use any and
> drivers could manage with one.
>
> I don't have a strong opinion either way, but IMO this could be well
> addressed later on by improving the framework when (or if) the support for
> formats such as NV12 is added.

The problem is that it couldn't, because it would change the userspace ABI.

...and somehow I still recall (voice echoing in my head ;)) someone
saying (writing) that single plane ABI is deprecated and all new
drivers should be using MPLANE. Hans, was that you?

Best regards,
Tomasz


* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-28 13:36           ` Tomasz Figa
@ 2017-06-28 14:55             ` Sakari Ailus
  0 siblings, 0 replies; 17+ messages in thread
From: Sakari Ailus @ 2017-06-28 14:55 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Hans Verkuil, Yong Zhi, Linux Media Mailing List, Sakari Ailus,
	Zheng, Jian Xu, Mani, Rajmohan, Toivonen, Tuukka, Yang, Hyungwoo

On Wed, Jun 28, 2017 at 10:36:54PM +0900, Tomasz Figa wrote:
> On Wed, Jun 28, 2017 at 10:31 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
> > On Tue, Jun 27, 2017 at 06:33:13PM +0900, Tomasz Figa wrote:
> >> On Mon, Jun 26, 2017 at 11:51 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
> >> > On Mon, Jun 12, 2017 at 06:59:18PM +0900, Tomasz Figa wrote:
> >> >
> >> >>
> >> >> > +       if (WARN_ON(freq <= 0))
> >> >> > +               return -EINVAL;
> >> >>
> >> >> It generally doesn't make sense for the frequency to be negative, so
> >> >> maybe the argument should have been unsigned to start with? (And
> >> >> 32-bit if we don't expect frequencies higher than 4 GHz anyway.)
> >> >
> >> > The value comes from a 64-bit integer V4L2 control so that implies the value
> >> > range of s64 as well.
> >>
> >> Okay, if there is no way to enforce this at control level, then I
> >> guess we have to keep this here.
> >>
> >> >
> >> >>
> >> >> > +
> >> >> > +       /* b could be 0, -2 or -8, so r < 500000000 */
> >> >>
> >> >> Definitely. Anything <= 0 is also less than 500000000. Let's take a
> >> >> look at the computation below again:
> >> >>
> >> >> 1) accinv is multiplied by b,
> >> >> 2) 500000000 is divided by 256 (=== shift right by 8 bits) = 1953125,
> >> >> 3) accinv*b is multiplied by 1953125 to form the value of r.
> >> >>
> >> >> Now let's look at the possible maximum absolute values for particular steps:
> >> >> 1) 16 * -8 = -128 (signed 8 bits),
> >> >> 2) 1953125 (unsigned 21 bits),
> >> >> 3) -128 * 1953125 = -249999872 (signed 29 bits).
> >> >>
> >> >> So I think the important thing to note in the comment is:
> >> >>
> >> >> /* b could be 0, -2 or -8, so |accinv * b| is always less than (1 <<
> >> >> ds) and thus |r| < 500000000. */
> >> >>
> >> >> > +       r = accinv * b * (500000000 >> ds);
> >> >>
> >> >> On the other hand, you lose some precision here. If you used s64
> >> >> instead and did the divide shift at the end ((accinv * b * 500000000)
> >> >> >> ds), for the example above you would get -250007629. (Depending on
> >> >> how big freq is, it might not matter, though.)
> >> >>
> >> >
> >> > The frequency is typically hundreds of mega-Hertz.
> >>
> >> I think it still would make sense to have the calculation a bit more precise.
> >
> > Then the solution is to divide by the 64-bit number, i.e. do_div(). IMO
> > this shouldn't be a big deal either way: the result needs to be in a value
> > range and this is only done once when streaming is started.
> >
> >>
> >> >
> >> >> Also nit: What is 500000000? We have local constants defined above, I
> >> >> think it could also make sense to do the same for this one. The
> >> >> compiler should do constant propagation and simplify respective
> >> >> calculations anyway.
> >> >
> >> > COUNT_ACC in the formula in the comment a few decalines above is in
> >> > nanoseconds. Performing the calculations in integer arithmetic results in
> >> > having 500000000 in the resulting formula.
> >> >
> >> > So this is actually a constant related to the hardware but it does not have
> >> > a pre-determined name because it is derived from COUNT_ACC.
> >>
> >> Which, I believe, doesn't stop us from naming it.
> >
> > > No, but the value is derived from another value and used once. There's not
> > > much value in adding a macro for it IMO.
> > >
> > > The formula can perhaps be written more simply as:
> >
> >         accinv * a + (accinv * b * (500000000 >> ds)
> >                       / (int32_t)(link_freq >> ds));
> >
> > If you insist, how about COUNT_ACC_FACTOR, for it's derived from COUNT_ACC?
> >
> >>
> >> >> > +static int cio2_v4l2_querycap(struct file *file, void *fh,
> >> >> > +                             struct v4l2_capability *cap)
> >> >> > +{
> >> >> > +       struct cio2_device *cio2 = video_drvdata(file);
> >> >> > +
> >> >> > +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
> >> >> > +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
> >> >> > +       snprintf(cap->bus_info, sizeof(cap->bus_info),
> >> >> > +                "PCI:%s", pci_name(cio2->pci_dev));
> >> >> > +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
> >> >>
> >> >> Hmm, I thought single plane queue type was deprecated these days and
> >> >> _MPLANE recommended for all new drivers. I'll defer this to other
> >> >> reviewers, though.
> >> >
> >> > If the device supports single plane formats only, I don't see a reason to
> >> > use MPLANE buffer types.
> >>
> >> On the other hand, if a further new revision of the hardware (or
> >> amendment of supported feature set of current hardware) actually adds
> >> support for multiple planes, changing it to MPLANE will require
> >> keeping a non-MPLANE variant of the code, due to userspace
> >> compatibility concerns...
> >
> > I think I have to correct my earlier statement --- the device supports
> > multi-planar formats as well. They're only useful with SoC cameras though,
> > not with raw Bayer cameras.
> >
> > IMO VB2/V4L2 could better support conversion between single and
> > multi-planar buffer types so that the applications could just use any and
> > drivers could manage with one.
> >
> > I don't have a strong opinion either way, but IMO this could be well
> > addressed later on by improving the framework when (or if) the support for
> > formats such as NV12 is added.
> 
> The problem is that it couldn't, because it would change the userspace ABI.

Not if the driver supports single plane formats using both single-planar
and multi-planar API. I don't think there's much missing from that. It'd
make sense to do that anyway, independently of this driver.

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk
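
For readers following the timing arithmetic debated above, here is a minimal
userspace sketch in C of the two variants under discussion: the 32-bit form
that pre-shifts by ds, and a 64-bit form that divides once at the end. The
values accinv = 16 and ds = 8 come from the review comments, and
COUNT_ACC_FACTOR is the name floated in the thread. The function names and the
example inputs are assumptions made for illustration; this is not the driver
code.

```c
#include <stdint.h>

#define ACCINV            16         /* from the review: 16 * -8 = -128 */
#define DS                8          /* "divide shift" discussed in the thread */
#define COUNT_ACC_FACTOR  500000000  /* name proposed in the thread */

/*
 * 32-bit variant: pre-shift both the constant and the frequency by DS.
 * The caller must ensure link_freq >= (1 << DS) so the divisor is non-zero.
 */
static int32_t timing_shifted(int32_t a, int32_t b, int64_t link_freq)
{
	int32_t r = ACCINV * b * (COUNT_ACC_FACTOR >> DS);

	return ACCINV * a + r / (int32_t)(link_freq >> DS);
}

/*
 * 64-bit variant: keep the full-width product and divide once at the end.
 * The caller must ensure link_freq > 0.
 */
static int32_t timing_wide(int32_t a, int32_t b, int64_t link_freq)
{
	int64_t r = (int64_t)ACCINV * b * COUNT_ACC_FACTOR;

	return (int32_t)(ACCINV * a + r / link_freq);
}
```

With the hypothetical inputs a = 6, b = -8 and a 400 MHz link frequency, both
functions return -64. One detail worth keeping in mind when comparing the two
forms: 500000000 equals 2^8 * 5^9, so shifting the constant right by 8 gives
exactly 1953125 with no remainder, and in this sketch any difference between
the variants comes from truncating link_freq before the division. In the
kernel, the 64-bit division would go through a helper such as do_div(), as
mentioned above.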


* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
       [not found]           ` <CGME20170628154447epcas5p28ba0ff617f6e640185fada0e955e24b0@epcas5p2.samsung.com>
@ 2017-06-28 15:44             ` Sylwester Nawrocki
  2017-06-28 15:56               ` Sakari Ailus
  0 siblings, 1 reply; 17+ messages in thread
From: Sylwester Nawrocki @ 2017-06-28 15:44 UTC (permalink / raw)
  To: Sakari Ailus, Tomasz Figa
  Cc: Yong Zhi, Linux Media Mailing List, Sakari Ailus, Zheng, Jian Xu,
	Mani, Rajmohan, Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

Hi,

On 06/28/2017 03:31 PM, Sakari Ailus wrote:
> IMO VB2/V4L2 could better support conversion between single and
> multi-planar buffer types so that the applications could just use any and
> drivers could manage with one.
> 
> I don't have a strong opinion either way, but IMO this could be well
> addressed later on by improving the framework when (or if) the support for
> formats such as NV12 is added.

We already had conversion between single and multi-planar buffer types
in the kernel, but for some reason it got removed. [1] The conversion
is supposed to be done in libv4l2, which is not mandatory so it cannot
be used to ensure backward compatibility while moving driver from one
API to the other.

[1]
commit 1d0c86cad38678fa42f6d048a7b9e4057c8c16fc
[media] media: v4l: remove single to multiplane conversion

Regards,
Sylwester


* Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-28 15:44             ` Sylwester Nawrocki
@ 2017-06-28 15:56               ` Sakari Ailus
  0 siblings, 0 replies; 17+ messages in thread
From: Sakari Ailus @ 2017-06-28 15:56 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Tomasz Figa, Yong Zhi, Linux Media Mailing List, Sakari Ailus,
	Zheng, Jian Xu, Mani, Rajmohan, Toivonen, Tuukka, Hans Verkuil,
	Yang, Hyungwoo

Hi Sylwester,

On Wed, Jun 28, 2017 at 05:44:32PM +0200, Sylwester Nawrocki wrote:
> Hi,
> 
> On 06/28/2017 03:31 PM, Sakari Ailus wrote:
> > IMO VB2/V4L2 could better support conversion between single and
> > multi-planar buffer types so that the applications could just use any and
> > drivers could manage with one.
> > 
> > I don't have a strong opinion either way, but IMO this could be well
> > addressed later on by improving the framework when (or if) the support for
> > formats such as NV12 is added.
> 
> We already had conversion between single and multi-planar buffer types
> in the kernel, but for some reason it got removed. [1] The conversion
> is supposed to be done in libv4l2, which is not mandatory so it cannot
> be used to ensure backward compatibility while moving driver from one
> API to the other.
> 
> [1]
> commit 1d0c86cad38678fa42f6d048a7b9e4057c8c16fc
> [media] media: v4l: remove single to multiplane conversion

Thanks for the pointer. I had missed this back then.

Not all applications will be using libv4l2. This is something that would
make sense to do in the kernel IMO. The changes seem pretty minimal to me,
based on the patch.

There is now at least one difference between single-planar and multi-planar
cases; the data_offset field is only present in struct v4l2_plane. That
should be easy to address by adding the field to the single-planar case,
too. (We'll need new buffer structs in the near future anyway; there's
really no way around that.)

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* RE: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver
  2017-06-12  9:59   ` Tomasz Figa
                       ` (2 preceding siblings ...)
  2017-06-26 14:51     ` Sakari Ailus
@ 2017-10-06 19:19     ` Zhi, Yong
  3 siblings, 0 replies; 17+ messages in thread
From: Zhi, Yong @ 2017-10-06 19:19 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: linux-media, Sakari Ailus, Zheng, Jian Xu, Mani, Rajmohan,
	Toivonen, Tuukka, Hans Verkuil, Yang, Hyungwoo

Hi, Tomasz,

Sorry for the late reply. I will omit the points that have been fixed in v4
or discussed earlier by either Tuukka or Sakari
(https://patchwork.linuxtv.org/patch/41665).

> -----Original Message-----
> From: linux-media-owner@vger.kernel.org [mailto:linux-media-
> owner@vger.kernel.org] On Behalf Of Tomasz Figa
> Sent: Monday, June 12, 2017 2:59 AM
> To: Zhi, Yong <yong.zhi@intel.com>
> Cc: linux-media@vger.kernel.org; Sakari Ailus <sakari.ailus@linux.intel.com>;
> Zheng, Jian Xu <jian.xu.zheng@intel.com>; Mani, Rajmohan
> <rajmohan.mani@intel.com>; Toivonen, Tuukka
> <tuukka.toivonen@intel.com>; Hans Verkuil <hverkuil@xs4all.nl>; Yang,
> Hyungwoo <hyungwoo.yang@intel.com>
> Subject: Re: [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2
> driver
> 
> Hi Yong,
> 
> Please see my comments inline.
> 
> On Wed, Jun 7, 2017 at 10:34 AM, Yong Zhi <yong.zhi@intel.com> wrote:
> > This patch adds CIO2 CSI-2 device driver for Intel's IPU3 camera
> > sub-system support.
> >
> > Signed-off-by: Yong Zhi <yong.zhi@intel.com>
> > ---
> >  drivers/media/pci/Kconfig                |    2 +
> >  drivers/media/pci/Makefile               |    3 +-
> >  drivers/media/pci/intel/Makefile         |    5 +
> >  drivers/media/pci/intel/ipu3/Kconfig     |   17 +
> >  drivers/media/pci/intel/ipu3/Makefile    |    1 +
> >  drivers/media/pci/intel/ipu3/ipu3-cio2.c | 1788 ++++++++++++++++++++++++++++++
> >  drivers/media/pci/intel/ipu3/ipu3-cio2.h |  424 +++++++
> >  7 files changed, 2239 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/media/pci/intel/Makefile
> >  create mode 100644 drivers/media/pci/intel/ipu3/Kconfig
> >  create mode 100644 drivers/media/pci/intel/ipu3/Makefile
> >  create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.c
> >  create mode 100644 drivers/media/pci/intel/ipu3/ipu3-cio2.h
> [snip]
> > diff --git a/drivers/media/pci/intel/ipu3/Kconfig b/drivers/media/pci/intel/ipu3/Kconfig
> > new file mode 100644
> > index 0000000..2a895d6
> > --- /dev/null
> > +++ b/drivers/media/pci/intel/ipu3/Kconfig
> > @@ -0,0 +1,17 @@
> > +config VIDEO_IPU3_CIO2
> > +       tristate "Intel ipu3-cio2 driver"
> > +       depends on VIDEO_V4L2 && PCI
> > +       depends on MEDIA_CONTROLLER
> > +       depends on HAS_DMA
> > +       depends on ACPI
> 
> I wonder if it wouldn't make sense to make this depend on X86 (||
> COMPILE_TEST) as well. Are we expecting a standalone PCI(e) card with this
> device in the future?

Will add depends on (X86 || COMPILE_TEST) && 64BIT

> 
> > +       select V4L2_FWNODE
> > +       select VIDEOBUF2_DMA_SG
> > +
> > +       ---help---
> > +       This is the Intel IPU3 CIO2 CSI-2 receiver unit, found in Intel
> > +       Skylake and Kaby Lake SoCs and used for capturing images and
> > +       video from a camera sensor.
> > +
> > +       Say Y or M here if you have a Skylake/Kaby Lake SoC with MIPI CSI-2
> > +       connected camera.
> > +       The module will be called ipu3-cio2.
> > diff --git a/drivers/media/pci/intel/ipu3/Makefile b/drivers/media/pci/intel/ipu3/Makefile
> > new file mode 100644
> > index 0000000..20186e3
> > --- /dev/null
> > +++ b/drivers/media/pci/intel/ipu3/Makefile
> > @@ -0,0 +1 @@
> > +obj-$(CONFIG_VIDEO_IPU3_CIO2) += ipu3-cio2.o
> > diff --git a/drivers/media/pci/intel/ipu3/ipu3-cio2.c b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
> > new file mode 100644
> > index 0000000..69c47fc
> > --- /dev/null
> > +++ b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
> [snip]
> 
> > +               u32 fbpt_rp =
> > +                       (readl(cio2->base + CIO2_REG_CDMARI(CIO2_DMA_CHAN))
> > +                        >> CIO2_CDMARI_FBPT_RP_SHIFT)
> > +                       & CIO2_CDMARI_FBPT_RP_MASK;
> > +
> > +               /*
> > +                * fbpt_rp is the fbpt entry that the dma is currently working
> > +                * on, but since it could jump to next entry at any time,
> > +                * assume that we might already be there.
> > +                */
> > +               fbpt_rp = (fbpt_rp + 1) % CIO2_MAX_BUFFERS;
> 
> Hmm, this is really racy. This code can be pre-empted and not execute for
> quite long time, depending on system load, resuming after the hardware
> goes even further. Technically you could prevent this using
> *_irq_save()/_irq_restore(), but I'd try to find a way that doesn't rely on the
> timing, if possible.

Ack
Will disable interrupts for the duration of this buffer queueing.

> [snip]
> > +static int cio2_v4l2_querycap(struct file *file, void *fh,
> > +                             struct v4l2_capability *cap)
> > +{
> > +       struct cio2_device *cio2 = video_drvdata(file);
> > +
> > +       strlcpy(cap->driver, CIO2_NAME, sizeof(cap->driver));
> > +       strlcpy(cap->card, CIO2_DEVICE_NAME, sizeof(cap->card));
> > +       snprintf(cap->bus_info, sizeof(cap->bus_info),
> > +                "PCI:%s", pci_name(cio2->pci_dev));
> > +       cap->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;
> 
> Hmm, I thought single plane queue type was deprecated these days and
> _MPLANE recommended for all new drivers. I'll defer this to other reviewers,
> though.

Will switch to MPLANE support in v5.

> 
> > +       cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> > +
> > +       return 0;
> > +}
> [snip]
> > +static int cio2_v4l2_try_fmt(struct file *file, void *fh,
> > +                            struct v4l2_format *f)
> > +{
> > +       u32 pixelformat = f->fmt.pix.pixelformat;
> > +       unsigned int i;
> > +
> > +       cio2_v4l2_g_fmt(file, fh, f);
> > +
> > +       for (i = 0; i < ARRAY_SIZE(cio2_csi2_fmts); i++) {
> > +               if (pixelformat == cio2_csi2_fmts[i])
> > +                       break;
> > +       }
> > +
> > +       /* Use SRGGB10 as default if not found */
> > +       if (i >= ARRAY_SIZE(cio2_csi2_fmts))
> > +               pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10;
> > +
> > +       f->fmt.pix.pixelformat = pixelformat;
> > +       f->fmt.pix.bytesperline = cio2_bytesperline(f->fmt.pix.width);
> > +       f->fmt.pix.sizeimage = f->fmt.pix.bytesperline * f->fmt.pix.height;
> 
> Shouldn't you use f->fmt.pix_mp instead?
> 

Agreed, will update here together with MPLANE support.

> [snip]
> > +
> > +       /* Initialize vbq */
> > +       vbq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> > +       vbq->io_modes = VB2_USERPTR | VB2_MMAP;
> 
> 
> Best regards,
> Tomasz
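
As a side note on the FBPT read-pointer handling quoted earlier in this
message, the following C sketch isolates the two steps the snippet performs:
extracting a bit field from a register value, then advancing a ring index with
wrap-around. The shift, mask and buffer count below are hypothetical
placeholders (the real definitions live in ipu3-cio2.h and are not shown in
this thread), and the helper name is made up for the example.

```c
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from ipu3-cio2.h. */
#define EX_FBPT_RP_SHIFT  16
#define EX_FBPT_RP_MASK   0xffu
#define EX_MAX_BUFFERS    32u

/*
 * Decode the read pointer from a raw register value and return the entry
 * after it, wrapping back to 0 at the end of the ring.
 */
static uint32_t next_fbpt_entry(uint32_t cdmari)
{
	uint32_t rp = (cdmari >> EX_FBPT_RP_SHIFT) & EX_FBPT_RP_MASK;

	return (rp + 1) % EX_MAX_BUFFERS;
}
```

This only shows the index arithmetic. It does nothing about the race pointed
out in the review: the hardware can still advance between the register read
and the moment the computed index is used.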


Thread overview: 17+ messages
2017-06-07  1:34 [PATCH v2 0/3] [media] add IPU3 CIO2 CSI2 driver Yong Zhi
2017-06-07  1:34 ` [PATCH v2 1/3] [media] videodev2.h, v4l2-ioctl: add IPU3 raw10 color format Yong Zhi
2017-06-07  1:34 ` [PATCH v2 2/3] [media] doc-rst: add IPU3 raw10 bayer pixel format definitions Yong Zhi
2017-06-07 17:55   ` Alan Cox
2017-06-07  1:34 ` [PATCH v2 3/3] [media] intel-ipu3: cio2: Add new MIPI-CSI2 driver Yong Zhi
2017-06-12  9:59   ` Tomasz Figa
2017-06-13  8:58     ` Tuukka Toivonen
2017-06-13  9:18       ` Tomasz Figa
2017-06-16 11:48     ` Sakari Ailus
2017-06-26 14:51     ` Sakari Ailus
2017-06-27  9:33       ` Tomasz Figa
2017-06-28 13:31         ` Sakari Ailus
2017-06-28 13:36           ` Tomasz Figa
2017-06-28 14:55             ` Sakari Ailus
     [not found]           ` <CGME20170628154447epcas5p28ba0ff617f6e640185fada0e955e24b0@epcas5p2.samsung.com>
2017-06-28 15:44             ` Sylwester Nawrocki
2017-06-28 15:56               ` Sakari Ailus
2017-10-06 19:19     ` Zhi, Yong
