* [PATCH 0/6] v4l: VPE mem to mem driver
From: Archit Taneja @ 2013-08-02 14:03 UTC
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

VPE:
VPE (Video Processing Engine) is an IP found on DRA7xx, and on some older TI
multimedia SoCs which don't have baseport support in the mainline kernel.

VPE is a memory to memory block used for performing de-interlacing, scaling and
color conversion on input buffers. It's primarily used to de-interlace decoded
DVD/Blu-ray video buffers and provide the content to a progressive display, or
to do some other post processing. VPE can also be used for other tasks like fast
color space conversion, scaling and chrominance up/down sampling. The scaler in
particular is based on a polyphase filter and supports 32 phases and 5/7 taps.
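
For illustration, a polyphase scaler of this kind works roughly like the
sketch below. This is only a conceptual sketch: the coefficient table, the
fixed point layout and the function are made up here, and not the actual VPE
programming model.

  /*
   * Illustrative sketch of a 32 phase, 5 tap polyphase filter. The source
   * position is kept in fixed point with 5 fractional bits, so the fraction
   * directly selects one of the 32 phases (coefficient rows).
   */
  #define NUM_PHASES	32
  #define NUM_TAPS	5

  static const short coeffs[NUM_PHASES][NUM_TAPS];	/* hypothetical table */

  static int scale_pixel(const unsigned char *line, int width, int x_fixed)
  {
  	int pos = x_fixed >> 5;			/* integer source position */
  	int phase = x_fixed & (NUM_PHASES - 1);	/* 5 fraction bits -> phase */
  	int tap, src, acc = 0;

  	for (tap = 0; tap < NUM_TAPS; tap++) {
  		src = pos + tap - NUM_TAPS / 2;
  		if (src < 0)
  			src = 0;
  		if (src > width - 1)
  			src = width - 1;
  		acc += coeffs[phase][tap] * line[src];
  	}

  	return acc;	/* would then be rounded and clamped to 8 bits */
  }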

VPE's De-interlacer IP:
The de-interlacer module performs a combination of spatial and temporal
de-interlacing: it determines the weighting between the two by keeping track
of the change in motion between fields, maintaining and updating a motion
vector buffer in RAM. The de-interlacer needs the current field and the 2
previous fields (along with the motion vector info) to generate a progressive
frame. It operates on YUV422 data.
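
Conceptually, the weighting boils down to something like the sketch below.
This is purely illustrative: the 0-255 motion metric and the blend are made
up here, the hardware uses its own internal representation.

  /*
   * Illustrative sketch of motion adaptive de-interlacing: static areas
   * (low motion) take the temporally predicted value, moving areas fall
   * back to spatial interpolation within the current field.
   */
  static unsigned char dei_pixel(unsigned char spatial,  /* from field f */
  			       unsigned char temporal, /* from f-1, f-2 */
  			       unsigned char motion)   /* from MV buffer */
  {
  	return (temporal * (255 - motion) + spatial * motion) / 255;
  }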

VPDMA:
All the DMAs are done through a dedicated DMA IP called VPDMA (Video Port
Direct Memory Access). This DMA IP is specialized for transferring video
buffers; the input and output data ports of VPDMA are configured via descriptor
lists loaded to the VPDMA list manager. VPDMA is also used to load MMRs of the
various VPE sub blocks.

VPDMA is advanced enough to support multiple clients, like a system DMA would;
however, the way it's integrated in the SoC is such that it can be used only by
the VPE IP. The same IP is also used on DRA7x in another block called VIP
(Video Input Port), which captures camera sensor content. That instance is
again dedicated to the VIP block, and therefore doesn't have multiple clients
either. These factors made us consider writing the VPDMA block as a library,
providing functions to VPE (and VIP in the future) to add descriptors and start
DMA. It might have made sense to make it a dmaengine driver if there were
multiple clients using VPDMA.

VPE and VPDMA look something like this:

   -----------		         ---
  |    MVin   |---------------->|   |
  |	      |			|   |
  |   Mvout   |<----------------|   |    ---	
  |	      |	   ---------    |   |   |   |
  |	f     |-->| CHR_US1 |-->| D |   | S |	     ------
  | (YUV in)  |    ---------    | E |-->| C |------>|CHR_DS|----
  | 	      |	   ---------    | I |   |   |   |    ------     |
  |   f - 1   |-->| CHR_US2 |-->|   |   |   |   |		|
  | (YUV in)  |	   ---------	|   |    ---    |    -----	|
  | 	      |	   ---------    |   |		 -->| CSC |--   |
  |   f - 2   |-->| CHR_US3 |-->|   |		     -----   |  |
  | (YUV in)  |	   ---------    |   |			     |  |
  |  	      |			 ---			     |  |
  |	      |						     |	|					  
  | (YUV out) |<---------------------------------------------	|
  |	      |							|
  | (RGB out) |<------------------------------------------------
   -----------
     VPDMA			      VPE

f, f - 1, and f - 2 are input ports fetching 3 consecutive fields for the
de-interlacer. MVin and MVout are ports which fetch the current motion vector
and output the updated motion vector respectively. There are 2 output ports,
one for YUV output and the other for RGB output if the color space
converter (CSC) is used. The inputs can be YUV packed or semiplanar formats.
The chrominance upsampler (CHR_USx) is used when the input format is NV12, and
the chrominance downsampler (CHR_DS) is used if the output content needs to be
in NV12 format. The scaler (SC) can be used to scale the de-interlaced content
if needed.
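
As an illustration, the choice of which blocks end up in the path is driven
by the formats alone, something like the hypothetical helper below (not
driver code):

  /* illustrative only: which sub blocks the formats pull into the path */
  static void vpe_pick_path(bool in_is_nv12, bool out_is_nv12, bool out_is_rgb,
  			  bool *chr_us, bool *chr_ds, bool *csc)
  {
  	*chr_us = in_is_nv12;	/* 420 input needs chroma upsampling to 422 */
  	*chr_ds = out_is_nv12;	/* 420 output needs chroma downsampling */
  	*csc = out_is_rgb;	/* RGB output goes through the CSC */
  }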

This series adds VPE as a mem to mem v4l2 driver, and VPDMA as a helper
library. For now, only the de-interlacer is configured, the scaler and color
space converter are bypassed.

These patches were tested on top of the patch series which provides initial
baseport support for DRA7xx:

http://marc.info/?l=linux-omap&m=137518359422774&w=2

Archit Taneja (6):
  v4l: ti-vpe: Create a vpdma helper library
  v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  v4l: ti-vpe: Add VPE mem to mem driver
  v4l: ti-vpe: Add de-interlacer support in VPE
  arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  experimental: arm: dts: dra7xx: Add a DT node for VPE

 arch/arm/boot/dts/dra7.dtsi                |   11 +
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c  |   42 +
 drivers/media/platform/Kconfig             |   10 +
 drivers/media/platform/Makefile            |    2 +
 drivers/media/platform/ti-vpe/vpdma.c      |  858 ++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  202 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h |  814 +++++++++++
 drivers/media/platform/ti-vpe/vpe.c        | 2065 ++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h   |  496 +++++++
 9 files changed, 4500 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

-- 
1.8.1.2



* [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
From: Archit Taneja @ 2013-08-02 14:03 UTC
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

The primary function of VPDMA is to move data between external memory and the
internal processing modules (in our case, VPE) that source or sink data. VPDMA
is capable of buffering this data and then delivering it to the modules on
demand, as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is set up inside VPDMA to connect a specific
memory buffer to a specific client. VPDMA centralizes the DMA control
functions and the buffering required, which lets all the clients minimize the
effect of long memory latencies.

Add the following to the VPDMA helper:

- A data struct which describes VPDMA channels. For now, these channels are
  only the ones used by VPE; the list of channels will grow when VIP (Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA. This
  data type information will be used to populate fields required by data
  descriptors and used by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- VPDMA register offset definitions, and functions to read, write and modify
  VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of descriptors
  that makes up a set of DMA transfers that need to be completed. Each
  descriptor will either perform a DMA transaction to fetch input buffers and
  write to output buffers (data descriptors), configure the MMRs of VPE sub
  blocks (configuration descriptors), or provide control information to VPDMA
  (control descriptors).

- Functions to allocate, map and unmap the buffers needed for the descriptor
  list, for payloads containing MMR values, and for motion vector buffers.
  These use the DMA mapping APIs to give VPDMA exclusive access to the buffers
  while they are mapped.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Configuration of some VPDMA specific parameters: frame start event (when to
  start DMA for a client) and line mode (whether each line fetched should be
  mirrored or not).

- A function to load the firmware required by VPDMA. VPDMA needs a firmware
  for its internal list manager. We use the request_firmware API to fetch this
  firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize VPDMA. This will be called by the VPE driver with
  its platform device pointer; it takes care of loading the VPDMA firmware and
  returns a handle back to the VPE driver. The VIP driver will later call the
  same init function to initialize its own VPDMA instance. A usage sketch
  follows this list.
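
Roughly, a client like VPE is expected to use the helper as in the sketch
below. Error handling is trimmed, and VPE_DESC_LIST_SIZE as well as the
descriptor filling step are placeholders; the actual descriptor helpers come
in the next patch.

  /* sketch of the intended call flow from VPE (not a complete driver) */
  static int vpe_use_vpdma(struct platform_device *pdev)
  {
  	struct vpdma_data *vpdma;
  	struct vpdma_desc_list list;
  	int r;

  	r = vpdma_init(pdev, &vpdma);	/* loads firmware asynchronously */
  	if (r)
  		return r;

  	/* once vpdma->ready is set, build and submit a list */
  	r = vpdma_create_desc_list(&list, VPE_DESC_LIST_SIZE,
  			VPDMA_LIST_TYPE_NORMAL);
  	if (r)
  		return r;

  	/* ... append config/data/control descriptors at list.next ... */

  	vpdma_enable_list_complete_irq(vpdma, 0, true);
  	return vpdma_submit_descs(vpdma, &list);
  }

When the list complete interrupt fires, the handler clears the status with
vpdma_clear_list_stat() and empties the list with vpdma_reset_desc_list() so
it can be filled and submitted again.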

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 589 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 862 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..b15b3dd
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,589 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int get_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return get_field(read_reg(vpdma, offset), mask, shift);
+}
+
+static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void insert_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	insert_field(&val, field, mask, shift);
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = 0;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & (VPDMA_DESC_ALIGN - 1));
+
+	return 0;
+}
+
+void vpdma_buf_free(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped != 0);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map a DMA buffer, enabling DMA access
+ */
+void vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped != 0);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	buf->mapped = 1;
+	BUG_ON(dma_mapping_error(dev, buf->dma_addr));
+}
+
+/*
+ * unmap a DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = 0;
+}
+
+/*
+ * create a descriptor list, the user of this list will append configuration,
+ * control and data descriptors to this list, which will then be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_buf_alloc(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list, this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_buf_free(&list->buf);
+
+	list->next = NULL;
+}
+
+static int vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	u32 sync_reg = read_reg(vpdma, VPDMA_LIST_STAT_SYNC);
+
+	return (sync_reg >> (list_num + 16)) & 0x01;
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+	wmb();
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client, the
+ * line buffer content can either be mirrored(each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	insert_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	insert_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_buf_map(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_buf_unmap(vpdma, &fw_dma_buf);
+
+	vpdma_buf_free(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, FW_ACTION_HOTPLUG,
+		VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+int vpdma_init(struct platform_device *pdev, struct vpdma_data **pvpdma)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_init\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return -ENOMEM;
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return -ENODEV;
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return -ENOMEM;
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return r;
+	}
+
+	*pvpdma = vpdma;
+
+	return 0;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..2ea2dd3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_MAX_DESC_SIZE		32	/* 8 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern struct vpdma_data_format vpdma_yuv_fmts[];
+extern struct vpdma_data_format vpdma_rgb_fmts[];
+extern struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
+void vpdma_buf_free(struct vpdma_buf *buf);
+void vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+int vpdma_init(struct platform_device *pdev, struct vpdma_data **pvpdma);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + (grp) * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + (i) * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers(only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
@ 2013-08-02 14:03   ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

The primary function of VPDMA is to move data between external memory and
internal processing modules(in our case, VPE) that source or sink data. VPDMA is
capable of buffering this data and then delivering the data as demanded to the
modules as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is setup inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and buffering required to allow all the clients to minimize the
effect of long latency times.

Add the following to the VPDMA helper:

- A data struct which describe VPDMA channels. For now, these channels are the
  ones used only by VPE, the list of channels will increase when VIP(Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA. This
  data type information will be used to populate fields required by data
  descriptors and used by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- Provide VPDMA register offset definitions, functions to read, write and modify
  VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of descriptors
  that makes up a set of DMA transfers that need to be completed. Each
  descriptor will either perform a DMA transaction to fetch input buffers and
  write to output buffers(data descriptors), or configure the MMRs of sub blocks
  of VPE(configuration descriptors), or provide control information to VPDMA
  (control descriptors).

- Functions to allocate, map and unmap buffers needed for the descriptor list,
  payloads containing MMR values and motion vector buffers. These use the
  DMA mapping APIs to ensure exclusive access to VPDMA.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Enable some VPDMA specific parameters: frame start event(when to start DMA for
  a client) and line mode(whether each line fetched should be mirrored or not).

- Function to load firmware required by VPDMA. VPDMA requires a firmware for
  it's internal list manager. We add the required request_firmware apis to fetch
  this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize VPDMA, this will be called by the VPE driver with
  it's platform device pointer, this function will take care of loading VPDMA
  firmware and returning a handle back to the VPE driver. The VIP driver will
  also call the same init function to initialize it's own VPDMA instance.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 589 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 862 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..b15b3dd
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,589 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int get_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return get_field(read_reg(vpdma, offset), mask, shift);
+}
+
+static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void insert_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	insert_field(&val, field, mask, shift);
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = 0;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
+
+	return 0;
+}
+
+void vpdma_buf_free(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped != 0);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map a DMA buffer, enabling DMA access
+ */
+void vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped != 0);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	buf->mapped = 1;
+	BUG_ON(dma_mapping_error(dev, buf->dma_addr));
+}
+
+/*
+ * unmap a DMA buffer, disabling DMA access and
+ * allowing the main processor to acces the data
+ */
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = 0;
+}
+
+/*
+ * create a descriptor list, the user of this list will append configuration,
+ * contorl and data descriptors to this list, this list will be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_buf_alloc(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated fot the VPDMA descriptor list, this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_buf_free(&list->buf);
+
+	list->next = NULL;
+}
+
+static int vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	u32 sync_reg = read_reg(vpdma, VPDMA_LIST_STAT_SYNC);
+
+	return (sync_reg >> (list_num + 16)) & 0x01;
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+	wmb();
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previosuly occured list intterupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client, the
+ * line buffer content can either be mirrored(each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	insert_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	insert_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_buf_map(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_buf_unmap(vpdma, &fw_dma_buf);
+
+	vpdma_buf_free(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+int vpdma_init(struct platform_device *pdev, struct vpdma_data **pvpdma)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_init\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return -ENOMEM;
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return -ENODEV;
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return -ENOMEM;
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return r;
+	}
+
+	*pvpdma = vpdma;
+
+	return 0;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..2ea2dd3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_MAX_DESC_SIZE		32	/* 8 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern struct vpdma_data_format vpdma_yuv_fmts[];
+extern struct vpdma_data_format vpdma_rgb_fmts[];
+extern struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
+void vpdma_buf_free(struct vpdma_buf *buf);
+void vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
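+/*
+ * A typical lifecycle (illustrative): vpdma_buf_alloc() allocates a buffer
+ * for descriptors or payloads, vpdma_buf_map() makes it visible to VPDMA
+ * before a DMA is started, vpdma_buf_unmap() releases the mapping once the
+ * hardware is done with it, and vpdma_buf_free() frees the buffer.
+ */
+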
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma; called with the VPE platform device pointer */
+int vpdma_init(struct platform_device *pdev, struct vpdma_data **pvpdma);
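+
+/*
+ * e.g. (illustrative sketch; the actual call site is in the VPE driver):
+ *
+ *	struct vpdma_data *vpdma;
+ *	int ret;
+ *
+ *	ret = vpdma_init(pdev, &vpdma);
+ */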
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + (grp) * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + (i) * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers (only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-08-02 14:03 ` Archit Taneja
@ 2013-08-02 14:03   ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an existing
list, and append the configuration/data/control descriptor header to the list.
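
For instance, a rough usage sketch (the list size, rectangle, format and DMA
addresses below are placeholders, not part of this patch; fmt would typically
point into one of the exported vpdma_*_fmts[] tables):

	struct vpdma_desc_list list;

	vpdma_create_desc_list(&list, size, VPDMA_LIST_TYPE_NORMAL);
	vpdma_add_in_dtd(&list, frame_width, frame_height, &c_rect, fmt,
			src_dma_addr, VPE_CHAN_LUMA1_IN, 0, 0);
	vpdma_add_out_dtd(&list, &c_rect, fmt, dst_dma_addr,
			VPE_CHAN_LUMA_OUT, 0);
	vpdma_submit_descs(vpdma, &list);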

In the case of configuration descriptors, the creation of a payload block may
be required (the payloads can hold VPE MMR values, or scaler coefficients). The
allocation and contents of the payload buffer are left to the VPE driver.
However, the VPDMA library provides helper macros to create payloads in the
correct format.
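
As a rough sketch (the struct, buffer and register names below are
illustrative, not part of this patch), a driver could lay out an MMR payload,
fill in its ADB header, and queue it with vpdma_add_cfd_adb():

	struct vpe_mmr_adb {
		struct vpdma_adb_hdr csc_hdr;
		u32 csc_regs[8];
	};

	struct vpdma_buf mmr_adb;

	vpdma_buf_alloc(&mmr_adb, sizeof(struct vpe_mmr_adb));
	VPDMA_SET_MMR_ADB_HDR(mmr_adb, vpe_mmr_adb, csc_hdr, csc_regs,
			CSC_REG0);
	/* write the CSC register values into csc_regs, then: */
	vpdma_add_cfd_adb(&list, CFD_MMR_CLIENT, &mmr_adb);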

Add debug functions to dump the descriptors in a way that makes it easy to see
the values of the different fields in the descriptors.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 269 +++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 ++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 695 +++++++++++++++++++++++++++++
 3 files changed, 1012 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index b15b3dd..b957381 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -425,6 +426,274 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd_get_dest_addr_offset(cfd));
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word1: num_data_wrds = %d\n", cfd_get_block_len(cfd));
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd_get_payload_addr(cfd));
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header; this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & (VPDMA_DESC_ALIGN - 1));
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd_set_dest_addr_offset(cfd, dest_offset);
+	cfd_set_block_len(cfd, len);
+	cfd_set_payload_addr(cfd, blk->dma_addr);
+	cfd_set_pkt_payload_len(cfd, CFD_INDIRECT, CFD_CLS_BLOCK, client,
+		len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format; this is used to configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & (VPDMA_DESC_ALIGN - 1));
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd_set_w0(cfd, 0);
+	cfd_set_w1(cfd, 0);
+	cfd_set_payload_addr(cfd, adb->dma_addr);
+	cfd_set_pkt_payload_len(cfd, CFD_INDIRECT, CFD_CLS_ADB, client,
+		len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+};
+
+/*
+ * the control descriptor format changes based on the type of control
+ * descriptor; we only use 'sync on channel' control descriptors for now, so
+ * assume that type here
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list; this descriptor stalls the VPDMA list until DMA is completed
+ * on the specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd_set_w0(ctd, 0);
+	ctd_set_w1(ctd, 0);
+	ctd_set_w2(ctd, 0);
+	ctd_set_type_source_ctl(ctd, chan_info[chan].num,
+		CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd_get_start_addr(dtd));
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			" drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list;
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
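+	/* depth is in bits per pixel; stride and the left offset are bytes */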
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd_set_type_ctl_stride(dtd,
+				fmt->data_type,
+				notify,
+				field,
+				!!(flags & VPDMA_DATA_FRAME_1D),
+				!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+				!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+				stride);
+	dtd_set_w1(dtd, 0);
+	dtd_set_start_addr(dtd, dma_addr);
+	dtd_set_pkt_ctl(dtd, !!(flags & VPDMA_DATA_MODE_TILED), DTD_DIR_OUT,
+			channel, priority, next_chan);
+
+	dtd_set_desc_write_addr(dtd, 0, 0, 0, 0);
+	dtd_set_max_width_height(dtd, MAX_OUT_WIDTH_1920, MAX_OUT_HEIGHT_1080);
+	dtd_set_client_attr0(dtd, 0);
+	dtd_set_client_attr1(dtd, 0);
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list;
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd_set_type_ctl_stride(dtd,
+				fmt->data_type,
+				notify,
+				field,
+				!!(flags & VPDMA_DATA_FRAME_1D),
+				!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+				!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+				stride);
+
+	dtd_set_xfer_length_height(dtd, c_rect->width, height);
+	dtd_set_start_addr(dtd, dma_addr);
+	dtd_set_pkt_ctl(dtd, !!(flags & VPDMA_DATA_MODE_TILED), DTD_DIR_IN,
+			channel, priority, next_chan);
+	dtd_set_frame_width_height(dtd, frame_width, frame_height);
+	dtd_set_start_h_v(dtd, c_rect->left, c_rect->top);
+	dtd_set_client_attr0(dtd, 0);
+	dtd_set_client_attr1(dtd, 0);
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 2ea2dd3..a5435c5 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -123,6 +123,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		7
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors MMRs as client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
 void vpdma_buf_free(struct vpdma_buf *buf);
@@ -135,6 +168,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..3b62f3d 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,699 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the overall
+ * size of the payload also needs to be a multiple of 16 bytes. The VPDMA
+ * user must ensure that each sub block length is aligned.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
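+
+/*
+ * e.g. (illustrative): a sub block updating five 32-bit MMRs carries a
+ * 16-byte ADB header plus 20 bytes of register data, so its length must be
+ * padded out to 48 bytes within the payload
+ */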
+
+/*
+ * data transfer descriptor
+ *
+ * All fields are 32 bits to make them endian neutral
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+/* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+/* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+
+/*
+ * The following macros may be useful for structure initialization
+ */
+#define DTD_W0(type, notify, field, one_d, even_line_skip,	\
+		odd_line_skip, line_stride)			\
+	((type << DTD_DATA_TYPE_SHFT) |				\
+	(notify << DTD_NOTIFY_SHFT) |				\
+	(field << DTD_FIELD_SHFT) |				\
+	(one_d << DTD_1D_SHFT) |				\
+	(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |		\
+	(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |		\
+	line_stride)
+
+#define DTD_W1(line_length, xfer_height)			\
+	((line_length << DTD_LINE_LENGTH_SHFT) |		\
+				  xfer_height)
+
+#define DTD_W3(mode, dir, chan, pri, next_chan)	\
+	((DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) |			\
+	(mode << DTD_MODE_SHFT) |				\
+	(dir << DTD_DIR_SHFT) |					\
+	(chan << DTD_CHAN_SHFT) |				\
+	(pri << DTD_PRI_SHFT) |					\
+	next_chan)
+
+#define DTD_I_W4(width, height)					\
+	((width << DTD_FRAME_WIDTH_SHFT) | height)
+
+#define DTD_O_W4(addr, write, drop, use)			\
+	((addr << DTD_DESC_START_SHIFT) |			\
+	(write << DTD_WRITE_DESC_SHIFT) |			\
+	(drop << DTD_DROP_DATA_SHIFT)	|			\
+	use)
+
+#define DTD_I_W5(h_start, v_start)				\
+	((h_start << DTD_H_START_SHFT) | v_start)
+
+#define DTD_O_W5(max_width, max_height)				\
+	((max_width << DTD_MAX_WIDTH_SHFT) | max_height)
+
+static inline void dtd_set_type_ctl_stride(struct vpdma_dtd *dtd, int type,
+					   bool notify, int field, bool one_d,
+					   bool even_line_skip,
+					   bool odd_line_skip, int line_stride)
+{
+	dtd->type_ctl_stride = DTD_W0(type, notify, field, one_d,
+				      even_line_skip, odd_line_skip,
+				      line_stride);
+}
+
+static inline void dtd_set_xfer_length_height(struct vpdma_dtd *dtd,
+					      int line_length, int xfer_height)
+{
+	dtd->xfer_length_height = DTD_W1(line_length, xfer_height);
+}
+
+static inline void dtd_set_w1(struct vpdma_dtd *dtd, u32 value)
+{
+	dtd->w1 = value;
+}
+
+static inline void dtd_set_start_addr(struct vpdma_dtd *dtd, dma_addr_t addr)
+{
+	dtd->start_addr = addr;
+}
+
+static inline void dtd_set_pkt_ctl(struct vpdma_dtd *dtd, bool mode,
+				   bool dir, int chan, int pri, int next_chan)
+{
+	dtd->pkt_ctl = DTD_W3(mode, dir, chan, pri, next_chan);
+}
+
+static inline void dtd_set_frame_width_height(struct vpdma_dtd *dtd,
+					      int width, int height)
+{
+	dtd->frame_width_height = DTD_I_W4(width, height);
+}
+
+static inline void dtd_set_desc_write_addr(struct vpdma_dtd *dtd,
+			unsigned int addr, bool write_desc, bool drop_data,
+			bool use_desc)
+{
+	dtd->desc_write_addr = DTD_O_W4(addr, write_desc, drop_data, use_desc);
+}
+
+static inline void dtd_set_start_h_v(struct vpdma_dtd *dtd,
+				     int h_start, int v_start)
+{
+	dtd->start_h_v = DTD_I_W5(h_start, v_start);
+}
+
+static inline void dtd_set_max_width_height(struct vpdma_dtd *dtd,
+					    int max_width, int max_height)
+{
+	dtd->max_width_height = DTD_O_W5(max_width, max_height);
+}
+
+static inline void dtd_set_client_attr0(struct vpdma_dtd *dtd, u32 value)
+{
+	dtd->client_attr0 = value;
+}
+
+static inline void dtd_set_client_attr1(struct vpdma_dtd *dtd, u32 value)
+{
+	dtd->client_attr1 = value;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline dma_addr_t dtd_get_start_addr(struct vpdma_dtd *dtd)
+{
+	return (dma_addr_t)dtd->start_addr;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD_BLOCK_LEN_MASK	0xffff
+#define CFD_BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+#define CFD_W3(direct, cls, dest, payload_len)	\
+	((CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |			\
+	(direct << CFD_DIRECT_SHFT) |				\
+	(cls << CFD_CLASS_SHFT) |				\
+	(dest << CFD_DEST_SHFT) |				\
+	payload_len)
+
+static inline void cfd_set_dest_addr_offset(struct vpdma_cfd *cfd, u32 offset)
+{
+	cfd->dest_addr_offset = offset;
+}
+
+static inline void cfd_set_w0(struct vpdma_cfd *cfd, u32 w0)
+{
+	cfd->w0 = w0;
+}
+
+static inline void cfd_set_block_len(struct vpdma_cfd *cfd, int len)
+{
+	cfd->block_len = len;
+}
+
+static inline void cfd_set_w1(struct vpdma_cfd *cfd, u32 w1)
+{
+	cfd->w1 = w1;
+}
+
+static inline void cfd_set_payload_addr(struct vpdma_cfd *cfd, dma_addr_t addr)
+{
+	cfd->payload_addr = (u32)addr;
+}
+
+static inline void cfd_set_pkt_payload_len(struct vpdma_cfd *cfd,
+					   bool direct, int cls, int dest,
+					   int payload_len)
+{
+	cfd->ctl_payload_len = CFD_W3(direct, cls, dest, payload_len);
+}
+
+static inline u32 cfd_get_dest_addr_offset(struct vpdma_cfd *cfd)
+{
+	return cfd->dest_addr_offset;
+}
+
+static inline int cfd_get_block_len(struct vpdma_cfd *cfd)
+{
+	return cfd->block_len;
+}
+
+static inline dma_addr_t cfd_get_payload_addr(struct vpdma_cfd *cfd)
+{
+	return (dma_addr_t)cfd->payload_addr;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline int cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_CHANNEL	3
+#define CTD_TYPE_CHNG_CLIENT_IRQ	4
+#define CTD_TYPE_SEND_IRQ		5
+#define CTD_TYPE_RELOAD_LIST		6
+#define CTD_TYPE_ABORT_CHANNEL		7
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+#define CTD_W1(pixel_count, line_count)				\
+	((pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count)
+
+#define CTD_W2(fid0, fid1, fid2)				\
+	((fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0)
+
+#define CTD_W3(source, control)			\
+	((CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |			\
+	(source << CTD_SOURCE_SHFT) | control)
+
+
+static inline void ctd_set_timer_value(struct vpdma_ctd *ctd, u32 value)
+{
+	ctd->timer_value = value;
+}
+
+static inline void ctd_set_list_addr(struct vpdma_ctd *ctd, dma_addr_t addr)
+{
+	ctd->list_addr = (u32)addr;
+}
+
+static inline void ctd_set_pixel_line_count(struct vpdma_ctd *ctd,
+					    int pixel_count, int line_count)
+{
+	ctd->pixel_line_count = CTD_W1(pixel_count, line_count);
+}
+
+static inline void ctd_set_list_size(struct vpdma_ctd *ctd, int list_size)
+{
+	ctd->list_size = list_size;
+}
+
+static inline void ctd_set_event(struct vpdma_ctd *ctd, int event)
+{
+	ctd->event = event;
+}
+
+static inline void ctd_set_fid_ctl(struct vpdma_ctd *ctd, int fid0, int fid1,
+				   int fid2)
+{
+	ctd->fid_ctl = CTD_W2(fid0, fid1, fid2);
+}
+
+static inline void ctd_set_type_source_ctl(struct vpdma_ctd *ctd,
+					   int source, int control)
+{
+	ctd->type_source_ctl = CTD_W3(source, control);
+}
+
+static inline void ctd_set_w0(struct vpdma_ctd *ctd, u32 w0)
+{
+	ctd->w0 = w0;
+}
+
+static inline void ctd_set_w1(struct vpdma_ctd *ctd, u32 w1)
+{
+	ctd->w1 = w1;
+}
+
+static inline void ctd_set_w2(struct vpdma_ctd *ctd, u32 w2)
+{
+	ctd->w2 = w2;
+}
+
+static inline u32 ctd_get_timer_value(struct vpdma_ctd *ctd)
+{
+	return ctd->timer_value;
+}
+
+static inline dma_addr_t ctd_get_list_addr(struct vpdma_ctd *ctd)
+{
+	return (dma_addr_t)ctd->list_addr;
+}
+
+static inline u32 ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline u32 ctd_get_list_size(struct vpdma_ctd *ctd)
+{
+	return ctd->list_size;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
@ 2013-08-02 14:03   ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an existing
list, and append the configuration/data/control descriptor header to the list.

In the case of configuration descriptors, the creation of a payload block may be
required(the payloads can hold VPE MMR values, or scaler coefficients). The
allocation of the payload buffer and it's content is left to the VPE driver.
However, the VPDMA library provides helper macros to create payload in the
correct format.

Add debug functions to dump the descriptors in a way such that it's easy to see
the values of different fields in the descriptors.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 269 +++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 ++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 695 +++++++++++++++++++++++++++++
 3 files changed, 1012 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index b15b3dd..b957381 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -425,6 +426,274 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd_get_dest_addr_offset(cfd));
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word1: num_data_wrds = %d\n", cfd_get_block_len(cfd));
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd_get_payload_addr(cfd));
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header, this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd_set_dest_addr_offset(cfd, dest_offset);
+	cfd_set_block_len(cfd, len);
+	cfd_set_payload_addr(cfd, blk->dma_addr);
+	cfd_set_pkt_payload_len(cfd, CFD_INDIRECT, CFD_CLS_BLOCK, client,
+		len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format, this is used to a configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	BUG_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd_set_w0(cfd, 0);
+	cfd_set_w1(cfd, 0);
+	cfd_set_payload_addr(cfd, adb->dma_addr);
+	cfd_set_pkt_payload_len(cfd, CFD_INDIRECT, CFD_CLS_ADB, client,
+		len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+};
+
+/*
+ * control descriptor format change based on what type of control descriptor it
+ * is, we only use 'sync on channel' control descriptors for now, so assume it's
+ * that
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list, this descriptor stalls the VPDMA list till the time DMA is completed
+ * on the specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd_set_w0(ctd, 0);
+	ctd_set_w1(ctd, 0);
+	ctd_set_w2(ctd, 0);
+	ctd_set_type_source_ctl(ctd, chan_info[chan].num,
+		CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd_get_start_addr(dtd));
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			" drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd_set_type_ctl_stride(dtd,
+				fmt->data_type,
+				notify,
+				field,
+				!!(flags & VPDMA_DATA_FRAME_1D),
+				!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+				!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+				stride);
+	dtd_set_w1(dtd, 0);
+	dtd_set_start_addr(dtd, dma_addr);
+	dtd_set_pkt_ctl(dtd, !!(flags & VPDMA_DATA_MODE_TILED), DTD_DIR_OUT,
+			channel, priority, next_chan);
+
+	dtd_set_desc_write_addr(dtd, 0, 0, 0, 0);
+	dtd_set_max_width_height(dtd, MAX_OUT_WIDTH_1920, MAX_OUT_HEIGHT_1080);
+	dtd_set_client_attr0(dtd, 0);
+	dtd_set_client_attr1(dtd, 0);
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd_set_type_ctl_stride(dtd,
+				fmt->data_type,
+				notify,
+				field,
+				!!(flags & VPDMA_DATA_FRAME_1D),
+				!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+				!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+				stride);
+
+	dtd_set_xfer_length_height(dtd, c_rect->width, height);
+	dtd_set_start_addr(dtd, dma_addr);
+	dtd_set_pkt_ctl(dtd, !!(flags & VPDMA_DATA_MODE_TILED), DTD_DIR_IN,
+			channel, priority, next_chan);
+	dtd_set_frame_width_height(dtd, frame_width, frame_height);
+	dtd_set_start_h_v(dtd, c_rect->left, c_rect->top);
+	dtd_set_client_attr0(dtd, 0);
+	dtd_set_client_attr1(dtd, 0);
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 2ea2dd3..a5435c5 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -123,6 +123,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		7
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors MMRs as client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
 void vpdma_buf_free(struct vpdma_buf *buf);
@@ -135,6 +168,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..3b62f3d 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,699 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the overall
+ * size of the payload also needs to be a multiple of 16 bytes. The sub block
+ * lengths should be ensured to be aligned by the VPDMA user.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ *
+ * All fields are 32 bits to make them endian neutral
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+ /* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+ /* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+
+/*
+ * The following macros may be useful for structure initialization
+ */
+#define DTD_W0(type, notify, field, one_d, even_line_skip,	\
+		odd_line_skip, line_stride)			\
+	((type << DTD_DATA_TYPE_SHFT) |				\
+	(notify << DTD_NOTIFY_SHFT) |				\
+	(field << DTD_FIELD_SHFT) |				\
+	(one_d << DTD_1D_SHFT) |				\
+	(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |		\
+	(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |		\
+	line_stride)
+
+#define DTD_W1(line_length, xfer_height)			\
+	((line_length << DTD_LINE_LENGTH_SHFT) |		\
+				  xfer_height)
+
+#define DTD_W3(mode, dir, chan, pri, next_chan)	\
+	((DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) |			\
+	(mode << DTD_MODE_SHFT) |				\
+	(dir << DTD_DIR_SHFT) |					\
+	(chan << DTD_CHAN_SHFT) |				\
+	(pri << DTD_PRI_SHFT) |					\
+	next_chan)
+
+#define DTD_I_W4(width, height)					\
+	((width << DTD_FRAME_WIDTH_SHFT) | height)
+
+#define DTD_O_W4(addr, write, drop, use)			\
+	((addr << DTD_DESC_START_SHIFT) |			\
+	(write << DTD_WRITE_DESC_SHIFT) |			\
+	(drop << DTD_DROP_DATA_SHIFT)	|			\
+	use)
+
+#define DTD_I_W5(h_start, v_start)				\
+	((h_start << DTD_H_START_SHFT) | v_start)
+
+#define DTD_O_W5(max_width, max_height)				\
+	((max_width << DTD_MAX_WIDTH_SHFT) | max_height)
+
+static inline void dtd_set_type_ctl_stride(struct vpdma_dtd *dtd, int type,
+					   bool notify, int field, bool one_d,
+					   bool even_line_skip,
+					   bool odd_line_skip, int line_stride)
+{
+	dtd->type_ctl_stride = DTD_W0(type, notify, field, one_d,
+				      even_line_skip, odd_line_skip,
+				      line_stride);
+}
+
+static inline void dtd_set_xfer_length_height(struct vpdma_dtd *dtd,
+					      int line_length, int xfer_height)
+{
+	dtd->xfer_length_height = DTD_W1(line_length, xfer_height);
+}
+
+static inline void dtd_set_w1(struct vpdma_dtd *dtd, u32 value)
+{
+	dtd->w1 = value;
+}
+
+static inline void dtd_set_start_addr(struct vpdma_dtd *dtd, dma_addr_t addr)
+{
+	dtd->start_addr = addr;
+}
+
+static inline void dtd_set_pkt_ctl(struct vpdma_dtd *dtd, bool mode,
+				   bool dir, int chan, int pri, int next_chan)
+{
+	dtd->pkt_ctl = DTD_W3(mode, dir, chan, pri, next_chan);
+}
+
+static inline void dtd_set_frame_width_height(struct vpdma_dtd *dtd,
+					      int width, int height)
+{
+	dtd->frame_width_height = DTD_I_W4(width, height);
+}
+
+static inline void dtd_set_desc_write_addr(struct vpdma_dtd *dtd,
+			unsigned int addr, bool write_desc, bool drop_data,
+			bool use_desc)
+{
+	dtd->desc_write_addr = DTD_O_W4(addr, write_desc, drop_data, use_desc);
+}
+
+static inline void dtd_set_start_h_v(struct vpdma_dtd *dtd,
+				     int h_start, int v_start)
+{
+	dtd->start_h_v = DTD_I_W5(h_start, v_start);
+}
+
+static inline void dtd_set_max_width_height(struct vpdma_dtd *dtd,
+					    int max_width, int max_height)
+{
+	dtd->max_width_height = DTD_O_W5(max_width, max_height);
+}
+
+static inline void dtd_set_client_attr0(struct vpdma_dtd *dtd, u32 value)
+{
+		dtd->client_attr0 = value;
+}
+
+static inline void dtd_set_client_attr1(struct vpdma_dtd *dtd, u32 value)
+{
+		dtd->client_attr1 = value;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline dma_addr_t dtd_get_start_addr(struct vpdma_dtd *dtd)
+{
+	return (dma_addr_t)dtd->start_addr;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
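+
+/*
+ * Example (illustrative only, not driver code): packing a word with the
+ * helpers above and reading it back is lossless:
+ *
+ *	struct vpdma_dtd dtd = { 0 };
+ *
+ *	dtd_set_xfer_length_height(&dtd, 1920, 1080);
+ *	WARN_ON(dtd_get_line_length(&dtd) != 1920);
+ *	WARN_ON(dtd_get_xfer_height(&dtd) != 1080);
+ */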
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD__BLOCK_LEN_MASK	0xffff
+#define CFD__BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+#define CFD_W3(direct, cls, dest, payload_len)	\
+	((CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |			\
+	(direct << CFD_DIRECT_SHFT) |				\
+	(cls << CFD_CLASS_SHFT) |				\
+	(dest << CFD_DEST_SHFT) |				\
+	payload_len)
+
+static inline void cfd_set_dest_addr_offset(struct vpdma_cfd *cfd, u32 offset)
+{
+	cfd->dest_addr_offset = offset;
+}
+
+static inline void cfd_set_w0(struct vpdma_cfd *cfd, u32 w0)
+{
+	cfd->w0 = w0;
+}
+
+static inline void cfd_set_block_len(struct vpdma_cfd *cfd, int len)
+{
+	cfd->block_len = len;
+}
+
+static inline void cfd_set_w1(struct vpdma_cfd *cfd, u32 w1)
+{
+	cfd->w1 = w1;
+}
+
+static inline void cfd_set_payload_addr(struct vpdma_cfd *cfd, dma_addr_t addr)
+{
+	cfd->payload_addr = (u32)addr;
+}
+
+static inline void cfd_set_pkt_payload_len(struct vpdma_cfd *cfd,
+					   bool direct, int cls, int dest,
+					   int payload_len)
+{
+	cfd->ctl_payload_len = CFD_W3(direct, cls, dest, payload_len);
+}
+
+static inline u32 cfd_get_dest_addr_offset(struct vpdma_cfd *cfd)
+{
+	return cfd->dest_addr_offset;
+}
+
+static inline int cfd_get_block_len(struct vpdma_cfd *cfd)
+{
+	return cfd->block_len;
+}
+
+static inline dma_addr_t cfd_get_payload_addr(struct vpdma_cfd *cfd)
+{
+	return (dma_addr_t)cfd->payload_addr;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline bool cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
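+
+/*
+ * Example (illustrative only): building an indirect, ADB class config
+ * descriptor which points VPDMA at a payload of shadow register values;
+ * payload_addr and payload_len are placeholders, CFD_MMR_CLIENT is the
+ * destination defined in vpdma.h:
+ *
+ *	struct vpdma_cfd cfd = { 0 };
+ *
+ *	cfd_set_w0(&cfd, 0);
+ *	cfd_set_w1(&cfd, 0);
+ *	cfd_set_payload_addr(&cfd, payload_addr);
+ *	cfd_set_pkt_payload_len(&cfd, CFD_INDIRECT, CFD_CLS_ADB,
+ *				CFD_MMR_CLIENT, payload_len);
+ */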
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_CHANNEL	3
+#define CTD_TYPE_CHNG_CLIENT_IRQ	4
+#define CTD_TYPE_SEND_IRQ		5
+#define CTD_TYPE_RELOAD_LIST		6
+#define CTD_TYPE_ABORT_CHANNEL		7
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+#define CTD_W1(pixel_count, line_count)				\
+	((pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count)
+
+#define CTD_W2(fid0, fid1, fid2)				\
+	((fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0)
+
+#define CTD_W3(source, control)			\
+	((CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |			\
+	(source << CTD_SOURCE_SHFT) | control)
+
+
+static inline void ctd_set_timer_value(struct vpdma_ctd *ctd, u32 value)
+{
+	ctd->timer_value = value;
+}
+
+static inline void ctd_set_list_addr(struct vpdma_ctd *ctd, dma_addr_t addr)
+{
+	ctd->list_addr = (u32)addr;
+}
+
+static inline void ctd_set_pixel_line_count(struct vpdma_ctd *ctd,
+					    int pixel_count, int line_count)
+{
+	ctd->pixel_line_count = CTD_W1(pixel_count, line_count);
+}
+
+static inline void ctd_set_list_size(struct vpdma_ctd *ctd, int list_size)
+{
+	ctd->list_size = list_size;
+}
+
+static inline void ctd_set_event(struct vpdma_ctd *ctd, int event)
+{
+	ctd->event = event;
+}
+
+static inline void ctd_set_fid_ctl(struct vpdma_ctd *ctd, int fid0, int fid1,
+				   int fid2)
+{
+	ctd->fid_ctl = CTD_W2(fid0, fid1, fid2);
+}
+
+static inline void ctd_set_type_source_ctl(struct vpdma_ctd *ctd,
+					   int source, int control)
+{
+	ctd->type_source_ctl = CTD_W3(source, control);
+}
+
+static inline void ctd_set_w0(struct vpdma_ctd *ctd, u32 w0)
+{
+	ctd->w0 = w0;
+}
+
+static inline void ctd_set_w1(struct vpdma_ctd *ctd, u32 w1)
+{
+	ctd->w1 = w1;
+}
+
+static inline void ctd_set_w2(struct vpdma_ctd *ctd, u32 w2)
+{
+	ctd->w2 = w2;
+}
+
+static inline u32 ctd_get_timer_value(struct vpdma_ctd *ctd)
+{
+	return ctd->timer_value;
+}
+
+static inline dma_addr_t ctd_get_list_addr(struct vpdma_ctd *ctd)
+{
+	return (dma_addr_t)ctd->list_addr;
+}
+
+static inline u32 ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline u32 ctd_get_list_size(struct vpdma_ctd *ctd)
+{
+	return ctd->list_size;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
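+
+/*
+ * Example (illustrative only): a "sync on channel" control descriptor,
+ * which stalls list processing until the given channel completes;
+ * chan_num is a placeholder:
+ *
+ *	struct vpdma_ctd ctd = { 0 };
+ *
+ *	ctd_set_w0(&ctd, 0);
+ *	ctd_set_w1(&ctd, 0);
+ *	ctd_set_w2(&ctd, 0);
+ *	ctd_set_type_source_ctl(&ctd, chan_num, CTD_TYPE_SYNC_ON_CHANNEL);
+ */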
+
 #endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-02 14:03 ` Archit Taneja
@ 2013-08-02 14:03   ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

VPE is a block which consists of a single memory to memory path that can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters like frame start and
line mode which need to be configured; these are set by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; as VPDMA parses these descriptors, it
uploads the MMR values and starts DMA of video buffers on the various input
and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to signal that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
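
The flow of one transaction roughly looks like this (a simplified sketch
using the helpers added in this series, not the literal driver code):

	/* see device_run() and vpe_irq() in vpe.c */
	if (ctx->load_mmrs)
		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT,
				  &ctx->mmr_adb);

	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);	/* + chroma if coplanar */
	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);

	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);

	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
	/* list complete interrupt -> vpe_irq() marks src/dst buffers done */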

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   10 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/vpe.c      | 1763 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 4 files changed, 2271 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 08de865..909e590 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -210,6 +210,16 @@ config VIDEO_SH_VEU
 	    Support for the Video Engine Unit (VEU) on SuperH and
 	    SH-Mobile SoCs.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  This is a v4l2 driver for the TI VPE (Video Processing Engine) block
+	  found on DRA7xx SoCs.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index eee28dd..d4614e7 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..14a292b
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1763 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* multiple of 128 bits, line stride, 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context needs up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 3 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(15 * VPDMA_MAX_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	*vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int get_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
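+
+/*
+ * Example (illustrative only): updating a 2-bit field at bit offset 4:
+ *
+ *	u32 val = 0x13;
+ *
+ *	insert_field(&val, 0x2, 0x3, 4);
+ *	WARN_ON(val != 0x23);
+ *	WARN_ON(get_field(val, 0x3, 4) != 0x2);
+ */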
+
+static void insert_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	insert_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	insert_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	insert_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	insert_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	insert_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
+	while (cp < end_cp) {
+		insert_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		insert_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode in the shadow MMRs, and the VPDMA line
+ * mode and frame start events via direct register writes.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	insert_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	insert_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	insert_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* VPDMA line mode is set via a direct register write for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select the RGB path once color space conversion is supported */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * According to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode, and it has been recommended not to use progressive
+	 * bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
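+
+/*
+ * e.g. with bufs_per_job set to 4 (via the V4L2_CID_TRANS_NUM_BUFS control
+ * below), the job becomes ready only once 4 source buffers are queued;
+ * vpe_irq() then re-runs the device until all 4 buffers of the transaction
+ * are done.
+ */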
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_buf_map(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_buf_unmap(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_buf_unmap(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING |
+				V4L2_CAP_VIDEO_CAPTURE_MPLANE |
+				V4L2_CAP_VIDEO_OUTPUT_MPLANE;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	if (pix->field == V4L2_FIELD_ANY)
+		pix->field = V4L2_FIELD_NONE;
+	else if (pix->field != V4L2_FIELD_NONE)
+		return -EINVAL;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
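+
+/*
+ * Example of the math in __vpe_try_fmt() above (illustrative numbers):
+ * NV12 at 1920x1080 has a luma plane of depth 8, so bytesperline is 1920
+ * and sizeimage is 1920 * 1080 bytes; the chroma plane has depth 4, so
+ * bytesperline stays 1920 and sizeimage is 1920 * 1080 / 2 bytes.
+ */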
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_TRANS_NUM_BUFS:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_TRANS_NUM_BUFS,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
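+
+/*
+ * A userspace application selects the transaction size through this
+ * control, e.g. (sketch, error handling omitted; the application defines
+ * V4L2_CID_TRANS_NUM_BUFS to the same V4L2_CID_USER_BASE value as above):
+ *
+ *	struct v4l2_control ctrl;
+ *
+ *	ctrl.id = V4L2_CID_TRANS_NUM_BUFS;
+ *	ctrl.value = 4;
+ *	ioctl(fd, VIDIOC_S_CTRL, &ctrl);
+ */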
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];	/* packed YUYV by default */
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_buf_free(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_buf_free(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto err_runtime_get;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		dev_err(&pdev->dev, "missing irq data\n");
+		ret = -ENODEV;
+		goto err_runtime_get;
+	}
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		ret = -ENODEV;
+		goto err_runtime_get;
+	}
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	vfd = video_device_alloc();
+	if (!vfd) {
+		vpe_err(dev, "Failed to allocate video device\n");
+		ret = -ENOMEM;
+		goto dev_unreg;
+	}
+
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto rel_vdev;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev->vfd = vfd;
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_128K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto vid_unreg_dev;
+	}
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = get_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret < 0)
+		goto vid_unreg_dev;
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto vid_unreg_dev;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	vpe_top_vpdma_reset(dev);
+
+	ret = vpdma_init(pdev, &dev->vpdma);
+	if (ret < 0)
+		goto rel_m2m;
+
+	return 0;
+
+rel_m2m:
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+vid_unreg_dev:
+	video_unregister_device(vfd);
+rel_vdev:
+	video_device_release(vfd);
+dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+err_runtime_get:
+	pm_runtime_disable(&pdev->dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..be41a1f
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
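+
+/*
+ * each field is given as a right-justified MASK plus a SHIFT; vpe.c
+ * accesses fields as:
+ *
+ *	field = (reg >> SHIFT) & MASK;
+ *	reg = (reg & ~(MASK << SHIFT)) | ((field & MASK) << SHIFT);
+ */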
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTH_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
-- 
1.8.1.2



* [PATCH 3/6] v4l: ti-vpe: Add VPE mem to mem driver
@ 2013-08-02 14:03   ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

VPE is a block consisting of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color
space conversion on raster or tiled YUV420 coplanar, YUV422 coplanar or
YUV422 interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for
now to keep the driver simple. The chroma up/down sampler blocks are
implemented, so conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it uses
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are
added.
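
As a rough sketch (error handling omitted), the per-context setup in
vpe_open() amounts to:

  vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
		VPDMA_LIST_TYPE_NORMAL);
  vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
  init_adb_hdrs(ctx);	/* point the ADB headers at each MMR group */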

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs and
stores them in the buffer. There are also some VPDMA parameters, like
frame start and line mode, which need to be configured; these are set up
by direct register writes via the VPDMA helper functions.
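
For example, a shadowed field is written into the address/data block with
the insert_field() helper, while the non-shadowed VPDMA parameters are
programmed directly:

  /* shadowed: uploaded to the MMRs later by a config descriptor */
  insert_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);

  /* not shadowed: direct register writes via the VPDMA helpers */
  vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
  vpdma_set_frame_start_event(ctx->dev->vpdma,
		VPDMA_FSEVENT_CHANNEL_ACTIVE, VPE_CHAN_LUMA1_IN);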

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list
is prepared, it's submitted to VPDMA. As VPDMA parses these descriptors,
it uploads the MMR registers and starts DMA of video buffers on the
various input and output clients/ports.
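
Abridged, the list built for a coplanar format on both queues looks like
(the config descriptor is skipped when the shadow MMRs haven't changed):

  vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
  add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
  add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
  add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
  add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
  vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
  vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
  vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);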

When the list is parsed completely (and the DMAs on all the output ports
are done), an interrupt is generated which we use to signal that the
source and destination buffers are done.
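
On that interrupt, completion roughly amounts to:

  vpdma_clear_list_stat(ctx->dev->vpdma);
  vpdma_buf_unmap(dev->vpdma, &ctx->desc_list.buf);
  vpdma_reset_desc_list(&ctx->desc_list);
  v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
  v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
  v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);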

The rest of the driver is quite similar to other mem2mem drivers; we use
the multiplane v4l2 ioctls as the HW supports coplanar formats.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   10 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/vpe.c      | 1763 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 4 files changed, 2271 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 08de865..909e590 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -210,6 +210,16 @@ config VIDEO_SH_VEU
 	    Support for the Video Engine Unit (VEU) on SuperH and
 	    SH-Mobile SoCs.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  This is a v4l2 driver for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index eee28dd..d4614e7 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..14a292b
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1763 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits (16 bytes) */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context needs up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 3 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(15 * VPDMA_MAX_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver; it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	*vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int get_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void insert_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	insert_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	insert_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	insert_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	insert_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	insert_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
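+	/*
+	 * each register packs two 14-bit coefficients (C0 and C1); US2 and
+	 * US3 mirror the values programmed into US1
+	 */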
+	while (cp < end_cp) {
+		insert_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		insert_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	insert_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	insert_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	insert_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* the line mode is configured via direct register writes for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select the RGB path once color space conversion is supported */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode. It has been recommended not to use progressive bypass
+	 * mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_buf_map(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
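+	/*
+	 * this transaction is complete: quiesce the interrupts and release
+	 * the descriptor list and shadow register buffers until the next
+	 * device_run()
+	 */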
+	disable_irqs(ctx);
+
+	vpdma_buf_unmap(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_buf_unmap(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING |
+				V4L2_CAP_VIDEO_CAPTURE_MPLANE |
+				V4L2_CAP_VIDEO_OUTPUT_MPLANE;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	if (pix->field == V4L2_FIELD_ANY)
+		pix->field = V4L2_FIELD_NONE;
+	else if (V4L2_FIELD_NONE != pix->field)
+		return -EINVAL;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
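+	/*
+	 * the luma line stride must be aligned to 16 bytes; the chroma
+	 * plane of a coplanar format uses a bytesperline equal to the
+	 * frame width
+	 */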
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_TRANS_NUM_BUFS:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(unsigned long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_TRANS_NUM_BUFS,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance is released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_buf_free(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_buf_free(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance is released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
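+/* mem2mem framework hooks: job_ready() gates a transaction, device_run() starts it */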
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto err_runtime_get;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		dev_err(&pdev->dev, "missing irq data\n");
+		ret = -ENODEV;
+		goto err_runtime_get;
+	}
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		ret = -ENODEV;
+		goto err_runtime_get;
+	}
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		goto err_runtime_get;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	vfd = video_device_alloc();
+	if (!vfd) {
+		vpe_err(dev, "Failed to allocate video device\n");
+		ret = -ENOMEM;
+		goto dev_unreg;
+	}
+
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto rel_vdev;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev->vfd = vfd;
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_128K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto vid_unreg_dev;
+	}
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = get_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret < 0)
+		goto vid_unreg_dev;
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto vid_unreg_dev;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	vpe_top_vpdma_reset(dev);
+
+	ret = vpdma_init(pdev, &dev->vpdma);
+	if (ret < 0)
+		goto rel_m2m;
+
+	return 0;
+
+rel_m2m:
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+vid_unreg_dev:
+	video_unregister_device(vfd);
+rel_vdev:
+	video_device_release(vfd);
+dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+err_runtime_get:
+	pm_runtime_disable(&pdev->dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+module_platform_driver(vpe_pdrv);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..be41a1f
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
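+/*
+ * All _MASK values below are unshifted: a field is extracted as
+ * ((reg >> FOO_SHIFT) & FOO_MASK), which is how the get_field_reg()
+ * helper in vpe.c uses these definitions.
+ */
+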
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	8
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	16
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	24
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTH_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		24
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 4/6] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-08-02 14:03 ` Archit Taneja
@ 2013-08-02 14:03   ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

Add support for the de-interlacer block in VPE.

For the de-interlacer to work, we need to enable 2 more sets of VPE input ports
which fetch data from the 'last' and 'last to last' fields of the interlaced
video. Apart from that, we need to enable the motion vector output and input
ports, and also allocate DMA buffers for them.

We need to make sure that the two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW (in device_run), it extracts the addresses of the 3 buffers and provides
them to the data descriptors of the 3 sets of input ports ((LUMA1, CHROMA1),
(LUMA2, CHROMA2) and (LUMA3, CHROMA3)), one set for each of the 3 consecutive
fields. The motion vector and output port descriptors are then configured and
the list is submitted to VPDMA.

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field (the 3rd one) is changed to the state 'DONE', and the buffers
corresponding to the 1st and 2nd fields become the 2nd and 3rd fields for the
next de-interlace operation. This way, each de-interlace operation sees the 3
most recent fields. After each transaction, we also swap the motion vector
buffers: the new input motion vector buffer holds the accumulated motion
information of all the previous frames, and the new output motion vector
buffer will hold the updated motion vector that captures the motion changes
in the next field.

The de-interlacer is taken out of bypass mode; the extra default configuration
it requires is now added. The chrominance upsampler coefficients are added for
interlaced frames. Some VPDMA parameters, like the frame start event and line
mode, are configured for the 2 extra sets of input ports.

Signed-off-by: Archit Taneja <archit@ti.com>
---
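To summarize, the field rotation and motion vector ping-pong described above
boil down to the following (a simplified sketch of what vpe_irq() does below,
not the literal driver code):

	/* list for field f completed: return the oldest field to userspace */
	s_vb = ctx->src_vbs[2];			/* field f - 2 */
	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);

	/* slide the field window for the next run */
	ctx->src_vbs[2] = ctx->src_vbs[1];	/* f - 1 becomes f - 2 */
	ctx->src_vbs[1] = ctx->src_vbs[0];	/* f     becomes f - 1 */

	/* the previous MV output buffer becomes the next MV input buffer */
	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;

So each source buffer is handed back only after it has served as fields f,
f - 1 and f - 2. Userspace selects this path by setting V4L2_FIELD_ALTERNATE
on the source queue format while the destination queue stays progressive
(V4L2_FIELD_NONE), which is what makes set_srcdst_params() enable
de-interlacing.
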
 drivers/media/platform/ti-vpe/vpe.c | 372 ++++++++++++++++++++++++++++++++----
 1 file changed, 337 insertions(+), 35 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 14a292b..5b1410c 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -104,12 +106,44 @@ struct vpe_us_coeffs {
 /*
  * Default upsampler coefficients
  */
-static struct vpe_us_coeffs us_coeffs[] = {
+static struct vpe_us_coeffs us_coeffs[2] = {
 	{
 		/* Coefficients for progressive input */
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * The following registers configure some of the parameters of the motion and
+ * edge detection blocks inside the DEI. They generally remain the same; they
+ * could be exposed via userspace later if someone needs to tweak them.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static struct vpe_dei_regs dei_regs = {
+	.mdt_spacial_freq_thr_reg	= 0x020C0804u,
+	.edi_config_reg			= 0x0118100Fu,
+	.edi_lut_reg0			= 0x08040200u,
+	.edi_lut_reg1			= 0x1010100Cu,
+	.edi_lut_reg2			= 0x10101010u,
+	.edi_lut_reg3			= 0x10101010u,
 };
 
 /*
@@ -117,6 +151,7 @@ static struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-planar formats */
 };
 
@@ -125,6 +160,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -132,12 +173,40 @@ struct vpe_port_data {
 static struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -209,6 +278,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -218,6 +288,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -269,6 +340,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -276,13 +348,17 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	struct vpdma_buf	mv_buf[2];		/* motion vector in/out bufs */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -358,8 +434,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -385,6 +460,74 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct vpdma_data *vpdma = ctx->dev->vpdma;
+	int ret;
+
+	if (ctx->mv_buf[0].mapped) {
+		vpdma_buf_unmap(vpdma, &ctx->mv_buf[0]);
+		vpdma_buf_free(&ctx->mv_buf[0]);
+	}
+
+	if (ctx->mv_buf[1].mapped) {
+		vpdma_buf_unmap(vpdma, &ctx->mv_buf[1]);
+		vpdma_buf_free(&ctx->mv_buf[1]);
+	}
+
+	if (size == 0)
+		return 0;
+
+	ret = vpdma_buf_alloc(&ctx->mv_buf[0], size);
+	if (ret)
+		return ret;
+	ret = vpdma_buf_alloc(&ctx->mv_buf[1], size);
+	if (ret) {
+		vpdma_buf_free(&ctx->mv_buf[0]);
+		return ret;
+	}
+
+	vpdma_buf_map(vpdma, &ctx->mv_buf[0]);
+	vpdma_buf_map(vpdma, &ctx->mv_buf[1]);
+
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around. This function hands those two buffers back to vb2 (marks
+ * them DONE) once we have finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+		ctx->src_vbs[2] = NULL;
+		ctx->src_vbs[1] = NULL;
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -425,6 +568,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -432,6 +576,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* use the interlaced set */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -472,14 +619,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -523,13 +684,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -538,7 +700,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if (!(ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED))) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
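+	/* the DEI outputs a full progressive frame: twice the field height */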
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -577,10 +745,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -607,6 +800,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* need additional two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -734,17 +930,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf[mv_buf_selector].dma_addr;
+	} else {
+		/* interleaved (non-coplanar) formats carry both components in plane 0 */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -760,23 +964,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf[mv_buf_selector].dma_addr;
+	} else {
+		/* interleaved (non-coplanar) formats carry both components in plane 0 */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -794,7 +1006,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -817,8 +1030,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -830,24 +1050,49 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -858,6 +1103,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -881,9 +1127,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -906,10 +1158,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	/* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -919,16 +1174,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
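+	/* slide the field window: f becomes f - 1, f - 1 becomes f - 2 */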
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1009,6 +1283,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 	pix->colorspace = q_data->colorspace;
 	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
 
@@ -1035,7 +1310,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 
 	if (pix->field == V4L2_FIELD_ANY)
 		pix->field = V4L2_FIELD_NONE;
-	else if (V4L2_FIELD_NONE != pix->field)
+	else if (V4L2_FIELD_NONE != pix->field &&
+			V4L2_FIELD_ALTERNATE != pix->field)
 		return -EINVAL;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
@@ -1104,6 +1380,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1117,6 +1394,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1194,6 +1476,22 @@ static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
 	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	struct vpe_dei_regs *cur = &dei_regs;
+
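+	/* dei_regs[2..7] map to the DEI registers VPE_DEI_REG2..VPE_DEI_REG7 */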
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 #define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
 
 static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
@@ -1425,6 +1723,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1433,6 +1732,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1487,6 +1787,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_buf_free(&ctx->mmr_adb);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 4/6] v4l: ti-vpe: Add de-interlacer support in VPE
@ 2013-08-02 14:03   ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja

Add support for the de-interlacer block in VPE.

For de-interlacer to work, we need to enable 2 more sets of VPE input ports
which fetch data from the 'last' and 'last to last' fields of the interlaced
video. Apart from that, we need to enable the Motion vector output and input
ports, and also allocate DMA buffers for them.

We need to make sure that two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW(in device_run), it extracts the addresses of the 3 buffers, and provides
it to the data descriptors for the 3 sets of input ports((LUMA1, CHROMA1),
(LUMA2, CHROMA2), and (LUMA3, CHROMA3)) respectively for the 3 consecutive
fields. The motion vector and output port descriptors are configured and the
list is submitted to VPDMA.

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field(the 3rd one) is changed to the state 'DONE', and the buffers corresponding
to 1st and 2nd fields become the 2nd and 3rd field for the next de-interlace
operation. This way, for each deinterlace operation, we have the 3 most recent
fields. After each transaction, we also swap the motion vector buffers, the new
input motion vector buffer contains the resultant motion information of all the
previous frames, and the new output motion vector buffer will be used to hold
the updated motion vector to capture the motion changes in the next field.

The de-interlacer is removed from bypass mode, it requires some extra default
configurations which are now added. The chrominance upsampler coefficients are
added for interlaced frames. Some VPDMA parameters like frame start event and
line mode are configured for the 2 extra sets of input ports.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 372 ++++++++++++++++++++++++++++++++----
 1 file changed, 337 insertions(+), 35 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 14a292b..5b1410c 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -104,12 +106,44 @@ struct vpe_us_coeffs {
 /*
  * Default upsampler coefficients
  */
-static struct vpe_us_coeffs us_coeffs[] = {
+static struct vpe_us_coeffs us_coeffs[2] = {
 	{
 		/* Coefficients for progressive input */
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * the following registers are for configuring some of the parameters of the
+ * motion and edge detection blocks inside DEI, these generally remain the same,
+ * these could be passed later via userspace if some one needs to tweak these.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -117,6 +151,7 @@ static struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-panar formats */
 };
 
@@ -125,6 +160,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -132,12 +173,40 @@ struct vpe_port_data {
 static struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -209,6 +278,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -218,6 +288,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -269,6 +340,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -276,13 +348,17 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	struct vpdma_buf	mv_buf[2];		/* motion vector in/out bufs */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -358,8 +434,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -385,6 +460,74 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct vpdma_data *vpdma = ctx->dev->vpdma;
+	int ret;
+
+	if (ctx->mv_buf[0].mapped) {
+		vpdma_buf_unmap(vpdma, &ctx->mv_buf[0]);
+		vpdma_buf_free(&ctx->mv_buf[0]);
+	}
+
+	if (ctx->mv_buf[1].mapped) {
+		vpdma_buf_unmap(vpdma, &ctx->mv_buf[1]);
+		vpdma_buf_free(&ctx->mv_buf[1]);
+	}
+
+	if (size == 0)
+		return 0;
+
+	ret = vpdma_buf_alloc(&ctx->mv_buf[0], size);
+	if (ret)
+		return ret;
+	ret = vpdma_buf_alloc(&ctx->mv_buf[1], size);
+	if (ret) {
+		vpdma_buf_free(&ctx->mv_buf[0]);
+		return ret;
+	}
+
+	vpdma_buf_map(vpdma, &ctx->mv_buf[0]);
+	vpdma_buf_map(vpdma, &ctx->mv_buf[1]);
+
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function hands those two buffers back (marked DONE)
+ * once we have finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -425,6 +568,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -432,6 +576,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -472,14 +619,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -523,13 +684,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -538,7 +700,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -577,10 +745,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -607,6 +800,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* also need the two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -734,17 +930,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf[mv_buf_selector].dma_addr;
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -760,23 +964,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf[mv_buf_selector].dma_addr;
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -794,7 +1006,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -817,8 +1030,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -830,24 +1050,49 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -858,6 +1103,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -881,9 +1127,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -906,10 +1158,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	 /* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -919,16 +1174,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1009,6 +1283,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 	pix->colorspace = q_data->colorspace;
 	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
 
@@ -1035,7 +1310,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 
 	if (pix->field == V4L2_FIELD_ANY)
 		pix->field = V4L2_FIELD_NONE;
-	else if (V4L2_FIELD_NONE != pix->field)
+	else if (V4L2_FIELD_NONE != pix->field &&
+			V4L2_FIELD_ALTERNATE != pix->field)
 		return -EINVAL;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
@@ -1104,6 +1380,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1117,6 +1394,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1194,6 +1476,22 @@ static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
 	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 #define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
 
 static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
@@ -1425,6 +1723,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1433,6 +1732,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1487,6 +1787,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_buf_free(&ctx->mmr_adb);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  2013-08-02 14:03 ` Archit Taneja
@ 2013-08-02 14:03   ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja, Rajendra Nayak,
	Sricharan R

Add hwmod data for the VPE IP. This is needed for the IP to be reset during
boot, and to control the functional clock when the driver needs it via
pm_runtime APIs. Add the corresponding ocp_if struct and add it to DRA7XX's
ocp interface list.
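
With this in place the driver side only needs the usual runtime PM calls;
roughly (just a sketch, the hwmod layer does the actual clock handling
underneath):

	pm_runtime_enable(&pdev->dev);
	ret = pm_runtime_get_sync(&pdev->dev);	/* enables the VPE functional clock */
	...
	pm_runtime_put_sync(&pdev->dev);	/* allows the module to idle again */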

Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c | 42 +++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
index f647998b..181365d 100644
--- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
@@ -1883,6 +1883,39 @@ static struct omap_hwmod dra7xx_wd_timer2_hwmod = {
 	},
 };
 
+/*
+ * 'vpe' class
+ *
+ */
+
+static struct omap_hwmod_class_sysconfig dra7xx_vpe_sysc = {
+	.sysc_offs	= 0x0010,
+	.sysc_flags	= (SYSC_HAS_MIDLEMODE | SYSC_HAS_SIDLEMODE),
+	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
+			   SIDLE_SMART_WKUP | MSTANDBY_FORCE | MSTANDBY_NO |
+			   MSTANDBY_SMART | MSTANDBY_SMART_WKUP),
+	.sysc_fields	= &omap_hwmod_sysc_type2,
+};
+
+static struct omap_hwmod_class dra7xx_vpe_hwmod_class = {
+	.name	= "vpe",
+	.sysc	= &dra7xx_vpe_sysc,
+};
+
+/* vpe */
+static struct omap_hwmod dra7xx_vpe_hwmod = {
+	.name		= "vpe",
+	.class		= &dra7xx_vpe_hwmod_class,
+	.clkdm_name	= "vpe_clkdm",
+	.main_clk	= "dpll_core_h23x2_ck",
+	.prcm = {
+		.omap4 = {
+			.clkctrl_offs = DRA7XX_CM_VPE_VPE_CLKCTRL_OFFSET,
+			.context_offs = DRA7XX_RM_VPE_VPE_CONTEXT_OFFSET,
+			.modulemode   = MODULEMODE_HWCTRL,
+		},
+	},
+};
 
 /*
  * Interfaces
@@ -2636,6 +2669,14 @@ static struct omap_hwmod_ocp_if dra7xx_l4_wkup__wd_timer2 = {
 	.user		= OCP_USER_MPU | OCP_USER_SDMA,
 };
 
+/* l4_per3 -> vpe */
+static struct omap_hwmod_ocp_if dra7xx_l4_per3__vpe = {
+	.master		= &dra7xx_l4_per3_hwmod,
+	.slave		= &dra7xx_vpe_hwmod,
+	.clk		= "l3_iclk_div",
+	.user		= OCP_USER_MPU | OCP_USER_SDMA,
+};
+
 static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
 	&dra7xx_l3_main_2__l3_instr,
 	&dra7xx_l4_cfg__l3_main_1,
@@ -2714,6 +2755,7 @@ static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
 	&dra7xx_l3_main_1__vcp2,
 	&dra7xx_l4_per2__vcp2,
 	&dra7xx_l4_wkup__wd_timer2,
+	&dra7xx_l4_per3__vpe,
 	NULL,
 };
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE
  2013-08-02 14:03 ` Archit Taneja
@ 2013-08-02 14:03   ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:03 UTC (permalink / raw)
  To: linux-media
  Cc: linux-omap, dagriego, dale, pawel, m.szyprowski, hverkuil,
	laurent.pinchart, tomi.valkeinen, Archit Taneja, Rajendra Nayak,
	Sricharan R

Add a DT node for VPE in dra7.dtsi. This is experimental because we might need
to split the VPE address space a bit more, and also because the IRQ line
described is accessible only once the IRQ crossbar driver is added for DRA7XX.
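
The driver picks up the two register ranges by name; roughly (a sketch, the
"vpdma" range is assumed to be looked up by the VPDMA library code):

	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
	...
	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");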

Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 arch/arm/boot/dts/dra7.dtsi | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index ce9a0f0..3237972 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -484,6 +484,17 @@
 			dmas = <&sdma 70>, <&sdma 71>;
 			dma-names = "tx0", "rx0";
 		};
+
+		vpe {
+			compatible = "ti,vpe";
+			ti,hwmods = "vpe";
+			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
+			reg-names = "vpe", "vpdma";
+			interrupts = <0 159 0x4>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
+
 	};
 
 	clocks {
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-02 14:03   ` Archit Taneja
  (?)
@ 2013-08-02 14:36   ` Hans Verkuil
  2013-08-02 14:55       ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Hans Verkuil @ 2013-08-02 14:36 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	laurent.pinchart, tomi.valkeinen

Hi Archit,

I've got a few comments:

On 08/02/2013 04:03 PM, Archit Taneja wrote:
> VPE is a block which consists of a single memory to memory path which can
> perform chrominance up/down sampling, de-interlacing, scaling, and color space
> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
> interleaved video formats.
> 
> We create a mem2mem driver based primarily on the mem2mem-testdev example.
> The de-interlacer, scaler and color space converter are all bypassed for now
> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
> conversion between different YUV formats is possible.
> 
> Each mem2mem context allocates a buffer for VPE MMR values which it will use
> when it gets access to the VPE HW via the mem2mem queue, it also allocates
> a VPDMA descriptor list to which configuration and data descriptors are added.
> 
> Based on the information received via v4l2 ioctls for the source and
> destination queues, the driver configures the values for the MMRs, and stores
> them in the buffer. There are also some VPDMA parameters like frame start and
> line mode which need to be configured; these are configured by direct register
> writes via the VPDMA helper functions.
> 
> The driver's device_run() mem2mem op will add each descriptor based on how the
> source and destination queues are set up for the given ctx. Once the list is
> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
> will upload MMR registers and start DMA of video buffers on the various input
> and output clients/ports.
> 
> When the list is parsed completely (and the DMAs on all the output ports done),
> an interrupt is generated which we use to notify that the source and destination
> buffers are done.
> 
> The rest of the driver is quite similar to other mem2mem drivers, we use the
> multiplane v4l2 ioctls as the HW supports coplanar formats.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/Kconfig           |   10 +
>  drivers/media/platform/Makefile          |    2 +
>  drivers/media/platform/ti-vpe/vpe.c      | 1763 ++++++++++++++++++++++++++++++
>  drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>  4 files changed, 2271 insertions(+)
>  create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>  create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
> 

...

> +/*
> + * video ioctls
> + */
> +static int vpe_querycap(struct file *file, void *priv,
> +			struct v4l2_capability *cap)
> +{
> +	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
> +	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
> +	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
> +	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING |
> +				V4L2_CAP_VIDEO_CAPTURE_MPLANE |
> +				V4L2_CAP_VIDEO_OUTPUT_MPLANE;

That should be: V4L2_CAP_VIDEO_M2M_MPLANE | V4L2_CAP_STREAMING;

No CAPTURE/OUTPUT_MPLANE.

> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +	return 0;
> +}
> +
> +static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
> +{
> +	int i, index;
> +	struct vpe_fmt *fmt = NULL;
> +
> +	index = 0;
> +	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
> +		if (vpe_formats[i].types & type) {
> +			if (index == f->index) {
> +				fmt = &vpe_formats[i];
> +				break;
> +			}
> +			index++;
> +		}
> +	}
> +
> +	if (!fmt)
> +		return -EINVAL;
> +
> +	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
> +	f->pixelformat = fmt->fourcc;
> +	return 0;
> +}
> +
> +static int vpe_enum_fmt(struct file *file, void *priv,
> +				struct v4l2_fmtdesc *f)
> +{
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
> +	else
> +		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
> +}
> +
> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vb2_queue *vq;
> +	struct vpe_q_data *q_data;
> +	int i;
> +
> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> +	if (!vq)
> +		return -EINVAL;
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	pix->width = q_data->width;
> +	pix->height = q_data->height;
> +	pix->pixelformat = q_data->fmt->fourcc;
> +	pix->colorspace = q_data->colorspace;
> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
> +	}
> +
> +	return 0;
> +}
> +
> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
> +		       struct vpe_fmt *fmt, int type)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct v4l2_plane_pix_format *plane_fmt;
> +	int i;
> +
> +	if (!fmt || !(fmt->types & type)) {
> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
> +			pix->pixelformat);
> +		return -EINVAL;
> +	}
> +
> +	if (pix->field == V4L2_FIELD_ANY)
> +		pix->field = V4L2_FIELD_NONE;
> +	else if (V4L2_FIELD_NONE != pix->field)
> +		return -EINVAL;

No, TRY_FMT should map field to a valid field type. In this case it is simple:
just set field to V4L2_FIELD_NONE.
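
Something like this (just a sketch):

	/* always use progressive frames, whatever the application asked for */
	pix->field = V4L2_FIELD_NONE;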

> +
> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
> +			      S_ALIGN);
> +
> +	pix->num_planes = fmt->coplanar ? 2 : 1;
> +	pix->pixelformat = fmt->fourcc;
> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
> +
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		int depth;
> +
> +		plane_fmt = &pix->plane_fmt[i];
> +		depth = fmt->vpdma_fmt[i]->depth;
> +
> +		if (i == VPE_LUMA)
> +			plane_fmt->bytesperline =
> +					round_up((pix->width * depth) >> 3,
> +						1 << L_ALIGN);
> +		else
> +			plane_fmt->bytesperline = pix->width;
> +
> +		plane_fmt->sizeimage =
> +				(pix->height * pix->width * depth) >> 3;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_fmt *fmt = find_format(f);
> +
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
> +	else
> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
> +}
> +
> +static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct v4l2_plane_pix_format *plane_fmt;
> +	struct vpe_q_data *q_data;
> +	struct vb2_queue *vq;
> +	int i;
> +
> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> +	if (!vq)
> +		return -EINVAL;
> +
> +	if (vb2_is_busy(vq)) {
> +		vpe_err(ctx->dev, "queue busy\n");
> +		return -EBUSY;
> +	}
> +
> +	q_data = get_q_data(ctx, f->type);
> +	if (!q_data)
> +		return -EINVAL;
> +
> +	q_data->fmt		= find_format(f);
> +	q_data->width		= pix->width;
> +	q_data->height		= pix->height;
> +	q_data->colorspace	= pix->colorspace;
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		plane_fmt = &pix->plane_fmt[i];
> +
> +		q_data->bytesperline[i]	= plane_fmt->bytesperline;
> +		q_data->sizeimage[i]	= plane_fmt->sizeimage;
> +	}
> +
> +	q_data->c_rect.left	= 0;
> +	q_data->c_rect.top	= 0;
> +	q_data->c_rect.width	= q_data->width;
> +	q_data->c_rect.height	= q_data->height;
> +
> +	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
> +		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
> +		q_data->bytesperline[VPE_LUMA]);
> +	if (q_data->fmt->coplanar)
> +		vpe_dbg(ctx->dev, " bpl_uv %d\n",
> +			q_data->bytesperline[VPE_CHROMA]);
> +
> +	return 0;
> +}
> +
> +static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	int ret;
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	ret = vpe_try_fmt(file, priv, f);
> +	if (ret)
> +		return ret;
> +
> +	ret = __vpe_s_fmt(ctx, f);
> +	if (ret)
> +		return ret;
> +
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		set_src_registers(ctx);
> +	else
> +		set_dst_registers(ctx);
> +
> +	return set_srcdst_params(ctx);
> +}
> +
> +static int vpe_reqbufs(struct file *file, void *priv,
> +		       struct v4l2_requestbuffers *reqbufs)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
> +}
> +
> +static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
> +}
> +
> +static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	vpe_dump_regs(ctx->dev);
> +	vpdma_dump_regs(ctx->dev->vpdma);
> +
> +	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
> +}
> +
> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
> +
> +static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
> +{
> +	struct vpe_ctx *ctx =
> +		container_of(ctrl->handler, struct vpe_ctx, hdl);
> +
> +	switch (ctrl->id) {
> +	case V4L2_CID_TRANS_NUM_BUFS:
> +		ctx->bufs_per_job = ctrl->val;
> +		break;
> +
> +	default:
> +		vpe_err(ctx->dev, "Invalid control\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
> +	.s_ctrl = vpe_s_ctrl,
> +};
> +
> +static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
> +	.vidioc_querycap	= vpe_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
> +	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
> +	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
> +	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
> +	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
> +	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
> +	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
> +
> +	.vidioc_reqbufs		= vpe_reqbufs,
> +	.vidioc_querybuf	= vpe_querybuf,
> +
> +	.vidioc_qbuf		= vpe_qbuf,
> +	.vidioc_dqbuf		= vpe_dqbuf,
> +
> +	.vidioc_streamon	= vpe_streamon,
> +	.vidioc_streamoff	= vpe_streamoff,
> +	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
> +	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int vpe_queue_setup(struct vb2_queue *vq,
> +			   const struct v4l2_format *fmt,
> +			   unsigned int *nbuffers, unsigned int *nplanes,
> +			   unsigned int sizes[], void *alloc_ctxs[])
> +{
> +	int i;
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct vpe_q_data *q_data;
> +
> +	q_data = get_q_data(ctx, vq->type);
> +
> +	*nplanes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < *nplanes; i++) {
> +		sizes[i] = q_data->sizeimage[i];
> +		alloc_ctxs[i] = ctx->dev->alloc_ctx;
> +	}
> +
> +	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
> +		sizes[VPE_LUMA]);
> +	if (q_data->fmt->coplanar)
> +		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
> +
> +	return 0;
> +}
> +
> +static int vpe_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct vpe_q_data *q_data;
> +	int i, num_planes;
> +
> +	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	q_data = get_q_data(ctx, vb->vb2_queue->type);
> +	num_planes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < num_planes; i++) {
> +		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
> +			vpe_err(ctx->dev,
> +				"data will not fit into plane (%lu < %lu)\n",
> +				vb2_plane_size(vb, i),
> +				(long) q_data->sizeimage[i]);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	for (i = 0; i < num_planes; i++)
> +		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
> +
> +	return 0;
> +}
> +
> +static void vpe_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
> +}
> +
> +static void vpe_wait_prepare(struct vb2_queue *q)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
> +	vpe_unlock(ctx);
> +}
> +
> +static void vpe_wait_finish(struct vb2_queue *q)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
> +	vpe_lock(ctx);
> +}
> +
> +static struct vb2_ops vpe_qops = {
> +	.queue_setup	 = vpe_queue_setup,
> +	.buf_prepare	 = vpe_buf_prepare,
> +	.buf_queue	 = vpe_buf_queue,
> +	.wait_prepare	 = vpe_wait_prepare,
> +	.wait_finish	 = vpe_wait_finish,
> +};
> +
> +static int queue_init(void *priv, struct vb2_queue *src_vq,
> +		      struct vb2_queue *dst_vq)
> +{
> +	struct vpe_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> +	src_vq->io_modes = VB2_MMAP;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &vpe_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
> +	dst_vq->io_modes = VB2_MMAP;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &vpe_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +static const struct v4l2_ctrl_config vpe_bufs_per_job = {
> +	.ops = &vpe_ctrl_ops,
> +	.id = V4L2_CID_TRANS_NUM_BUFS,
> +	.name = "Buffers Per Transaction",
> +	.type = V4L2_CTRL_TYPE_INTEGER,
> +	.def = VPE_DEF_BUFS_PER_JOB,
> +	.min = 1,
> +	.max = VIDEO_MAX_FRAME,
> +	.step = 1,
> +};
> +
> +/*
> + * File operations
> + */
> +static int vpe_open(struct file *file)
> +{
> +	struct vpe_dev *dev = video_drvdata(file);
> +	struct vpe_ctx *ctx = NULL;
> +	struct vpe_q_data *s_q_data;
> +	struct v4l2_ctrl_handler *hdl;
> +	int ret;
> +
> +	vpe_dbg(dev, "vpe_open\n");
> +
> +	if (!dev->vpdma->ready) {
> +		vpe_err(dev, "vpdma firmware not loaded\n");
> +		return -ENODEV;
> +	}
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	ctx->dev = dev;
> +
> +	if (mutex_lock_interruptible(&dev->dev_mutex)) {
> +		ret = -ERESTARTSYS;
> +		goto free_ctx;
> +	}
> +
> +	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
> +			VPDMA_LIST_TYPE_NORMAL);
> +	if (ret != 0)
> +		goto unlock;
> +
> +	ret = vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
> +	if (ret != 0)
> +		goto free_desc_list;
> +
> +	init_adb_hdrs(ctx);
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +
> +	hdl = &ctx->hdl;
> +	v4l2_ctrl_handler_init(hdl, 1);
> +	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
> +	if (hdl->error) {
> +		ret = hdl->error;
> +		goto exit_fh;
> +	}

I would expect to see a:

	ctx->fh.ctrl_handler = hdl;

here, otherwise the handler will never be seen by the v4l2 framework.

> +
> +	s_q_data = &ctx->q_data[Q_DATA_SRC];
> +	s_q_data->fmt = &vpe_formats[2];
> +	s_q_data->width = 1920;
> +	s_q_data->height = 1080;
> +	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
> +			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
> +	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
> +	s_q_data->c_rect.left = 0;
> +	s_q_data->c_rect.top = 0;
> +	s_q_data->c_rect.width = s_q_data->width;
> +	s_q_data->c_rect.height = s_q_data->height;
> +	s_q_data->flags = 0;
> +
> +	ctx->q_data[Q_DATA_DST] = *s_q_data;
> +
> +	set_src_registers(ctx);
> +	set_dst_registers(ctx);
> +	ret = set_srcdst_params(ctx);
> +	if (ret)
> +		goto exit_fh;
> +
> +	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
> +
> +	if (IS_ERR(ctx->m2m_ctx)) {
> +		ret = PTR_ERR(ctx->m2m_ctx);
> +		goto exit_fh;
> +	}
> +
> +	v4l2_fh_add(&ctx->fh);
> +
> +	/*
> +	 * for now, just report the creation of the first instance, we can later
> +	 * optimize the driver to enable or disable clocks when the first
> +	 * instance is created or the last instance released
> +	 */
> +	if (atomic_inc_return(&dev->num_instances) == 1)
> +		vpe_dbg(dev, "first instance created\n");
> +
> +	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
> +
> +	ctx->load_mmrs = true;
> +
> +	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
> +		ctx, ctx->m2m_ctx);
> +
> +	mutex_unlock(&dev->dev_mutex);
> +
> +	return 0;
> +exit_fh:
> +	v4l2_ctrl_handler_free(hdl);
> +	v4l2_fh_exit(&ctx->fh);
> +	vpdma_buf_free(&ctx->mmr_adb);
> +free_desc_list:
> +	vpdma_free_desc_list(&ctx->desc_list);
> +unlock:
> +	mutex_unlock(&dev->dev_mutex);
> +free_ctx:
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int vpe_release(struct file *file)
> +{
> +	struct vpe_dev *dev = video_drvdata(file);
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	vpe_dbg(dev, "releasing instance %p\n", ctx);
> +
> +	mutex_lock(&dev->dev_mutex);
> +	vpdma_free_desc_list(&ctx->desc_list);
> +	vpdma_buf_free(&ctx->mmr_adb);

I'm missing a v4l2_ctrl_handler_free() in this release function.
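
Something along these lines, before the fh is exited (a sketch):

	v4l2_ctrl_handler_free(&ctx->hdl);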

> +
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	v4l2_m2m_ctx_release(ctx->m2m_ctx);
> +
> +	kfree(ctx);
> +
> +	/*
> +	 * for now, just report the release of the last instance, we can later
> +	 * optimize the driver to enable or disable clocks when the first
> +	 * instance is created or the last instance released
> +	 */
> +	if (atomic_dec_return(&dev->num_instances) == 0)
> +		vpe_dbg(dev, "last instance released\n");
> +
> +	mutex_unlock(&dev->dev_mutex);
> +
> +	return 0;
> +}
> +
> +static unsigned int vpe_poll(struct file *file,
> +			     struct poll_table_struct *wait)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_dev *dev = ctx->dev;
> +	int ret;
> +
> +	mutex_lock(&dev->dev_mutex);
> +	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
> +	mutex_unlock(&dev->dev_mutex);
> +	return ret;
> +}
> +
> +static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_dev *dev = ctx->dev;
> +	int ret;
> +
> +	if (mutex_lock_interruptible(&dev->dev_mutex))
> +		return -ERESTARTSYS;
> +	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
> +	mutex_unlock(&dev->dev_mutex);
> +	return ret;
> +}
> +
> +static const struct v4l2_file_operations vpe_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= vpe_open,
> +	.release	= vpe_release,
> +	.poll		= vpe_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= vpe_mmap,
> +};
> +
> +static struct video_device vpe_videodev = {
> +	.name		= VPE_MODULE_NAME,
> +	.fops		= &vpe_fops,
> +	.ioctl_ops	= &vpe_ioctl_ops,
> +	.minor		= -1,
> +	.release	= video_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= device_run,
> +	.job_ready	= job_ready,
> +	.job_abort	= job_abort,
> +	.lock		= vpe_lock,
> +	.unlock		= vpe_unlock,
> +};
> +
> +static int vpe_runtime_get(struct platform_device *pdev)
> +{
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
> +
> +	r = pm_runtime_get_sync(&pdev->dev);
> +	WARN_ON(r < 0);
> +	return r < 0 ? r : 0;
> +}
> +
> +static void vpe_runtime_put(struct platform_device *pdev)
> +{
> +
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
> +
> +	r = pm_runtime_put_sync(&pdev->dev);
> +	WARN_ON(r < 0 && r != -ENOSYS);
> +}
> +
> +static int vpe_probe(struct platform_device *pdev)
> +{
> +	struct vpe_dev *dev;
> +	struct video_device *vfd;
> +	struct resource *res;
> +	int ret, irq, func;
> +
> +	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
> +	if (!dev)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&dev->lock);
> +
> +	pm_runtime_enable(&pdev->dev);
> +
> +	ret = vpe_runtime_get(pdev);
> +	if (ret)
> +		goto err_runtime_get;
> +
> +	irq = platform_get_irq(pdev, 0);
> +	if (irq < 0) {
> +		dev_err(&pdev->dev, "missing irq data\n");
> +		return -ENODEV;
> +	}
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
> +	if (res == NULL) {
> +		dev_err(&pdev->dev, "missing platform resources data\n");
> +		return -ENODEV;
> +	}
> +
> +	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> +	if (ret)
> +		return ret;
> +
> +	atomic_set(&dev->num_instances, 0);
> +	mutex_init(&dev->dev_mutex);
> +
> +	vfd = video_device_alloc();

In general I prefer to see the struct video_device embedded in struct vpe_dev. If
nothing else it saves you from the NULL test below.
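
I.e. something like (a sketch):

	struct vpe_dev {
		...
		struct video_device	vfd;	/* embedded, no video_device_alloc() */
		...
	};

with the release callback set to video_device_release_empty since the device
is no longer allocated separately.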

> +	if (!vfd) {
> +		vpe_err(dev, "Failed to allocate video device\n");
> +		ret = -ENOMEM;
> +		goto dev_unreg;
> +	}
> +
> +	*vfd = vpe_videodev;
> +	vfd->lock = &dev->dev_mutex;
> +	vfd->v4l2_dev = &dev->v4l2_dev;
> +
> +	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);

This should be done as the very last thing: once the device is registered apps can
immediately use it, and if the internal state is not yet fully ready, that can
cause major problems.

Remember that udev daemons can open devices as soon as they appear, so this is not
a theoretical race condition. The best way to avoid problems is to call this as the
very last thing.
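
I.e. keep the registration as the final step of probe (a sketch reusing the
existing error labels):

	/* ...ioremap, irq request, m2m and vpdma init first... */

	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
	if (ret) {
		vpe_err(dev, "Failed to register video device\n");
		goto rel_m2m;
	}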

> +	if (ret) {
> +		vpe_err(dev, "Failed to register video device\n");
> +		goto rel_vdev;
> +	}
> +
> +	video_set_drvdata(vfd, dev);
> +	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
> +	dev->vfd = vfd;
> +	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
> +		vfd->num);
> +
> +	platform_set_drvdata(pdev, dev);
> +
> +	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_128K);
> +	if (!dev->base) {
> +		ret = -ENOMEM;
> +		goto vid_unreg_dev;
> +	}
> +
> +	/* Perform clk enable followed by reset */
> +	vpe_set_clock_enable(dev, 1);
> +
> +	vpe_top_reset(dev);
> +
> +	func = get_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
> +		VPE_PID_FUNC_SHIFT);
> +	vpe_dbg(dev, "VPE PID function %x\n", func);
> +
> +	if (devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
> +			dev) < 0) {
> +		ret = -ENOMEM;
> +		goto vid_unreg_dev;
> +	}
> +
> +	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
> +	if (IS_ERR(dev->alloc_ctx)) {
> +		vpe_err(dev, "Failed to alloc vb2 context\n");
> +		ret = PTR_ERR(dev->alloc_ctx);
> +		goto vid_unreg_dev;
> +	}
> +
> +	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(dev->m2m_dev)) {
> +		vpe_err(dev, "Failed to init mem2mem device\n");
> +		ret = PTR_ERR(dev->m2m_dev);
> +		goto rel_ctx;
> +	}
> +
> +	vpe_top_vpdma_reset(dev);
> +
> +	ret = vpdma_init(pdev, &dev->vpdma);
> +	if (ret < 0)
> +		goto rel_m2m;
> +
> +	return 0;
> +
> +rel_m2m:
> +	v4l2_m2m_release(dev->m2m_dev);
> +rel_ctx:
> +	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
> +vid_unreg_dev:
> +	video_unregister_device(vfd);
> +rel_vdev:
> +	video_device_release(vfd);
> +dev_unreg:
> +	v4l2_device_unregister(&dev->v4l2_dev);
> +err_runtime_get:
> +	pm_runtime_disable(&pdev->dev);
> +
> +	return ret;
> +}
> +
> +static int vpe_remove(struct platform_device *pdev)
> +{
> +	struct vpe_dev *dev =
> +		(struct vpe_dev *) platform_get_drvdata(pdev);
> +
> +	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME);
> +
> +	v4l2_m2m_release(dev->m2m_dev);
> +	video_unregister_device(dev->vfd);
> +	v4l2_device_unregister(&dev->v4l2_dev);
> +	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
> +
> +	vpe_set_clock_enable(dev, 0);
> +	vpe_runtime_put(pdev);
> +	pm_runtime_disable(&pdev->dev);
> +
> +	return 0;
> +}
> +
> +#if defined(CONFIG_OF)
> +static const struct of_device_id vpe_of_match[] = {
> +	{
> +		.compatible = "ti,vpe",
> +	},
> +	{},
> +};
> +#else
> +#define vpe_of_match NULL
> +#endif
> +
> +static struct platform_driver vpe_pdrv = {
> +	.probe		= vpe_probe,
> +	.remove		= vpe_remove,
> +	.driver		= {
> +		.name	= VPE_MODULE_NAME,
> +		.owner	= THIS_MODULE,
> +		.of_match_table = vpe_of_match,
> +	},
> +};
> +
> +static void __exit vpe_exit(void)
> +{
> +	platform_driver_unregister(&vpe_pdrv);
> +}
> +
> +static int __init vpe_init(void)
> +{
> +	return platform_driver_register(&vpe_pdrv);
> +}
> +
> +module_init(vpe_init);
> +module_exit(vpe_exit);
> +
> +MODULE_DESCRIPTION("TI VPE driver");
> +MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
> +MODULE_LICENSE("GPL");

Overall it looks pretty good!

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 4/6] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-08-02 14:03   ` Archit Taneja
  (?)
@ 2013-08-02 14:40   ` Hans Verkuil
  -1 siblings, 0 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-08-02 14:40 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	laurent.pinchart, tomi.valkeinen

More comments...

On 08/02/2013 04:03 PM, Archit Taneja wrote:
> Add support for the de-interlacer block in VPE.
> 
> For de-interlacer to work, we need to enable 2 more sets of VPE input ports
> which fetch data from the 'last' and 'last to last' fields of the interlaced
> video. Apart from that, we need to enable the Motion vector output and input
> ports, and also allocate DMA buffers for them.
> 
> We need to make sure that the two most recent fields in the source queue are
> available and in the 'READY' state. Once a mem2mem context gets access to the
> VPE HW (in device_run), it extracts the addresses of the 3 buffers, and provides
> them to the data descriptors for the 3 sets of input ports ((LUMA1, CHROMA1),
> (LUMA2, CHROMA2), and (LUMA3, CHROMA3)) respectively for the 3 consecutive
> fields. The motion vector and output port descriptors are configured and the
> list is submitted to VPDMA.
> 
> Once the transaction is done, the v4l2 buffer corresponding to the oldest
> field (the 3rd one) is changed to the state 'DONE', and the buffers corresponding
> to 1st and 2nd fields become the 2nd and 3rd field for the next de-interlace
> operation. This way, for each deinterlace operation, we have the 3 most recent
> fields. After each transaction, we also swap the motion vector buffers, the new
> input motion vector buffer contains the resultant motion information of all the
> previous frames, and the new output motion vector buffer will be used to hold
> the updated motion vector to capture the motion changes in the next field.
> 
> The de-interlacer is removed from bypass mode, it requires some extra default
> configurations which are now added. The chrominance upsampler coefficients are
> added for interlaced frames. Some VPDMA parameters like frame start event and
> line mode are configured for the 2 extra sets of input ports.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/ti-vpe/vpe.c | 372 ++++++++++++++++++++++++++++++++----
>  1 file changed, 337 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
> index 14a292b..5b1410c 100644
> --- a/drivers/media/platform/ti-vpe/vpe.c
> +++ b/drivers/media/platform/ti-vpe/vpe.c

...

> @@ -1035,7 +1310,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>  
>  	if (pix->field == V4L2_FIELD_ANY)
>  		pix->field = V4L2_FIELD_NONE;
> -	else if (V4L2_FIELD_NONE != pix->field)
> +	else if (V4L2_FIELD_NONE != pix->field &&
> +			V4L2_FIELD_ALTERNATE != pix->field)
>  		return -EINVAL;

As mentioned before, this shouldn't result in an error, but map to a valid
field format.

For a deinterlacer I would expect NONE for the output of the deinterlacer (or
capture buffer type) and ALTERNATE for the input of the deinterlacer (or output
buffer type).

>  
>  	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
> @@ -1104,6 +1380,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
>  	q_data->width		= pix->width;
>  	q_data->height		= pix->height;
>  	q_data->colorspace	= pix->colorspace;
> +	q_data->field		= pix->field;
>  
>  	for (i = 0; i < pix->num_planes; i++) {
>  		plane_fmt = &pix->plane_fmt[i];
> @@ -1117,6 +1394,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
>  	q_data->c_rect.width	= q_data->width;
>  	q_data->c_rect.height	= q_data->height;
>  
> +	if (q_data->field == V4L2_FIELD_ALTERNATE)
> +		q_data->flags |= Q_DATA_INTERLACED;
> +	else
> +		q_data->flags &= ~Q_DATA_INTERLACED;
> +
>  	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
>  		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
>  		q_data->bytesperline[VPE_LUMA]);
> @@ -1194,6 +1476,22 @@ static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
>  	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
>  }
>  
> +static void set_dei_shadow_registers(struct vpe_ctx *ctx)
> +{
> +	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
> +	u32 *dei_mmr = &mmr_adb->dei_regs[0];
> +	struct vpe_dei_regs *cur = &dei_regs;
> +
> +	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
> +	dei_mmr[3]  = cur->edi_config_reg;
> +	dei_mmr[4]  = cur->edi_lut_reg0;
> +	dei_mmr[5]  = cur->edi_lut_reg1;
> +	dei_mmr[6]  = cur->edi_lut_reg2;
> +	dei_mmr[7]  = cur->edi_lut_reg3;
> +
> +	ctx->load_mmrs = true;
> +}
> +
>  #define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
>  
>  static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
> @@ -1425,6 +1723,7 @@ static int vpe_open(struct file *file)
>  	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
>  			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
>  	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
> +	s_q_data->field = V4L2_FIELD_NONE;
>  	s_q_data->c_rect.left = 0;
>  	s_q_data->c_rect.top = 0;
>  	s_q_data->c_rect.width = s_q_data->width;
> @@ -1433,6 +1732,7 @@ static int vpe_open(struct file *file)
>  
>  	ctx->q_data[Q_DATA_DST] = *s_q_data;
>  
> +	set_dei_shadow_registers(ctx);
>  	set_src_registers(ctx);
>  	set_dst_registers(ctx);
>  	ret = set_srcdst_params(ctx);
> @@ -1487,6 +1787,8 @@ static int vpe_release(struct file *file)
>  	vpe_dbg(dev, "releasing instance %p\n", ctx);
>  
>  	mutex_lock(&dev->dev_mutex);
> +	free_vbs(ctx);
> +	free_mv_buffers(ctx);
>  	vpdma_free_desc_list(&ctx->desc_list);
>  	vpdma_buf_free(&ctx->mmr_adb);
>  
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-02 14:36   ` Hans Verkuil
@ 2013-08-02 14:55       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-02 14:55 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	laurent.pinchart, tomi.valkeinen

Hi Hans,

Thanks for the comments. Some replies below.

On Friday 02 August 2013 08:06 PM, Hans Verkuil wrote:
> Hi Archit,
>
> I've got a few comments:
>
> On 08/02/2013 04:03 PM, Archit Taneja wrote:
>> VPE is a block which consists of a single memory to memory path which can
>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>> interleaved video formats.
>>
>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>> The de-interlacer, scaler and color space converter are all bypassed for now
>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>> conversion between different YUV formats is possible.
>>
>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>
>> Based on the information received via v4l2 ioctls for the source and
>> destination queues, the driver configures the values for the MMRs, and stores
>> them in the buffer. There are also some VPDMA parameters like frame start and
>> line mode which need to be configured; these are configured by direct register
>> writes via the VPDMA helper functions.
>>
>> The driver's device_run() mem2mem op will add each descriptor based on how the
>> source and destination queues are set up for the given ctx. Once the list is
>> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
>> will upload MMR registers and start DMA of video buffers on the various input
>> and output clients/ports.
>>
>> When the list is parsed completely (and the DMAs on all the output ports done),
>> an interrupt is generated which we use to notify that the source and destination
>> buffers are done.
>>
>> The rest of the driver is quite similar to other mem2mem drivers, we use the
>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   drivers/media/platform/Kconfig           |   10 +
>>   drivers/media/platform/Makefile          |    2 +
>>   drivers/media/platform/ti-vpe/vpe.c      | 1763 ++++++++++++++++++++++++++++++
>>   drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>   4 files changed, 2271 insertions(+)
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>
>
> ...
>
>> +/*
>> + * video ioctls
>> + */
>> +static int vpe_querycap(struct file *file, void *priv,
>> +			struct v4l2_capability *cap)
>> +{
>> +	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
>> +	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
>> +	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
>> +	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING |
>> +				V4L2_CAP_VIDEO_CAPTURE_MPLANE |
>> +				V4L2_CAP_VIDEO_OUTPUT_MPLANE;
>
> That should be: V4L2_CAP_VIDEO_M2M_MPLANE | V4L2_CAP_STREAMING;
>
> No CAPTURE/OUTPUT_MPLANE.

Sure, I'll fix this.

<snip>

>> +
>> +	if (pix->field == V4L2_FIELD_ANY)
>> +		pix->field = V4L2_FIELD_NONE;
>> +	else if (V4L2_FIELD_NONE != pix->field)
>> +		return -EINVAL;
>
> No, TRY_FMT should map field to a valid field type. In this case it is simple:
> just set field to V4L2_FIELD_NONE.

Okay, I'll correct this.

I saw your comment on the de-interlacer patch. The de-interlacer can be
bypassed, so both the output buffer type and the capture buffer type can
carry interlaced or progressive content; I guess NONE and ALTERNATE are
possible for both.
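
Something like this in __vpe_try_fmt() should cover the mapping once we
support ALTERNATE (just a sketch, untested):

	/* coerce any unsupported field order to a supported one */
	if (pix->field != V4L2_FIELD_NONE &&
	    pix->field != V4L2_FIELD_ALTERNATE)
		pix->field = V4L2_FIELD_NONE;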

>
>> +
>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>> +			      S_ALIGN);
>> +
>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>> +	pix->pixelformat = fmt->fourcc;
>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>> +
>> +
>> +	for (i = 0; i < pix->num_planes; i++) {
>> +		int depth;
>> +
>> +		plane_fmt = &pix->plane_fmt[i];
>> +		depth = fmt->vpdma_fmt[i]->depth;
>> +
>> +		if (i == VPE_LUMA)
>> +			plane_fmt->bytesperline =
>> +					round_up((pix->width * depth) >> 3,
>> +						1 << L_ALIGN);
>> +		else
>> +			plane_fmt->bytesperline = pix->width;
>> +
>> +		plane_fmt->sizeimage =
>> +				(pix->height * pix->width * depth) >> 3;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +	struct vpe_fmt *fmt = find_format(f);
>> +
>> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
>> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
>> +	else
>> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
>> +}
>> +
>> +static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
>> +{
>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>> +	struct v4l2_plane_pix_format *plane_fmt;
>> +	struct vpe_q_data *q_data;
>> +	struct vb2_queue *vq;
>> +	int i;
>> +
>> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
>> +	if (!vq)
>> +		return -EINVAL;
>> +
>> +	if (vb2_is_busy(vq)) {
>> +		vpe_err(ctx->dev, "queue busy\n");
>> +		return -EBUSY;
>> +	}
>> +
>> +	q_data = get_q_data(ctx, f->type);
>> +	if (!q_data)
>> +		return -EINVAL;
>> +
>> +	q_data->fmt		= find_format(f);
>> +	q_data->width		= pix->width;
>> +	q_data->height		= pix->height;
>> +	q_data->colorspace	= pix->colorspace;
>> +
>> +	for (i = 0; i < pix->num_planes; i++) {
>> +		plane_fmt = &pix->plane_fmt[i];
>> +
>> +		q_data->bytesperline[i]	= plane_fmt->bytesperline;
>> +		q_data->sizeimage[i]	= plane_fmt->sizeimage;
>> +	}
>> +
>> +	q_data->c_rect.left	= 0;
>> +	q_data->c_rect.top	= 0;
>> +	q_data->c_rect.width	= q_data->width;
>> +	q_data->c_rect.height	= q_data->height;
>> +
>> +	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
>> +		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
>> +		q_data->bytesperline[VPE_LUMA]);
>> +	if (q_data->fmt->coplanar)
>> +		vpe_dbg(ctx->dev, " bpl_uv %d\n",
>> +			q_data->bytesperline[VPE_CHROMA]);
>> +
>> +	return 0;
>> +}
>> +
>> +static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
>> +{
>> +	int ret;
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	ret = vpe_try_fmt(file, priv, f);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = __vpe_s_fmt(ctx, f);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
>> +		set_src_registers(ctx);
>> +	else
>> +		set_dst_registers(ctx);
>> +
>> +	return set_srcdst_params(ctx);
>> +}
>> +
>> +static int vpe_reqbufs(struct file *file, void *priv,
>> +		       struct v4l2_requestbuffers *reqbufs)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
>> +}
>> +
>> +static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
>> +}
>> +
>> +static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
>> +}
>> +
>> +static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
>> +}
>> +
>> +static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
>> +}
>> +
>> +static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	vpe_dump_regs(ctx->dev);
>> +	vpdma_dump_regs(ctx->dev->vpdma);
>> +
>> +	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
>> +}
>> +
>> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE)
>> +
>> +static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
>> +{
>> +	struct vpe_ctx *ctx =
>> +		container_of(ctrl->handler, struct vpe_ctx, hdl);
>> +
>> +	switch (ctrl->id) {
>> +	case V4L2_CID_TRANS_NUM_BUFS:
>> +		ctx->bufs_per_job = ctrl->val;
>> +		break;
>> +
>> +	default:
>> +		vpe_err(ctx->dev, "Invalid control\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
>> +	.s_ctrl = vpe_s_ctrl,
>> +};
>> +
>> +static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
>> +	.vidioc_querycap	= vpe_querycap,
>> +
>> +	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
>> +	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
>> +	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
>> +	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
>> +
>> +	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
>> +	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
>> +	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
>> +	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
>> +
>> +	.vidioc_reqbufs		= vpe_reqbufs,
>> +	.vidioc_querybuf	= vpe_querybuf,
>> +
>> +	.vidioc_qbuf		= vpe_qbuf,
>> +	.vidioc_dqbuf		= vpe_dqbuf,
>> +
>> +	.vidioc_streamon	= vpe_streamon,
>> +	.vidioc_streamoff	= vpe_streamoff,
>> +	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
>> +	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
>> +};
>> +
>> +/*
>> + * Queue operations
>> + */
>> +static int vpe_queue_setup(struct vb2_queue *vq,
>> +			   const struct v4l2_format *fmt,
>> +			   unsigned int *nbuffers, unsigned int *nplanes,
>> +			   unsigned int sizes[], void *alloc_ctxs[])
>> +{
>> +	int i;
>> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
>> +	struct vpe_q_data *q_data;
>> +
>> +	q_data = get_q_data(ctx, vq->type);
>> +
>> +	*nplanes = q_data->fmt->coplanar ? 2 : 1;
>> +
>> +	for (i = 0; i < *nplanes; i++) {
>> +		sizes[i] = q_data->sizeimage[i];
>> +		alloc_ctxs[i] = ctx->dev->alloc_ctx;
>> +	}
>> +
>> +	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
>> +		sizes[VPE_LUMA]);
>> +	if (q_data->fmt->coplanar)
>> +		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
>> +
>> +	return 0;
>> +}
>> +
>> +static int vpe_buf_prepare(struct vb2_buffer *vb)
>> +{
>> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
>> +	struct vpe_q_data *q_data;
>> +	int i, num_planes;
>> +
>> +	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
>> +
>> +	q_data = get_q_data(ctx, vb->vb2_queue->type);
>> +	num_planes = q_data->fmt->coplanar ? 2 : 1;
>> +
>> +	for (i = 0; i < num_planes; i++) {
>> +		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
>> +			vpe_err(ctx->dev,
>> +				"data will not fit into plane (%lu < %lu)\n",
>> +				vb2_plane_size(vb, i),
>> +				(long) q_data->sizeimage[i]);
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	for (i = 0; i < num_planes; i++)
>> +		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
>> +
>> +	return 0;
>> +}
>> +
>> +static void vpe_buf_queue(struct vb2_buffer *vb)
>> +{
>> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
>> +	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
>> +}
>> +
>> +static void vpe_wait_prepare(struct vb2_queue *q)
>> +{
>> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
>> +	vpe_unlock(ctx);
>> +}
>> +
>> +static void vpe_wait_finish(struct vb2_queue *q)
>> +{
>> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
>> +	vpe_lock(ctx);
>> +}
>> +
>> +static struct vb2_ops vpe_qops = {
>> +	.queue_setup	 = vpe_queue_setup,
>> +	.buf_prepare	 = vpe_buf_prepare,
>> +	.buf_queue	 = vpe_buf_queue,
>> +	.wait_prepare	 = vpe_wait_prepare,
>> +	.wait_finish	 = vpe_wait_finish,
>> +};
>> +
>> +static int queue_init(void *priv, struct vb2_queue *src_vq,
>> +		      struct vb2_queue *dst_vq)
>> +{
>> +	struct vpe_ctx *ctx = priv;
>> +	int ret;
>> +
>> +	memset(src_vq, 0, sizeof(*src_vq));
>> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
>> +	src_vq->io_modes = VB2_MMAP;
>> +	src_vq->drv_priv = ctx;
>> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
>> +	src_vq->ops = &vpe_qops;
>> +	src_vq->mem_ops = &vb2_dma_contig_memops;
>> +	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
>> +
>> +	ret = vb2_queue_init(src_vq);
>> +	if (ret)
>> +		return ret;
>> +
>> +	memset(dst_vq, 0, sizeof(*dst_vq));
>> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
>> +	dst_vq->io_modes = VB2_MMAP;
>> +	dst_vq->drv_priv = ctx;
>> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
>> +	dst_vq->ops = &vpe_qops;
>> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
>> +	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
>> +
>> +	return vb2_queue_init(dst_vq);
>> +}
>> +
>> +static const struct v4l2_ctrl_config vpe_bufs_per_job = {
>> +	.ops = &vpe_ctrl_ops,
>> +	.id = V4L2_CID_TRANS_NUM_BUFS,
>> +	.name = "Buffers Per Transaction",
>> +	.type = V4L2_CTRL_TYPE_INTEGER,
>> +	.def = VPE_DEF_BUFS_PER_JOB,
>> +	.min = 1,
>> +	.max = VIDEO_MAX_FRAME,
>> +	.step = 1,
>> +};
>> +
>> +/*
>> + * File operations
>> + */
>> +static int vpe_open(struct file *file)
>> +{
>> +	struct vpe_dev *dev = video_drvdata(file);
>> +	struct vpe_ctx *ctx = NULL;
>> +	struct vpe_q_data *s_q_data;
>> +	struct v4l2_ctrl_handler *hdl;
>> +	int ret;
>> +
>> +	vpe_dbg(dev, "vpe_open\n");
>> +
>> +	if (!dev->vpdma->ready) {
>> +		vpe_err(dev, "vpdma firmware not loaded\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return -ENOMEM;
>> +
>> +	ctx->dev = dev;
>> +
>> +	if (mutex_lock_interruptible(&dev->dev_mutex)) {
>> +		ret = -ERESTARTSYS;
>> +		goto free_ctx;
>> +	}
>> +
>> +	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
>> +			VPDMA_LIST_TYPE_NORMAL);
>> +	if (ret != 0)
>> +		goto unlock;
>> +
>> +	ret = vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
>> +	if (ret != 0)
>> +		goto free_desc_list;
>> +
>> +	init_adb_hdrs(ctx);
>> +
>> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
>> +	file->private_data = &ctx->fh;
>> +
>> +	hdl = &ctx->hdl;
>> +	v4l2_ctrl_handler_init(hdl, 1);
>> +	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
>> +	if (hdl->error) {
>> +		ret = hdl->error;
>> +		goto exit_fh;
>> +	}
>
> I would expect to see a:
>
> 	ctx->fh.ctrl_handler = hdl;
>
> here, otherwise the handler will never be seen by the v4l2 framework.

This was something I was going to follow up on with a question on the
list: I got an error when I tried to issue the S_CTRL ioctl from user
space. I guess your suggestion will fix this. I'll update and try it
out.
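
So the relevant part of vpe_open() would become something like this
(a sketch; the v4l2_ctrl_handler_setup() call is my addition, to write
out the control defaults):

	hdl = &ctx->hdl;
	v4l2_ctrl_handler_init(hdl, 1);
	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
	if (hdl->error) {
		ret = hdl->error;
		goto exit_fh;
	}
	ctx->fh.ctrl_handler = hdl;	/* expose the handler to the framework */
	v4l2_ctrl_handler_setup(hdl);	/* apply the control defaults */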


>
>> +
>> +	s_q_data = &ctx->q_data[Q_DATA_SRC];
>> +	s_q_data->fmt = &vpe_formats[2];
>> +	s_q_data->width = 1920;
>> +	s_q_data->height = 1080;
>> +	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
>> +			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
>> +	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
>> +	s_q_data->c_rect.left = 0;
>> +	s_q_data->c_rect.top = 0;
>> +	s_q_data->c_rect.width = s_q_data->width;
>> +	s_q_data->c_rect.height = s_q_data->height;
>> +	s_q_data->flags = 0;
>> +
>> +	ctx->q_data[Q_DATA_DST] = *s_q_data;
>> +
>> +	set_src_registers(ctx);
>> +	set_dst_registers(ctx);
>> +	ret = set_srcdst_params(ctx);
>> +	if (ret)
>> +		goto exit_fh;
>> +
>> +	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
>> +
>> +	if (IS_ERR(ctx->m2m_ctx)) {
>> +		ret = PTR_ERR(ctx->m2m_ctx);
>> +		goto exit_fh;
>> +	}
>> +
>> +	v4l2_fh_add(&ctx->fh);
>> +
>> +	/*
>> +	 * for now, just report the creation of the first instance, we can later
>> +	 * optimize the driver to enable or disable clocks when the first
>> +	 * instance is created or the last instance released
>> +	 */
>> +	if (atomic_inc_return(&dev->num_instances) == 1)
>> +		vpe_dbg(dev, "first instance created\n");
>> +
>> +	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
>> +
>> +	ctx->load_mmrs = true;
>> +
>> +	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
>> +		ctx, ctx->m2m_ctx);
>> +
>> +	mutex_unlock(&dev->dev_mutex);
>> +
>> +	return 0;
>> +exit_fh:
>> +	v4l2_ctrl_handler_free(hdl);
>> +	v4l2_fh_exit(&ctx->fh);
>> +	vpdma_buf_free(&ctx->mmr_adb);
>> +free_desc_list:
>> +	vpdma_free_desc_list(&ctx->desc_list);
>> +unlock:
>> +	mutex_unlock(&dev->dev_mutex);
>> +free_ctx:
>> +	kfree(ctx);
>> +	return ret;
>> +}
>> +
>> +static int vpe_release(struct file *file)
>> +{
>> +	struct vpe_dev *dev = video_drvdata(file);
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +
>> +	vpe_dbg(dev, "releasing instance %p\n", ctx);
>> +
>> +	mutex_lock(&dev->dev_mutex);
>> +	vpdma_free_desc_list(&ctx->desc_list);
>> +	vpdma_buf_free(&ctx->mmr_adb);
>
> I'm missing a v4l2_ctrl_handler_free() in this release function.
>
>> +
>> +	v4l2_fh_del(&ctx->fh);
>> +	v4l2_fh_exit(&ctx->fh);
>> +	v4l2_m2m_ctx_release(ctx->m2m_ctx);
>> +
>> +	kfree(ctx);
>> +
>> +	/*
>> +	 * for now, just report the release of the last instance, we can later
>> +	 * optimize the driver to enable or disable clocks when the first
>> +	 * instance is created or the last instance released
>> +	 */
>> +	if (atomic_dec_return(&dev->num_instances) == 0)
>> +		vpe_dbg(dev, "last instance released\n");
>> +
>> +	mutex_unlock(&dev->dev_mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +static unsigned int vpe_poll(struct file *file,
>> +			     struct poll_table_struct *wait)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +	struct vpe_dev *dev = ctx->dev;
>> +	int ret;
>> +
>> +	mutex_lock(&dev->dev_mutex);
>> +	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
>> +	mutex_unlock(&dev->dev_mutex);
>> +	return ret;
>> +}
>> +
>> +static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +	struct vpe_dev *dev = ctx->dev;
>> +	int ret;
>> +
>> +	if (mutex_lock_interruptible(&dev->dev_mutex))
>> +		return -ERESTARTSYS;
>> +	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
>> +	mutex_unlock(&dev->dev_mutex);
>> +	return ret;
>> +}
>> +
>> +static const struct v4l2_file_operations vpe_fops = {
>> +	.owner		= THIS_MODULE,
>> +	.open		= vpe_open,
>> +	.release	= vpe_release,
>> +	.poll		= vpe_poll,
>> +	.unlocked_ioctl	= video_ioctl2,
>> +	.mmap		= vpe_mmap,
>> +};
>> +
>> +static struct video_device vpe_videodev = {
>> +	.name		= VPE_MODULE_NAME,
>> +	.fops		= &vpe_fops,
>> +	.ioctl_ops	= &vpe_ioctl_ops,
>> +	.minor		= -1,
>> +	.release	= video_device_release,
>> +	.vfl_dir	= VFL_DIR_M2M,
>> +};
>> +
>> +static struct v4l2_m2m_ops m2m_ops = {
>> +	.device_run	= device_run,
>> +	.job_ready	= job_ready,
>> +	.job_abort	= job_abort,
>> +	.lock		= vpe_lock,
>> +	.unlock		= vpe_unlock,
>> +};
>> +
>> +static int vpe_runtime_get(struct platform_device *pdev)
>> +{
>> +	int r;
>> +
>> +	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
>> +
>> +	r = pm_runtime_get_sync(&pdev->dev);
>> +	WARN_ON(r < 0);
>> +	return r < 0 ? r : 0;
>> +}
>> +
>> +static void vpe_runtime_put(struct platform_device *pdev)
>> +{
>> +
>> +	int r;
>> +
>> +	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
>> +
>> +	r = pm_runtime_put_sync(&pdev->dev);
>> +	WARN_ON(r < 0 && r != -ENOSYS);
>> +}
>> +
>> +static int vpe_probe(struct platform_device *pdev)
>> +{
>> +	struct vpe_dev *dev;
>> +	struct video_device *vfd;
>> +	struct resource *res;
>> +	int ret, irq, func;
>> +
>> +	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
>> +	if (!dev)
>> +		return -ENOMEM;
>> +
>> +	spin_lock_init(&dev->lock);
>> +
>> +	pm_runtime_enable(&pdev->dev);
>> +
>> +	ret = vpe_runtime_get(pdev);
>> +	if (ret)
>> +		goto err_runtime_get;
>> +
>> +	irq = platform_get_irq(pdev, 0);
>> +	if (irq < 0) {
>> +		dev_err(&pdev->dev, "missing irq data\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
>> +	if (res == NULL) {
>> +		dev_err(&pdev->dev, "missing platform resources data\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	atomic_set(&dev->num_instances, 0);
>> +	mutex_init(&dev->dev_mutex);
>> +
>> +	vfd = video_device_alloc();
>
> In general I prefer to see the struct video_device embedded in struct vpe_dev. If
> nothing else it saves you from the NULL test below.
>
>> +	if (!vfd) {
>> +		vpe_err(dev, "Failed to allocate video device\n");
>> +		ret = -ENOMEM;
>> +		goto dev_unreg;
>> +	}
>> +
>> +	*vfd = vpe_videodev;
>> +	vfd->lock = &dev->dev_mutex;
>> +	vfd->v4l2_dev = &dev->v4l2_dev;
>> +
>> +	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
>
> This should be done as the very last thing: once the device is registered apps can
> immediately use it, and if the internal state is not yet fully ready, that can
> cause major problems.
>
> Remember that udev daemons can open devices as soon as they appear, so this is not
> a theoretical race condition. The best way to avoid problems is to call this as the
> very last thing.

That's a good point. I'll move the registration to the end of the probe.
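
The tail of vpe_probe() would then look roughly like this (only a
sketch; the unwind label is invented):

	/* ... set up the m2m device, vpdma and irq handler first ... */

	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
	if (ret) {
		vpe_err(dev, "Failed to register video device\n");
		goto m2m_rel;	/* hypothetical cleanup label */
	}

	return 0;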

Thanks,
Archit

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-02 14:03   ` Archit Taneja
@ 2013-08-05  8:13     ` Tomi Valkeinen
  0 siblings, 0 replies; 138+ messages in thread
From: Tomi Valkeinen @ 2013-08-05  8:13 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

Hi,

On 02/08/13 17:03, Archit Taneja wrote:

> +struct vpdma_data_format vpdma_yuv_fmts[] = {
> +	[VPDMA_DATA_FMT_Y444] = {
> +		.data_type	= DATA_TYPE_Y444,
> +		.depth		= 8,
> +	},

This, and all the other tables, should probably be consts?
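
i.e. something like this (a sketch; the extern declaration in vpdma.h
would need the const qualifier too):

const struct vpdma_data_format vpdma_yuv_fmts[] = {
	[VPDMA_DATA_FMT_Y444] = {
		.data_type	= DATA_TYPE_Y444,
		.depth		= 8,
	},
	/* ... */
};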

> +static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
> +{
> +	u32 val = *valp;
> +
> +	val &= ~(mask << shift);
> +	val |= (field & mask) << shift;
> +	*valp = val;
> +}

I think "insert" normally means, well, inserting a thing in between
something. What you do here is overwriting.

Why not just call it "write_field"?

> + * Allocate a DMA buffer
> + */
> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
> +{
> +	buf->size = size;
> +	buf->mapped = 0;

Maybe true/false is clearer here than 0/1.

> +/*
> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
> + */
> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
> +{
> +	/* we always use the first list */
> +	int list_num = 0;
> +	int list_size;
> +
> +	if (vpdma_list_busy(vpdma, list_num))
> +		return -EBUSY;
> +
> +	/* 16-byte granularity */
> +	list_size = (list->next - list->buf.addr) >> 4;
> +
> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
> +	wmb();

What is the wmb() for?

> +	write_reg(vpdma, VPDMA_LIST_ATTR,
> +			(list_num << VPDMA_LIST_NUM_SHFT) |
> +			(list->type << VPDMA_LIST_TYPE_SHFT) |
> +			list_size);
> +
> +	return 0;
> +}

> +static void vpdma_firmware_cb(const struct firmware *f, void *context)
> +{
> +	struct vpdma_data *vpdma = context;
> +	struct vpdma_buf fw_dma_buf;
> +	int i, r;
> +
> +	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
> +
> +	if (!f || !f->data) {
> +		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
> +		return;
> +	}
> +
> +	/* already initialized */
> +	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
> +			VPDMA_LIST_RDY_SHFT)) {
> +		vpdma->ready = true;
> +		return;
> +	}
> +
> +	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
> +	if (r) {
> +		dev_err(&vpdma->pdev->dev,
> +			"failed to allocate dma buffer for firmware\n");
> +		goto rel_fw;
> +	}
> +
> +	memcpy(fw_dma_buf.addr, f->data, f->size);
> +
> +	vpdma_buf_map(vpdma, &fw_dma_buf);
> +
> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
> +
> +	for (i = 0; i < 100; i++) {		/* max 1 second */
> +		msleep_interruptible(10);

You call the interruptible version here, but you don't handle the
interrupted case. I believe the loop will just keep looping even if the
user interrupts; see the sketch after the quoted function below.

> +		if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
> +				VPDMA_LIST_RDY_SHFT))
> +			break;
> +	}
> +
> +	if (i == 100) {
> +		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
> +		goto free_buf;
> +	}
> +
> +	vpdma->ready = true;
> +
> +free_buf:
> +	vpdma_buf_unmap(vpdma, &fw_dma_buf);
> +
> +	vpdma_buf_free(&fw_dma_buf);
> +rel_fw:
> +	release_firmware(f);
> +}
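
Something along these lines would at least bail out on a signal (an
untested sketch, reusing the function's existing cleanup labels):

for (i = 0; i < 100; i++) {		/* max 1 second */
	if (msleep_interruptible(10))
		goto free_buf;	/* interrupted by a signal, give up */

	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
			VPDMA_LIST_RDY_SHFT))
		break;
}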

 Tomi



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-08-02 14:03   ` Archit Taneja
@ 2013-08-05  9:11     ` Tomi Valkeinen
  -1 siblings, 0 replies; 138+ messages in thread
From: Tomi Valkeinen @ 2013-08-05  9:11 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

On 02/08/13 17:03, Archit Taneja wrote:
> Create functions which the VPE driver can use to create a VPDMA descriptor and
> add it to a VPDMA descriptor list. These functions take a pointer to an existing
> list, and append the configuration/data/control descriptor header to the list.
> 
> In the case of configuration descriptors, the creation of a payload block may be
> required (the payloads can hold VPE MMR values or scaler coefficients). The
> allocation of the payload buffer and its content is left to the VPE driver.
> However, the VPDMA library provides helper macros to create the payload in the
> correct format.
> 
> Add debug functions to dump the descriptors in a way such that it's easy to see
> the values of different fields in the descriptors.

There are lots of defines and inline functions in this patch. But at
least the ones I looked at were only used once.

For example, dtd_set_xfer_length_height() is called only in one place.
Then dtd_set_xfer_length_height() uses DTD_W1(), and again it's the only
place where DTD_W1() is used.

So instead of:

dtd_set_xfer_length_height(dtd, c_rect->width, height);

You could as well do:

dtd->xfer_length_height = (c_rect->width << DTD_LINE_LENGTH_SHFT) | height;

Now, presuming the compiler optimizes correctly, there should be no
difference between the two options above. My only point is that I wonder
if having multiple "layers" there improves readability at all. Some
helper funcs are rather trivial, like:

+static inline void dtd_set_w1(struct vpdma_dtd *dtd, u32 value)
+{
+	dtd->w1 = value;
+}

Then there are some, like dtd_set_type_ctl_stride(), that contain lots
of parameters. Hmm, okay, dtd_set_type_ctl_stride() is called in two
places, so at least in that case it makes sense to have that helper
func. But dtd_set_type_ctl_stride() uses DTD_W0(), and that's again the
only place where it's used.

So, I don't know. I'm not suggesting you change anything; I just started
wondering whether all those macros and helpers actually help or not.

> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/ti-vpe/vpdma.c      | 269 +++++++++++
>  drivers/media/platform/ti-vpe/vpdma.h      |  48 ++
>  drivers/media/platform/ti-vpe/vpdma_priv.h | 695 +++++++++++++++++++++++++++++
>  3 files changed, 1012 insertions(+)
> 
> diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
> index b15b3dd..b957381 100644
> --- a/drivers/media/platform/ti-vpe/vpdma.c
> +++ b/drivers/media/platform/ti-vpe/vpdma.c
> @@ -21,6 +21,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
> +#include <linux/videodev2.h>
>  
>  #include "vpdma.h"
>  #include "vpdma_priv.h"
> @@ -425,6 +426,274 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
>  	return 0;
>  }
>  
> +static void dump_cfd(struct vpdma_cfd *cfd)
> +{
> +	int class;
> +
> +	class = cfd_get_class(cfd);
> +
> +	pr_debug("config descriptor of payload class: %s\n",
> +		class == CFD_CLS_BLOCK ? "simple block" :
> +		"address data block");
> +
> +	if (class == CFD_CLS_BLOCK)
> +		pr_debug("word0: dst_addr_offset = 0x%08x\n",
> +			cfd_get_dest_addr_offset(cfd));
> +
> +	if (class == CFD_CLS_BLOCK)
> +		pr_debug("word1: num_data_wrds = %d\n", cfd_get_block_len(cfd));
> +
> +	pr_debug("word2: payload_addr = 0x%08x\n", cfd_get_payload_addr(cfd));
> +
> +	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
> +		"payload_len = %d\n", cfd_get_pkt_type(cfd),
> +		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
> +		cfd_get_payload_len(cfd));
> +}

There's quite a bit of code in these dump functions, and they are always
called. I'm sure getting that data is good for debugging, but I presume
they are quite useless for normal use. So I think they should be
compiled in only if some Kconfig option is selected.
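
For example (a sketch; CONFIG_VIDEO_TI_VPE_DEBUG is an invented symbol):

#ifdef CONFIG_VIDEO_TI_VPE_DEBUG
static void dump_cfd(struct vpdma_cfd *cfd)
{
	/* the pr_debug() calls as posted */
}
#else
static inline void dump_cfd(struct vpdma_cfd *cfd) { }
#endif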

> +/*
> + * data transfer descriptor
> + *
> + * All fields are 32 bits to make them endian neutral

What does that mean? Why would 32bit fields make it endian neutral?

> + */
> +struct vpdma_dtd {
> +	u32			type_ctl_stride;
> +	union {
> +		u32		xfer_length_height;
> +		u32		w1;
> +	};
> +	dma_addr_t		start_addr;
> +	u32			pkt_ctl;
> +	union {
> +		u32		frame_width_height;	/* inbound */
> +		dma_addr_t	desc_write_addr;	/* outbound */

Are you sure dma_addr_t is always 32 bit?

> +	};
> +	union {
> +		u32		start_h_v;		/* inbound */
> +		u32		max_width_height;	/* outbound */
> +	};
> +	u32			client_attr0;
> +	u32			client_attr1;
> +};

I'm not sure if I understand the struct right, but presuming this one
struct is used for both writing and reading, and one set of fields is
used for writes and another for reads, would it make sense to have two
different structs instead of using unions? Although they do have many
common fields, and the unions are a bit scattered there, so I don't
know if that would be cleaner...
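
Roughly this is what I had in mind (only a sketch; the struct names are
invented, and the fields just follow the unions above):

struct vpdma_dtd_in {			/* inbound transfers */
	u32		type_ctl_stride;
	u32		xfer_length_height;
	dma_addr_t	start_addr;
	u32		pkt_ctl;
	u32		frame_width_height;
	u32		start_h_v;
	u32		client_attr0;
	u32		client_attr1;
};

struct vpdma_dtd_out {			/* outbound transfers */
	u32		type_ctl_stride;
	u32		w1;
	dma_addr_t	start_addr;
	u32		pkt_ctl;
	dma_addr_t	desc_write_addr;
	u32		max_width_height;
	u32		client_attr0;
	u32		client_attr1;
};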

 Tomi



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
@ 2013-08-05  9:11     ` Tomi Valkeinen
  0 siblings, 0 replies; 138+ messages in thread
From: Tomi Valkeinen @ 2013-08-05  9:11 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

[-- Attachment #1: Type: text/plain, Size: 5067 bytes --]

On 02/08/13 17:03, Archit Taneja wrote:
> Create functions which the VPE driver can use to create a VPDMA descriptor and
> add it to a VPDMA descriptor list. These functions take a pointer to an existing
> list, and append the configuration/data/control descriptor header to the list.
> 
> In the case of configuration descriptors, the creation of a payload block may be
> required(the payloads can hold VPE MMR values, or scaler coefficients). The
> allocation of the payload buffer and it's content is left to the VPE driver.
> However, the VPDMA library provides helper macros to create payload in the
> correct format.
> 
> Add debug functions to dump the descriptors in a way such that it's easy to see
> the values of different fields in the descriptors.

There are lots of defines and inline functions in this patch. But at
least the ones I looked at were only used once.

For example, dtd_set_xfer_length_height() is called only in one place.
Then dtd_set_xfer_length_height() uses DTD_W1(), and again it's the only
place where DTD_W1() is used.

So instead of:

dtd_set_xfer_length_height(dtd, c_rect->width, height);

You could as well do:

dtd->xfer_length_height = (c_rect->width << DTD_LINE_LENGTH_SHFT) | height;

Now, presuming the compiler optimizes correctly, there should be no
difference between the two options above. My only point is that I wonder
if having multiple "layers" there improves readability at all. Some
helper funcs are rather trivial, like:

+static inline void dtd_set_w1(struct vpdma_dtd *dtd, u32 value)
+{
+	dtd->w1 = value;
+}

Then there are some, like dtd_set_type_ctl_stride(), that contains lots
of parameters. Hmm, okay, dtd_set_type_ctl_stride() is called in two
places, so at least in that case it makes sense to have that helper
func. But dtd_set_type_ctl_stride() uses DTD_W0(), and that's again the
only place where it's used.

So, I don't know. I'm not suggesting to change anything, I just started
wondering if all those macros and helpers actually help or not.

> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/ti-vpe/vpdma.c      | 269 +++++++++++
>  drivers/media/platform/ti-vpe/vpdma.h      |  48 ++
>  drivers/media/platform/ti-vpe/vpdma_priv.h | 695 +++++++++++++++++++++++++++++
>  3 files changed, 1012 insertions(+)
> 
> diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
> index b15b3dd..b957381 100644
> --- a/drivers/media/platform/ti-vpe/vpdma.c
> +++ b/drivers/media/platform/ti-vpe/vpdma.c
> @@ -21,6 +21,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
> +#include <linux/videodev2.h>
>  
>  #include "vpdma.h"
>  #include "vpdma_priv.h"
> @@ -425,6 +426,274 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
>  	return 0;
>  }
>  
> +static void dump_cfd(struct vpdma_cfd *cfd)
> +{
> +	int class;
> +
> +	class = cfd_get_class(cfd);
> +
> +	pr_debug("config descriptor of payload class: %s\n",
> +		class == CFD_CLS_BLOCK ? "simple block" :
> +		"address data block");
> +
> +	if (class == CFD_CLS_BLOCK)
> +		pr_debug("word0: dst_addr_offset = 0x%08x\n",
> +			cfd_get_dest_addr_offset(cfd));
> +
> +	if (class == CFD_CLS_BLOCK)
> +		pr_debug("word1: num_data_wrds = %d\n", cfd_get_block_len(cfd));
> +
> +	pr_debug("word2: payload_addr = 0x%08x\n", cfd_get_payload_addr(cfd));
> +
> +	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
> +		"payload_len = %d\n", cfd_get_pkt_type(cfd),
> +		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
> +		cfd_get_payload_len(cfd));
> +}

There's quite a bit of code in these dump functions, and they are always
called. I'm sure getting that data is good for debugging, but I presume
they are quite useless for normal use. So I think they should be
compiled in only if some Kconfig option is selected.
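
Something like this, perhaps (CONFIG_VIDEO_TI_VPE_DEBUG is just a made-up
symbol name here, and the body would be the pr_debug() code quoted above):

#ifdef CONFIG_VIDEO_TI_VPE_DEBUG
static void dump_cfd(struct vpdma_cfd *cfd)
{
	/* the pr_debug() calls quoted above */
}
#else
static inline void dump_cfd(struct vpdma_cfd *cfd)
{
}
#endif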

> +/*
> + * data transfer descriptor
> + *
> + * All fields are 32 bits to make them endian neutral

What does that mean? Why would 32bit fields make it endian neutral?

> + */
> +struct vpdma_dtd {
> +	u32			type_ctl_stride;
> +	union {
> +		u32		xfer_length_height;
> +		u32		w1;
> +	};
> +	dma_addr_t		start_addr;
> +	u32			pkt_ctl;
> +	union {
> +		u32		frame_width_height;	/* inbound */
> +		dma_addr_t	desc_write_addr;	/* outbound */

Are you sure dma_addr_t is always 32 bit?

> +	};
> +	union {
> +		u32		start_h_v;		/* inbound */
> +		u32		max_width_height;	/* outbound */
> +	};
> +	u32			client_attr0;
> +	u32			client_attr1;
> +};

I'm not sure if I understand the struct right, but presuming this one
struct is used for both writing and reading, and certain set of fields
is used for writes and other set for reads, would it make sense to have
two different structs, instead of using unions? Although they do have
many common fields, and the unions are a bit scattered there, so I don't
know if that would be cleaner...

 Tomi



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-02 14:03   ` Archit Taneja
@ 2013-08-05  9:18     ` Tomi Valkeinen
  -1 siblings, 0 replies; 138+ messages in thread
From: Tomi Valkeinen @ 2013-08-05  9:18 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

[-- Attachment #1: Type: text/plain, Size: 2500 bytes --]

On 02/08/13 17:03, Archit Taneja wrote:
> VPE is a block which consists of a single memory to memory path which can
> perform chrominance up/down sampling, de-interlacing, scaling, and color space
> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
> interleaved video formats.
> 
> We create a mem2mem driver based primarily on the mem2mem-testdev example.
> The de-interlacer, scaler and color space converter are all bypassed for now
> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
> conversion between different YUV formats is possible.
> 
> Each mem2mem context allocates a buffer for VPE MMR values which it will use
> when it gets access to the VPE HW via the mem2mem queue; it also allocates
> a VPDMA descriptor list to which configuration and data descriptors are added.
> 
> Based on the information received via v4l2 ioctls for the source and
> destination queues, the driver configures the values for the MMRs, and stores
> them in the buffer. There are also some VPDMA parameters like frame start and
> line mode which need to be configured; these are configured by direct register
> writes via the VPDMA helper functions.
> 
> The driver's device_run() mem2mem op will add each descriptor based on how the
> source and destination queues are set up for the given ctx. Once the list is
> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA, will
> upload MMR registers and start DMA of video buffers on the various input and
> output clients/ports.
> 
> When the list is parsed completely(and the DMAs on all the output ports done),
> an interrupt is generated which we use to notify that the source and destination
> buffers are done.
> 
> The rest of the driver is quite similar to other mem2mem drivers, we use the
> multiplane v4l2 ioctls as the HW supports coplanar formats.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/Kconfig           |   10 +
>  drivers/media/platform/Makefile          |    2 +
>  drivers/media/platform/ti-vpe/vpe.c      | 1763 ++++++++++++++++++++++++++++++
>  drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>  4 files changed, 2271 insertions(+)
>  create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>  create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

Just two quick comments, the same as to an earlier patch: consts for
tables, and "write" instead of "insert".

 Tomi



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-05  8:13     ` Tomi Valkeinen
@ 2013-08-05 11:26       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-05 11:26 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

On Monday 05 August 2013 01:43 PM, Tomi Valkeinen wrote:
> Hi,
>
> On 02/08/13 17:03, Archit Taneja wrote:
>
>> +struct vpdma_data_format vpdma_yuv_fmts[] = {
>> +	[VPDMA_DATA_FMT_Y444] = {
>> +		.data_type	= DATA_TYPE_Y444,
>> +		.depth		= 8,
>> +	},
>
> This, and all the other tables, should probably be consts?

That's true, I'll fix those.

>
>> +static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
>> +{
>> +	u32 val = *valp;
>> +
>> +	val &= ~(mask << shift);
>> +	val |= (field & mask) << shift;
>> +	*valp = val;
>> +}
>
> I think "insert" normally means, well, inserting a thing in between
> something. What you do here is overwriting.
>
> Why not just call it "write_field"?

sure, will change it.

>
>> + * Allocate a DMA buffer
>> + */
>> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
>> +{
>> +	buf->size = size;
>> +	buf->mapped = 0;
>
> Maybe true/false is clearer here that 0/1.

okay.

>
>> +/*
>> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
>> + */
>> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
>> +{
>> +	/* we always use the first list */
>> +	int list_num = 0;
>> +	int list_size;
>> +
>> +	if (vpdma_list_busy(vpdma, list_num))
>> +		return -EBUSY;
>> +
>> +	/* 16-byte granularity */
>> +	list_size = (list->next - list->buf.addr) >> 4;
>> +
>> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
>> +	wmb();
>
> What is the wmb() for?

VPDMA_LIST_ADDR needs to be written before VPDMA_LIST_ATTR, otherwise 
VPDMA doesn't work. wmb() ensures the ordering.

>
>> +	write_reg(vpdma, VPDMA_LIST_ATTR,
>> +			(list_num << VPDMA_LIST_NUM_SHFT) |
>> +			(list->type << VPDMA_LIST_TYPE_SHFT) |
>> +			list_size);
>> +
>> +	return 0;
>> +}
>
>> +static void vpdma_firmware_cb(const struct firmware *f, void *context)
>> +{
>> +	struct vpdma_data *vpdma = context;
>> +	struct vpdma_buf fw_dma_buf;
>> +	int i, r;
>> +
>> +	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
>> +
>> +	if (!f || !f->data) {
>> +		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
>> +		return;
>> +	}
>> +
>> +	/* already initialized */
>> +	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
>> +			VPDMA_LIST_RDY_SHFT)) {
>> +		vpdma->ready = true;
>> +		return;
>> +	}
>> +
>> +	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
>> +	if (r) {
>> +		dev_err(&vpdma->pdev->dev,
>> +			"failed to allocate dma buffer for firmware\n");
>> +		goto rel_fw;
>> +	}
>> +
>> +	memcpy(fw_dma_buf.addr, f->data, f->size);
>> +
>> +	vpdma_buf_map(vpdma, &fw_dma_buf);
>> +
>> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
>> +
>> +	for (i = 0; i < 100; i++) {		/* max 1 second */
>> +		msleep_interruptible(10);
>
> You call interruptible version here, but you don't handle the
> interrupted case. I believe the loop will just continue looping, even if
> the user interrupted.

Okay. I think I don't understand the interruptible version correctly. We 
don't need to msleep_interruptible here, we aren't waiting on any wake 
up event, we just want to wait till a bit gets set.

I am thinking of implementing something similar to wait_for_bit_change() 
in 'drivers/video/omap2/dss/dsi.c'
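
Roughly something like this, I guess (the helper name is just a placeholder,
the accessors are the ones from this patch):

static int vpdma_wait_for_list_ready(struct vpdma_data *vpdma)
{
	int i;

	/* poll the list manager ready bit for roughly a second */
	for (i = 0; i < 100; i++) {
		if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
				VPDMA_LIST_RDY_SHFT))
			return 0;
		usleep_range(10000, 15000);
	}

	return -ETIMEDOUT;
}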

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-08-05  9:11     ` Tomi Valkeinen
@ 2013-08-05 12:05       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-05 12:05 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

On Monday 05 August 2013 02:41 PM, Tomi Valkeinen wrote:
> On 02/08/13 17:03, Archit Taneja wrote:
>> Create functions which the VPE driver can use to create a VPDMA descriptor and
>> add it to a VPDMA descriptor list. These functions take a pointer to an existing
>> list, and append the configuration/data/control descriptor header to the list.
>>
>> In the case of configuration descriptors, the creation of a payload block may be
>> required (the payloads can hold VPE MMR values, or scaler coefficients). The
>> allocation of the payload buffer and its content is left to the VPE driver.
>> However, the VPDMA library provides helper macros to create payload in the
>> correct format.
>>
>> Add debug functions to dump the descriptors in a way such that it's easy to see
>> the values of different fields in the descriptors.
>
> There are lots of defines and inline functions in this patch. But at
> least the ones I looked at were only used once.
>
> For example, dtd_set_xfer_length_height() is called only in one place.
> Then dtd_set_xfer_length_height() uses DTD_W1(), and again it's the only
> place where DTD_W1() is used.
>
> So instead of:
>
> dtd_set_xfer_length_height(dtd, c_rect->width, height);
>
> You could as well do:
>
> dtd->xfer_length_height = (c_rect->width << DTD_LINE_LENGTH_SHFT) | height;
>
> Now, presuming the compiler optimizes correctly, there should be no
> difference between the two options above. My only point is that I wonder
> if having multiple "layers" there improves readability at all. Some
> helper funcs are rather trivial, like:
>
> +static inline void dtd_set_w1(struct vpdma_dtd *dtd, u32 value)
> +{
> +	dtd->w1 = value;
> +}
>
> Then there are some, like dtd_set_type_ctl_stride(), that contain lots
> of parameters. Hmm, okay, dtd_set_type_ctl_stride() is called in two
> places, so at least in that case it makes sense to have that helper
> func. But dtd_set_type_ctl_stride() uses DTD_W0(), and that's again the
> only place where it's used.
>
> So, I don't know. I'm not suggesting to change anything, I just started
> wondering if all those macros and helpers actually help or not.

There are some more descriptors to add later on, but you are right about
many of them being used in only one place. I'll have a look at the
macros again.

>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   drivers/media/platform/ti-vpe/vpdma.c      | 269 +++++++++++
>>   drivers/media/platform/ti-vpe/vpdma.h      |  48 ++
>>   drivers/media/platform/ti-vpe/vpdma_priv.h | 695 +++++++++++++++++++++++++++++
>>   3 files changed, 1012 insertions(+)
>>
>> diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
>> index b15b3dd..b957381 100644
>> --- a/drivers/media/platform/ti-vpe/vpdma.c
>> +++ b/drivers/media/platform/ti-vpe/vpdma.c
>> @@ -21,6 +21,7 @@
>>   #include <linux/platform_device.h>
>>   #include <linux/sched.h>
>>   #include <linux/slab.h>
>> +#include <linux/videodev2.h>
>>
>>   #include "vpdma.h"
>>   #include "vpdma_priv.h"
>> @@ -425,6 +426,274 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
>>   	return 0;
>>   }
>>
>> +static void dump_cfd(struct vpdma_cfd *cfd)
>> +{
>> +	int class;
>> +
>> +	class = cfd_get_class(cfd);
>> +
>> +	pr_debug("config descriptor of payload class: %s\n",
>> +		class == CFD_CLS_BLOCK ? "simple block" :
>> +		"address data block");
>> +
>> +	if (class == CFD_CLS_BLOCK)
>> +		pr_debug("word0: dst_addr_offset = 0x%08x\n",
>> +			cfd_get_dest_addr_offset(cfd));
>> +
>> +	if (class == CFD_CLS_BLOCK)
>> +		pr_debug("word1: num_data_wrds = %d\n", cfd_get_block_len(cfd));
>> +
>> +	pr_debug("word2: payload_addr = 0x%08x\n", cfd_get_payload_addr(cfd));
>> +
>> +	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
>> +		"payload_len = %d\n", cfd_get_pkt_type(cfd),
>> +		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
>> +		cfd_get_payload_len(cfd));
>> +}
>
> There's quite a bit of code in these dump functions, and they are always
> called. I'm sure getting that data is good for debugging, but I presume
> they are quite useless for normal use. So I think they should be
> compiled in only if some Kconfig option is selected.

Won't pr_debug() functions actually print something only when 
CONFIG_DYNAMIC_DEBUG is selected or if DEBUG is defined? They will
still consume a lot of code, but it would just end up in dummy printk 
calls, right?

>
>> +/*
>> + * data transfer descriptor
>> + *
>> + * All fields are 32 bits to make them endian neutral
>
> What does that mean? Why would 32bit fields make it endian neutral?


Each 32 bit field describes one word of the data descriptor. Each 
descriptor has a number of parameters.

If we look at the word 'xfer_length_height', it's composed of height
(from bits 15:0) and width (from bits 31:16). If the word were expressed
using bit fields, we could describe the word (in big endian) as:

struct vpdma_dtd {
	...
	unsigned int	xfer_width:16;
	unsigned int	xfer_height:16;
	...
	...
};

and in little endian as:

struct vpdma_dtd {
	...
	unsigned int	xfer_height:16;
	unsigned int	xfer_width:16;
	...
	...
};

So this representation makes it endian dependent. Maybe the comment
should be improved to say that using u32 words instead of bit fields
prevents endian issues.

>
>> + */
>> +struct vpdma_dtd {
>> +	u32			type_ctl_stride;
>> +	union {
>> +		u32		xfer_length_height;
>> +		u32		w1;
>> +	};
>> +	dma_addr_t		start_addr;
>> +	u32			pkt_ctl;
>> +	union {
>> +		u32		frame_width_height;	/* inbound */
>> +		dma_addr_t	desc_write_addr;	/* outbound */
>
> Are you sure dma_addr_t is always 32 bit?

I am not sure about this.

>
>> +	};
>> +	union {
>> +		u32		start_h_v;		/* inbound */
>> +		u32		max_width_height;	/* outbound */
>> +	};
>> +	u32			client_attr0;
>> +	u32			client_attr1;
>> +};
>
> I'm not sure if I understand the struct right, but presuming this one
> struct is used for both writing and reading, and certain set of fields
> is used for writes and other set for reads, would it make sense to have
> two different structs, instead of using unions? Although they do have
> many common fields, and the unions are a bit scattered there, so I don't
> know if that would be cleaner...

It helps in having a common debug function; I don't see much benefit
apart from that. I'll see if it's better to have them as separate structs.

Thanks,
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-05 11:26       ` Archit Taneja
@ 2013-08-05 12:26         ` Tomi Valkeinen
  -1 siblings, 0 replies; 138+ messages in thread
From: Tomi Valkeinen @ 2013-08-05 12:26 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

[-- Attachment #1: Type: text/plain, Size: 1756 bytes --]

On 05/08/13 14:26, Archit Taneja wrote:
> On Monday 05 August 2013 01:43 PM, Tomi Valkeinen wrote:

>>> +/*
>>> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
>>> + */
>>> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
>>> +{
>>> +    /* we always use the first list */
>>> +    int list_num = 0;
>>> +    int list_size;
>>> +
>>> +    if (vpdma_list_busy(vpdma, list_num))
>>> +        return -EBUSY;
>>> +
>>> +    /* 16-byte granularity */
>>> +    list_size = (list->next - list->buf.addr) >> 4;
>>> +
>>> +    write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
>>> +    wmb();
>>
>> What is the wmb() for?
> 
> VPDMA_LIST_ADDR needs to be written before VPDMA_LIST_ATTR, otherwise
> VPDMA doesn't work. wmb() ensures the ordering.

Are you sure it's needed? Here's an interesting thread about writing to and
reading from registers: http://marc.info/?t=130588594900002&r=1&w=2

>>> +
>>> +    for (i = 0; i < 100; i++) {        /* max 1 second */
>>> +        msleep_interruptible(10);
>>
>> You call interruptible version here, but you don't handle the
>> interrupted case. I believe the loop will just continue looping, even if
>> the user interrupted.
> 
> Okay. I think I don't understand the interruptible version correctly. We
> don't need to msleep_interruptible here, we aren't waiting on any wake
> up event, we just want to wait till a bit gets set.

Well, I think the interruptible versions should be used when the user
(well, userspace program) initiates the action. The user should have the
option to interrupt a possibly long running operation, which is what
msleep_interruptible() makes possible.

 Tomi



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-08-05 12:05       ` Archit Taneja
@ 2013-08-05 13:03         ` Tomi Valkeinen
  -1 siblings, 0 replies; 138+ messages in thread
From: Tomi Valkeinen @ 2013-08-05 13:03 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, laurent.pinchart

[-- Attachment #1: Type: text/plain, Size: 4730 bytes --]

On 05/08/13 15:05, Archit Taneja wrote:
> On Monday 05 August 2013 02:41 PM, Tomi Valkeinen wrote:

>> There's quite a bit of code in these dump functions, and they are always
>> called. I'm sure getting that data is good for debugging, but I presume
>> they are quite useless for normal use. So I think they should be
>> compiled in only if some Kconfig option is selected.
> 
> Won't pr_debug() functions actually print something only when
> CONFIG_DYNAMIC_DEBUG is selected or if DEBUG is defined? They will

If DEBUG is defined, they are always printed. If dynamic debug is in
use, the user has to enable debug prints for VPE for the dumps to be
printed.

> still consume a lot of code, but it would just end up in dummy printk
> calls, right?

Yes.

Well, I don't know VPE, so I can't really say how much those prints are
needed or not. They just looked very verbose to me.

I think we should have "normal" level debugging messages compiled in by
default, and for "verbose" there should be a separate compile option.
With verbose I mean something that may be useful if you are changing the
code and want to verify it or debug some very odd bug. I.e. for the
developer of the driver.

And with normal something that would be used when, say, somebody uses
VPE in his app, but things don't seem to be quite right, and there's
need to get some info on what is going on. I.e. for "normal" user.

But that's just my opinion, and it's obviously difficult to define those
clearly =). To be honest, I don't know how much overhead very verbose
kernel debug prints even cause. Maybe it's negligible.

>>> +/*
>>> + * data transfer descriptor
>>> + *
>>> + * All fields are 32 bits to make them endian neutral
>>
>> What does that mean? Why would 32bit fields make it endian neutral?
> 
> 
> Each 32 bit field describes one word of the data descriptor. Each
> descriptor has a number of parameters.
> 
> If we look at the word 'xfer_length_height', it's composed of height
> (from bits 15:0) and width (from bits 31:16). If the word were expressed
> using bit fields, we could describe the word (in big endian) as:
> 
> struct vpdma_dtd {
>     ...
>     unsigned int    xfer_width:16;
>     unsigned int    xfer_height:16;
>     ...
>     ...
> };
> 
> and in little endian as:
> 
> struct vpdma_dtd {
>     ...
>     unsigned int    xfer_height:16;
>     unsigned int    xfer_width:16;
>     ...
>     ...
> };
> 
> So this representation makes it endian dependent. Maybe the comment
> should be improved to say that using u32 words instead of bit fields
> prevents endian issues.

No, I don't think that's correct. Endianness is about bytes, not 16 bit
words. The above text doesn't make much sense to me.

I haven't really worked with endianness issues, but maybe __le32 and
others should be used in the struct, if that struct is read by the HW.
And use cpu_to_le32() & others to write those. But googling will
probably give more info (I should read also =).
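
As a rough sketch of what I mean (only one word shown here):

struct vpdma_dtd {
	__le32		type_ctl_stride;
	__le32		xfer_length_height;
	/* ... the remaining words as __le32 as well ... */
};

	dtd->xfer_length_height =
		cpu_to_le32((c_rect->width << DTD_LINE_LENGTH_SHFT) | height);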

>>> + */
>>> +struct vpdma_dtd {
>>> +    u32            type_ctl_stride;
>>> +    union {
>>> +        u32        xfer_length_height;
>>> +        u32        w1;
>>> +    };
>>> +    dma_addr_t        start_addr;
>>> +    u32            pkt_ctl;
>>> +    union {
>>> +        u32        frame_width_height;    /* inbound */
>>> +        dma_addr_t    desc_write_addr;    /* outbound */
>>
>> Are you sure dma_addr_t is always 32 bit?
> 
> I am not sure about this.

Is this struct directly read by the HW, or written to HW? If so, I
believe using dma_addr_t is very wrong here. Having a typedef like
dma_addr_t hides the actual type used for it. So even if it currently
would always be 32bit, there's no guarantee.
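
If the HW really does read one 32-bit address word there, an explicit u32
together with the generic lower_32_bits() helper would at least make the
layout unambiguous (just a sketch):

	u32		start_addr;	/* always one 32-bit word for the HW */

	dtd->start_addr = lower_32_bits(buf->dma_addr);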

>>
>>> +    };
>>> +    union {
>>> +        u32        start_h_v;        /* inbound */
>>> +        u32        max_width_height;    /* outbound */
>>> +    };
>>> +    u32            client_attr0;
>>> +    u32            client_attr1;
>>> +};
>>
>> I'm not sure if I understand the struct right, but presuming this one
>> struct is used for both writing and reading, and certain set of fields
>> is used for writes and other set for reads, would it make sense to have
>> two different structs, instead of using unions? Although they do have
>> many common fields, and the unions are a bit scattered there, so I don't
>> know if that would be cleaner...
> 
> It helps in having a common debug function; I don't see much benefit
> apart from that. I'll see if it's better to have them as separate structs.

Ok. Does the struct have any bit or such that tells us if the current
data is inbound or outbound?

 Tomi



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-05 11:26       ` Archit Taneja
  (?)
  (?)
@ 2013-08-08 21:35       ` Laurent Pinchart
  2013-08-14 10:19           ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2013-08-08 21:35 UTC (permalink / raw)
  To: Archit Taneja
  Cc: Tomi Valkeinen, linux-media, linux-omap, dagriego, dale, pawel,
	m.szyprowski, hverkuil

Hi Archit,

On Monday 05 August 2013 16:56:46 Archit Taneja wrote:
> On Monday 05 August 2013 01:43 PM, Tomi Valkeinen wrote:
> > On 02/08/13 17:03, Archit Taneja wrote:
> >> +struct vpdma_data_format vpdma_yuv_fmts[] = {
> >> +	[VPDMA_DATA_FMT_Y444] = {
> >> +		.data_type	= DATA_TYPE_Y444,
> >> +		.depth		= 8,
> >> +	},
> > 
> > This, and all the other tables, should probably be consts?
> 
> That's true, I'll fix those.
> 
> >> +static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
> >> +{
> >> +	u32 val = *valp;
> >> +
> >> +	val &= ~(mask << shift);
> >> +	val |= (field & mask) << shift;
> >> +	*valp = val;
> >> +}
> > 
> > I think "insert" normally means, well, inserting a thing in between
> > something. What you do here is overwriting.
> > 
> > Why not just call it "write_field"?
> 
> sure, will change it.
> 
> >> + * Allocate a DMA buffer
> >> + */
> >> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
> >> +{
> >> +	buf->size = size;
> >> +	buf->mapped = 0;
> > 
> > Maybe true/false is clearer here that 0/1.
> 
> okay.
> 
> >> +/*
> >> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
> >> + */
> >> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
> >> +{
> >> +	/* we always use the first list */
> >> +	int list_num = 0;
> >> +	int list_size;
> >> +
> >> +	if (vpdma_list_busy(vpdma, list_num))
> >> +		return -EBUSY;
> >> +
> >> +	/* 16-byte granularity */
> >> +	list_size = (list->next - list->buf.addr) >> 4;
> >> +
> >> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
> >> +	wmb();
> > 
> > What is the wmb() for?
> 
> VPDMA_LIST_ADDR needs to be written before VPDMA_LIST_ATTR, otherwise
> VPDMA doesn't work. wmb() ensures the ordering.

write_reg() calls iowrite32(), which already includes an __iowmb().
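
So, assuming write_reg() keeps using iowrite32(), the explicit barrier can
simply be dropped:

	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
	write_reg(vpdma, VPDMA_LIST_ATTR,
			(list_num << VPDMA_LIST_NUM_SHFT) |
			(list->type << VPDMA_LIST_TYPE_SHFT) |
			list_size);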

> >> +	write_reg(vpdma, VPDMA_LIST_ATTR,
> >> +			(list_num << VPDMA_LIST_NUM_SHFT) |
> >> +			(list->type << VPDMA_LIST_TYPE_SHFT) |
> >> +			list_size);
> >> +
> >> +	return 0;
> >> +}
> >> 
> >> +static void vpdma_firmware_cb(const struct firmware *f, void *context)
> >> +{
> >> +	struct vpdma_data *vpdma = context;
> >> +	struct vpdma_buf fw_dma_buf;
> >> +	int i, r;
> >> +
> >> +	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
> >> +
> >> +	if (!f || !f->data) {
> >> +		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
> >> +		return;
> >> +	}
> >> +
> >> +	/* already initialized */
> >> +	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
> >> +			VPDMA_LIST_RDY_SHFT)) {
> >> +		vpdma->ready = true;
> >> +		return;
> >> +	}
> >> +
> >> +	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
> >> +	if (r) {
> >> +		dev_err(&vpdma->pdev->dev,
> >> +			"failed to allocate dma buffer for firmware\n");
> >> +		goto rel_fw;
> >> +	}
> >> +
> >> +	memcpy(fw_dma_buf.addr, f->data, f->size);
> >> +
> >> +	vpdma_buf_map(vpdma, &fw_dma_buf);
> >> +
> >> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
> >> +
> >> +	for (i = 0; i < 100; i++) {		/* max 1 second */
> >> +		msleep_interruptible(10);
> > 
> > You call interruptible version here, but you don't handle the
> > interrupted case. I believe the loop will just continue looping, even if
> > the user interrupted.
> 
> Okay. I think I don't understand the interruptible version correctly. We
> don't need to msleep_interruptible here, we aren't waiting on any wake
> up event, we just want to wait till a bit gets set.
> 
> I am thinking of implementing something similar to wait_for_bit_change()
> in 'drivers/video/omap2/dss/dsi.c'

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-02 14:03   ` Archit Taneja
  (?)
  (?)
@ 2013-08-08 22:04   ` Laurent Pinchart
  2013-08-14 10:57       ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2013-08-08 22:04 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen

Hi Archit,

Thank you for the patch.

On Friday 02 August 2013 19:33:38 Archit Taneja wrote:
> The primary function of VPDMA is to move data between external memory and
> internal processing modules(in our case, VPE) that source or sink data.
> VPDMA is capable of buffering this data and then delivering the data as
> demanded to the modules as programmed. The modules that source or sink data
> are referred to as clients or ports. A channel is set up inside the VPDMA to
> connect a specific memory buffer to a specific client. The VPDMA
> centralizes the DMA control functions and buffering required to allow all
> the clients to minimize the effect of long latency times.
> 
> Add the following to the VPDMA helper:
> 
> - A data struct which describes VPDMA channels. For now, these channels are
> the ones used only by VPE; the list of channels will increase when
> VIP (Video Input Port) also uses the VPDMA library. This channel information
> will be used to populate fields required by data descriptors.
> 
> - Data structs which describe the different data types supported by VPDMA.
> This data type information will be used to populate fields required by data
> descriptors and used by the VPE driver to map a V4L2 format to the
> corresponding VPDMA data type.
> 
> - Provide VPDMA register offset definitions, functions to read, write and
> modify VPDMA registers.
> 
> - Functions to create and submit a VPDMA list. A list is a group of
> descriptors that makes up a set of DMA transfers that need to be completed.
> Each descriptor will either perform a DMA transaction to fetch input
> buffers and write to output buffers(data descriptors), or configure the
> MMRs of sub blocks of VPE(configuration descriptors), or provide control
> information to VPDMA (control descriptors).
> 
> - Functions to allocate, map and unmap buffers needed for the descriptor
> list, payloads containing MMR values and motion vector buffers. These use
> the DMA mapping APIs to ensure exclusive access to VPDMA.
> 
> - Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on
> the VPE interrupt line when a descriptor list is parsed completely and the
> DMA transactions are completed. This requires masking the events in VPDMA
> registers and configuring some top level VPE interrupt registers.
> 
> - Enable some VPDMA specific parameters: frame start event(when to start DMA
> for a client) and line mode(whether each line fetched should be mirrored or
> not).
> 
> - Function to load firmware required by VPDMA. VPDMA requires firmware for
> its internal list manager. We add the required request_firmware APIs to
> fetch this firmware from user space.
> 
> - Function to dump VPDMA registers.
> 
> - A function to initialize VPDMA; this will be called by the VPE driver with
> its platform device pointer. This function will take care of loading VPDMA
> firmware and returning a handle back to the VPE driver. The VIP driver will
> also call the same init function to initialize its own VPDMA instance.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/ti-vpe/vpdma.c      | 589 ++++++++++++++++++++++++++
>  drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
>  drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
>  3 files changed, 862 insertions(+)
>  create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
>  create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
>  create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
> 
> diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
> new file mode 100644
> index 0000000..b15b3dd
> --- /dev/null
> +++ b/drivers/media/platform/ti-vpe/vpdma.c
> @@ -0,0 +1,589 @@

[snip]

> +static int get_field(u32 value, u32 mask, int shift)
> +{
> +	return (value & (mask << shift)) >> shift;
> +}
> +
> +static int get_field_reg(struct vpdma_data *vpdma, int offset,
> +		u32 mask, int shift)

I would call this read_field_reg().

> +{
> +	return get_field(read_reg(vpdma, offset), mask, shift);
> +}
> +
> +static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
> +{
> +	u32 val = *valp;
> +
> +	val &= ~(mask << shift);
> +	val |= (field & mask) << shift;
> +	*valp = val;
> +}

get_field() and insert_field() are used in a single location; you can manually
inline them.

> +static void insert_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
> +		u32 mask, int shift)
> +{
> +	u32 val = read_reg(vpdma, offset);
> +
> +	insert_field(&val, field, mask, shift);
> +
> +	write_reg(vpdma, offset, val);
> +}

[snip]

> +/*
> + * Allocate a DMA buffer
> + */
> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
> +{
> +	buf->size = size;
> +	buf->mapped = 0;
> +	buf->addr = kzalloc(size, GFP_KERNEL);

You should use the dma allocation API (depending on your needs, 
dma_alloc_coherent for instance) to allocate DMA-able buffers.
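
A minimal sketch of the coherent variant (note that the function then needs
the struct device, so the prototype grows a parameter and the callers change
accordingly):

int vpdma_buf_alloc(struct device *dev, struct vpdma_buf *buf, size_t size)
{
	buf->size = size;
	buf->mapped = false;
	buf->addr = dma_alloc_coherent(dev, size, &buf->dma_addr,
				       GFP_KERNEL);
	if (!buf->addr)
		return -ENOMEM;

	return 0;
}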

> +	if (!buf->addr)
> +		return -ENOMEM;
> +
> +	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
> +
> +	return 0;
> +}
> +
> +void vpdma_buf_free(struct vpdma_buf *buf)
> +{
> +	WARN_ON(buf->mapped != 0);
> +	kfree(buf->addr);
> +	buf->addr = NULL;
> +	buf->size = 0;
> +}
> +
> +/*
> + * map a DMA buffer, enabling DMA access
> + */
> +void vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
> +{
> +	struct device *dev = &vpdma->pdev->dev;
> +
> +	WARN_ON(buf->mapped != 0);
> +	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
> +				DMA_TO_DEVICE);
> +	buf->mapped = 1;
> +	BUG_ON(dma_mapping_error(dev, buf->dma_addr));

BUG_ON() is too harsh, you should return a proper error instead.
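
For instance (same body as quoted above, but propagating the error; the
callers would then have to check the return value):

int vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
{
	struct device *dev = &vpdma->pdev->dev;

	WARN_ON(buf->mapped);
	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
				DMA_TO_DEVICE);
	if (dma_mapping_error(dev, buf->dma_addr))
		return -EIO;

	buf->mapped = true;
	return 0;
}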

> +}
> +
> +/*
> + * unmap a DMA buffer, disabling DMA access and
> + * allowing the main processor to access the data
> + */
> +void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf)
> +{
> +	struct device *dev = &vpdma->pdev->dev;
> +
> +	if (buf->mapped)
> +		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
> +
> +	buf->mapped = 0;
> +}
> +
> +/*
> + * create a descriptor list, the user of this list will append configuration,
> + * contorl and data descriptors to this list, this list will be submitted to

s/contorl/control/

> + * VPDMA. VPDMA's list parser will go through each descriptor and perform
> + * the required DMA operations
> + */
> +int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
> +{
> +	int r;
> +
> +	r = vpdma_buf_alloc(&list->buf, size);
> +	if (r)
> +		return r;
> +
> +	list->next = list->buf.addr;
> +
> +	list->type = type;
> +
> +	return 0;
> +}
> +
> +/*
> + * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
> + * to allow new descriptors to be added to the list.
> + */
> +void vpdma_reset_desc_list(struct vpdma_desc_list *list)
> +{
> +	list->next = list->buf.addr;
> +}
> +
> +/*
> + * free the buffer allocated for the VPDMA descriptor list, this should be
> + * called when the user doesn't want to use VPDMA any more.
> + */
> +void vpdma_free_desc_list(struct vpdma_desc_list *list)
> +{
> +	vpdma_buf_free(&list->buf);
> +
> +	list->next = NULL;
> +}
> +
> +static int vpdma_list_busy(struct vpdma_data *vpdma, int list_num)

Should the function return a bool instead of an int ?

> +{
> +	u32 sync_reg = read_reg(vpdma, VPDMA_LIST_STAT_SYNC);
> +
> +	return (sync_reg >> (list_num + 16)) & 0x01;

You could shorten that as

	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
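
Or, combined with the bool return suggested above:

static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
{
	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
}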

> +}
> +
> +/*
> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
> + */
> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
> +{
> +	/* we always use the first list */
> +	int list_num = 0;
> +	int list_size;
> +
> +	if (vpdma_list_busy(vpdma, list_num))
> +		return -EBUSY;
> +
> +	/* 16-byte granularity */
> +	list_size = (list->next - list->buf.addr) >> 4;
> +
> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
> +	wmb();
> +	write_reg(vpdma, VPDMA_LIST_ATTR,
> +			(list_num << VPDMA_LIST_NUM_SHFT) |
> +			(list->type << VPDMA_LIST_TYPE_SHFT) |
> +			list_size);
> +
> +	return 0;
> +}
> +
> +/* set or clear the mask for list complete interrupt */
> +void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
> +		bool enable)
> +{
> +	u32 val;
> +
> +	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
> +	if (enable)
> +		val |= (1 << (list_num * 2));
> +	else
> +		val &= ~(1 << (list_num * 2));
> +	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
> +}
> +
> +/* clear previously occurred list interrupts in the LIST_STAT register */
> +void vpdma_clear_list_stat(struct vpdma_data *vpdma)
> +{
> +	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
> +		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
> +}
> +
> +/*
> + * configures the output mode of the line buffer for the given client, the
> + * line buffer content can either be mirrored(each line repeated twice) or
> + * passed to the client as is
> + */
> +void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
> +		enum vpdma_channel chan)
> +{
> +	int client_cstat = chan_info[chan].cstat_offset;
> +
> +	insert_field_reg(vpdma, client_cstat, line_mode,
> +		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
> +}
> +
> +/*
> + * configures the event which should trigger VPDMA transfer for the given
> + * client
> + */
> +void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
> +		enum vpdma_frame_start_event fs_event,
> +		enum vpdma_channel chan)
> +{
> +	int client_cstat = chan_info[chan].cstat_offset;
> +
> +	insert_field_reg(vpdma, client_cstat, fs_event,
> +		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
> +}
> +
> +static void vpdma_firmware_cb(const struct firmware *f, void *context)
> +{
> +	struct vpdma_data *vpdma = context;
> +	struct vpdma_buf fw_dma_buf;
> +	int i, r;
> +
> +	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
> +
> +	if (!f || !f->data) {
> +		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
> +		return;
> +	}
> +
> +	/* already initialized */
> +	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
> +			VPDMA_LIST_RDY_SHFT)) {
> +		vpdma->ready = true;
> +		return;
> +	}
> +
> +	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
> +	if (r) {
> +		dev_err(&vpdma->pdev->dev,
> +			"failed to allocate dma buffer for firmware\n");
> +		goto rel_fw;
> +	}
> +
> +	memcpy(fw_dma_buf.addr, f->data, f->size);
> +
> +	vpdma_buf_map(vpdma, &fw_dma_buf);
> +
> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
> +
> +	for (i = 0; i < 100; i++) {		/* max 1 second */
> +		msleep_interruptible(10);
> +
> +		if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
> +				VPDMA_LIST_RDY_SHFT))
> +			break;
> +	}
> +
> +	if (i == 100) {
> +		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
> +		goto free_buf;
> +	}
> +
> +	vpdma->ready = true;
> +
> +free_buf:
> +	vpdma_buf_unmap(vpdma, &fw_dma_buf);
> +
> +	vpdma_buf_free(&fw_dma_buf);
> +rel_fw:
> +	release_firmware(f);
> +}
> +
> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
> +{
> +	int r;
> +	struct device *dev = &vpdma->pdev->dev;
> +
> +	r = request_firmware_nowait(THIS_MODULE, 1,
> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
> +		vpdma_firmware_cb);

Is there a reason not to use the synchronous interface ? That would simplify 
both this code and the callers, as they won't have to check whether the 
firmware has been correctly loaded.
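
Roughly, with the synchronous request_firmware() interface (just a
sketch):

	const struct firmware *f;
	int r;

	r = request_firmware(&f, VPDMA_FIRMWARE, dev);
	if (r) {
		dev_err(dev, "firmware %s not available\n", VPDMA_FIRMWARE);
		return r;
	}

	/* copy f->data (f->size bytes) to a DMA buffer and upload it */

	release_firmware(f);
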
> +	if (r) {
> +		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
> +		return r;
> +	} else {
> +		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
> +	}
> +
> +	return 0;
> +}
> +
> +int vpdma_init(struct platform_device *pdev, struct vpdma_data **pvpdma)

As the function allocates the vpdma instance, I would call it vpdma_create()  
and make it return a struct vpdma_data *. You can then return error codes using 
ERR_PTR().
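
A minimal sketch of that shape (error handling mostly elided):

	struct vpdma_data *vpdma_create(struct platform_device *pdev)
	{
		struct vpdma_data *vpdma;

		vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
		if (!vpdma)
			return ERR_PTR(-ENOMEM);

		/* ... ioremap registers, load the firmware ... */

		return vpdma;
	}

The caller checks the result with IS_ERR() and retrieves the error code
with PTR_ERR().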

> +{
> +	struct resource *res;
> +	struct vpdma_data *vpdma;
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpdma_init\n");
> +
> +	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
> +	if (!vpdma) {
> +		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
> +		return -ENOMEM;
> +	}
> +
> +	vpdma->pdev = pdev;
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
> +	if (res == NULL) {
> +		dev_err(&pdev->dev, "missing platform resources data\n");
> +		return -ENODEV;
> +	}
> +
> +	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));

You can use devm_ioremap_resource(). The function checks the res pointer and 
prints error messages, so you can remove the res == NULL check above and the 
dev_err() below.
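
Something along these lines:

	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
	vpdma->base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(vpdma->base))
		return PTR_ERR(vpdma->base);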

> +	if (!vpdma->base) {
> +		dev_err(&pdev->dev, "failed to ioremap\n");
> +		return -ENOMEM;
> +	}
> +
> +	r = vpdma_load_firmware(vpdma);
> +	if (r) {
> +		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
> +		return r;
> +	}
> +
> +	*pvpdma = vpdma;
> +
> +	return 0;
> +}
> +MODULE_FIRMWARE(VPDMA_FIRMWARE);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE
  2013-08-02 14:03   ` Archit Taneja
  (?)
@ 2013-08-08 22:11   ` Laurent Pinchart
  2013-10-25 10:35       ` Archit Taneja
  2013-12-03 10:08       ` Archit Taneja
  -1 siblings, 2 replies; 138+ messages in thread
From: Laurent Pinchart @ 2013-08-08 22:11 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen, Rajendra Nayak, Sricharan R

Hi Archit,

Thank you for the patch.

On Friday 02 August 2013 19:33:43 Archit Taneja wrote:
> Add a DT node for VPE in dra7.dtsi. This is experimental because we might
> need to split the VPE address space a bit more, and also because the IRQ
> line described is accessible only once the IRQ crossbar driver is added for
> DRA7XX.
> 
> Cc: Rajendra Nayak <rnayak@ti.com>
> Cc: Sricharan R <r.sricharan@ti.com>
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  arch/arm/boot/dts/dra7.dtsi | 11 +++++++++++

Documentation is missing :-) As this is an experimental patch you can probably 
document the bindings later.

>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
> index ce9a0f0..3237972 100644
> --- a/arch/arm/boot/dts/dra7.dtsi
> +++ b/arch/arm/boot/dts/dra7.dtsi
> @@ -484,6 +484,17 @@
>  			dmas = <&sdma 70>, <&sdma 71>;
>  			dma-names = "tx0", "rx0";
>  		};
> +
> +		vpe {
> +			compatible = "ti,vpe";
> +			ti,hwmods = "vpe";
> +			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
> +			reg-names = "vpe", "vpdma";
> +			interrupts = <0 159 0x4>;
> +			#address-cells = <1>;
> +			#size-cells = <0>;

Are #address-cells and #size-cells really needed ?

> +		};
> +
>  	};
> 
>  	clocks {
-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-08 21:35       ` Laurent Pinchart
@ 2013-08-14 10:19           ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-14 10:19 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Tomi Valkeinen, linux-media, linux-omap, dagriego, dale, pawel,
	m.szyprowski, hverkuil

On Friday 09 August 2013 03:05 AM, Laurent Pinchart wrote:
> Hi Archit,
>
> On Monday 05 August 2013 16:56:46 Archit Taneja wrote:
>> On Monday 05 August 2013 01:43 PM, Tomi Valkeinen wrote:
>>> On 02/08/13 17:03, Archit Taneja wrote:
>>>> +struct vpdma_data_format vpdma_yuv_fmts[] = {
>>>> +	[VPDMA_DATA_FMT_Y444] = {
>>>> +		.data_type	= DATA_TYPE_Y444,
>>>> +		.depth		= 8,
>>>> +	},
>>>
>>> This, and all the other tables, should probably be consts?
>>
>> That's true, I'll fix those.
>>
>>>> +static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
>>>> +{
>>>> +	u32 val = *valp;
>>>> +
>>>> +	val &= ~(mask << shift);
>>>> +	val |= (field & mask) << shift;
>>>> +	*valp = val;
>>>> +}
>>>
>>> I think "insert" normally means, well, inserting a thing in between
>>> something. What you do here is overwriting.
>>>
>>> Why not just call it "write_field"?
>>
>> sure, will change it.
>>
>>>> + * Allocate a DMA buffer
>>>> + */
>>>> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
>>>> +{
>>>> +	buf->size = size;
>>>> +	buf->mapped = 0;
>>>
>>> Maybe true/false is clearer here that 0/1.
>>
>> okay.
>>
>>>> +/*
>>>> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for
>>>> completion + */
>>>> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list
>>>> *list) +{
>>>> +	/* we always use the first list */
>>>> +	int list_num = 0;
>>>> +	int list_size;
>>>> +
>>>> +	if (vpdma_list_busy(vpdma, list_num))
>>>> +		return -EBUSY;
>>>> +
>>>> +	/* 16-byte granularity */
>>>> +	list_size = (list->next - list->buf.addr) >> 4;
>>>> +
>>>> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
>>>> +	wmb();
>>>
>>> What is the wmb() for?
>>
>> VPDMA_LIST_ADDR needs to be written before VPDMA_LIST_ATTR, otherwise
>> VPDMA doesn't work. wmb() ensures the ordering.
>
> write_reg() calls iowrite32(), which already includes an __iowmb().

I wasn't aware of that. I'll remove the wmb() call. Thanks for sharing 
this info.

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-08 22:04   ` Laurent Pinchart
@ 2013-08-14 10:57       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-14 10:57 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen

On Friday 09 August 2013 03:34 AM, Laurent Pinchart wrote:
> Hi Archit,
>
> Thank you for the patch.
>
> On Friday 02 August 2013 19:33:38 Archit Taneja wrote:
>> The primary function of VPDMA is to move data between external memory and
>> internal processing modules(in our case, VPE) that source or sink data.
>> VPDMA is capable of buffering this data and then delivering the data as
>> demanded to the modules as programmed. The modules that source or sink data
>> are referred to as clients or ports. A channel is setup inside the VPDMA to
>> connect a specific memory buffer to a specific client. The VPDMA
>> centralizes the DMA control functions and buffering required to allow all
>> the clients to minimize the effect of long latency times.
>>
>> Add the following to the VPDMA helper:
>>
>> - A data struct which describe VPDMA channels. For now, these channels are
>> the ones used only by VPE, the list of channels will increase when
>> VIP(Video Input Port) also uses the VPDMA library. This channel information
>> will be used to populate fields required by data descriptors.
>>
>> - Data structs which describe the different data types supported by VPDMA.
>> This data type information will be used to populate fields required by data
>> descriptors and used by the VPE driver to map a V4L2 format to the
>> corresponding VPDMA data type.
>>
>> - Provide VPDMA register offset definitions, functions to read, write and
>> modify VPDMA registers.
>>
>> - Functions to create and submit a VPDMA list. A list is a group of
>> descriptors that makes up a set of DMA transfers that need to be completed.
>> Each descriptor will either perform a DMA transaction to fetch input
>> buffers and write to output buffers(data descriptors), or configure the
>> MMRs of sub blocks of VPE(configuration descriptors), or provide control
>> information to VPDMA (control descriptors).
>>
>> - Functions to allocate, map and unmap buffers needed for the descriptor
>> list, payloads containing MMR values and motion vector buffers. These use
>> the DMA mapping APIs to ensure exclusive access to VPDMA.
>>
>> - Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on
>> the VPE interrupt line when a descriptor list is parsed completely and the
>> DMA transactions are completed. This requires masking the events in VPDMA
>> registers and configuring some top level VPE interrupt registers.
>>
>> - Enable some VPDMA specific parameters: frame start event(when to start DMA
>> for a client) and line mode(whether each line fetched should be mirrored or
>> not).
>>
>> - Function to load firmware required by VPDMA. VPDMA requires a firmware for
>> it's internal list manager. We add the required request_firmware apis to
>> fetch this firmware from user space.
>>
>> - Function to dump VPDMA registers.
>>
>> - A function to initialize VPDMA, this will be called by the VPE driver with
>> it's platform device pointer, this function will take care of loading VPDMA
>> firmware and returning a handle back to the VPE driver. The VIP driver will
>> also call the same init function to initialize it's own VPDMA instance.
>>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   drivers/media/platform/ti-vpe/vpdma.c      | 589 ++++++++++++++++++++++++++
>>   drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
>>   drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
>>   3 files changed, 862 insertions(+)
>>   create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
>>   create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
>>   create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
>>
>> diff --git a/drivers/media/platform/ti-vpe/vpdma.c
>> b/drivers/media/platform/ti-vpe/vpdma.c new file mode 100644
>> index 0000000..b15b3dd
>> --- /dev/null
>> +++ b/drivers/media/platform/ti-vpe/vpdma.c
>> @@ -0,0 +1,589 @@
>
> [snip]
>
>> +static int get_field(u32 value, u32 mask, int shift)
>> +{
>> +	return (value & (mask << shift)) >> shift;
>> +}
>> +
>> +static int get_field_reg(struct vpdma_data *vpdma, int offset,
>> +		u32 mask, int shift)
>
> I would call this read_field_reg().
>
>> +{
>> +	return get_field(read_reg(vpdma, offset), mask, shift);
>> +}
>> +
>> +static void insert_field(u32 *valp, u32 field, u32 mask, int shift)
>> +{
>> +	u32 val = *valp;
>> +
>> +	val &= ~(mask << shift);
>> +	val |= (field & mask) << shift;
>> +	*valp = val;
>> +}
>
> get_field() and insert_field() are used in a single location, you can manually
> inline them.

Thanks, I'll make these modifications.

>
>> +static void insert_field_reg(struct vpdma_data *vpdma, int offset, u32
>> field,
>> +		u32 mask, int shift)
>> +{
>> +	u32 val = read_reg(vpdma, offset);
>> +
>> +	insert_field(&val, field, mask, shift);
>> +
>> +	write_reg(vpdma, offset, val);
>> +}
>
> [snip]
>
>> +/*
>> + * Allocate a DMA buffer
>> + */
>> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
>> +{
>> +	buf->size = size;
>> +	buf->mapped = 0;
>> +	buf->addr = kzalloc(size, GFP_KERNEL);
>
> You should use the dma allocation API (depending on your needs,
> dma_alloc_coherent for instance) to allocate DMA-able buffers.

I'm not sure about this, dma_map_single() api works fine on kzalloc'd 
buffers. The above function is used to allocate small contiguous 
buffers(never more than 1024 bytes) needed for descriptors for the DMA 
IP. I thought of using DMA pool, but that creates small buffers of the 
same size. So I finally went with kzalloc.
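
For reference, the limitation is that a dma_pool fixes its block size at
creation time, so every allocation comes back the same size (rough sketch,
names purely illustrative):

	struct dma_pool *pool;
	dma_addr_t dma_addr;
	void *desc;

	/* every allocation from this pool returns exactly 1024 bytes */
	pool = dma_pool_create("vpdma-desc", &pdev->dev, 1024, 16, 0);
	desc = dma_pool_alloc(pool, GFP_KERNEL, &dma_addr);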

>
>> +	if (!buf->addr)
>> +		return -ENOMEM;
>> +
>> +	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
>> +
>> +	return 0;
>> +}
>> +
>> +void vpdma_buf_free(struct vpdma_buf *buf)
>> +{
>> +	WARN_ON(buf->mapped != 0);
>> +	kfree(buf->addr);
>> +	buf->addr = NULL;
>> +	buf->size = 0;
>> +}
>> +
>> +/*
>> + * map a DMA buffer, enabling DMA access
>> + */
>> +void vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
>> +{
>> +	struct device *dev = &vpdma->pdev->dev;
>> +
>> +	WARN_ON(buf->mapped != 0);
>> +	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
>> +				DMA_TO_DEVICE);
>> +	buf->mapped = 1;
>> +	BUG_ON(dma_mapping_error(dev, buf->dma_addr));
>
> BUG_ON() is too harsh, you should return a proper error instead.

Right, I'll fix this.

>
>> +}
>> +
>> +/*
>> + * unmap a DMA buffer, disabling DMA access and
>> + * allowing the main processor to acces the data
>> + */
>> +void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf)
>> +{
>> +	struct device *dev = &vpdma->pdev->dev;
>> +
>> +	if (buf->mapped)
>> +		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
>> +
>> +	buf->mapped = 0;
>> +}
>> +
>> +/*
>> + * create a descriptor list, the user of this list will append
>> configuration,
>> + * contorl and data descriptors to this list, this list will be submitted
>
> s/contorl/control/
>
>> to
>> + * VPDMA. VPDMA's list parser will go through each descriptor and perform
>> + * the required DMA operations
>> + */
>> +int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int
>> type)
>> +{
>> +	int r;
>> +
>> +	r = vpdma_buf_alloc(&list->buf, size);
>> +	if (r)
>> +		return r;
>> +
>> +	list->next = list->buf.addr;
>> +
>> +	list->type = type;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * once a descriptor list is parsed by VPDMA, we reset the list by emptying
>> it,
>> + * to allow new descriptors to be added to the list.
>> + */
>> +void vpdma_reset_desc_list(struct vpdma_desc_list *list)
>> +{
>> +	list->next = list->buf.addr;
>> +}
>> +
>> +/*
>> + * free the buffer allocated fot the VPDMA descriptor list, this should be
>> + * called when the user doesn't want to use VPDMA any more.
>> + */
>> +void vpdma_free_desc_list(struct vpdma_desc_list *list)
>> +{
>> +	vpdma_buf_free(&list->buf);
>> +
>> +	list->next = NULL;
>> +}
>> +
>> +static int vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
>
> Should the function return a bool instead of an int ?

Yes, a bool would be better here.
>
>> +{
>> +	u32 sync_reg = read_reg(vpdma, VPDMA_LIST_STAT_SYNC);
>> +
>> +	return (sync_reg >> (list_num + 16)) & 0x01;
>
> You could shorten that as
>
> 	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);

yes, that does look better, I'll modify it.

>
>> +}
>> +
>> +/*
>> + * submit a list of DMA descriptors to the VPE VPDMA, do not wait for
>> completion
>> + */
>> +int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list
>> *list)
>> +{
>> +	/* we always use the first list */
>> +	int list_num = 0;
>> +	int list_size;
>> +
>> +	if (vpdma_list_busy(vpdma, list_num))
>> +		return -EBUSY;
>> +
>> +	/* 16-byte granularity */
>> +	list_size = (list->next - list->buf.addr) >> 4;
>> +
>> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
>> +	wmb();
>> +	write_reg(vpdma, VPDMA_LIST_ATTR,
>> +			(list_num << VPDMA_LIST_NUM_SHFT) |
>> +			(list->type << VPDMA_LIST_TYPE_SHFT) |
>> +			list_size);
>> +
>> +	return 0;
>> +}
>> +
>> +/* set or clear the mask for list complete interrupt */
>> +void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
>> +		bool enable)
>> +{
>> +	u32 val;
>> +
>> +	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
>> +	if (enable)
>> +		val |= (1 << (list_num * 2));
>> +	else
>> +		val &= ~(1 << (list_num * 2));
>> +	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
>> +}
>> +
>> +/* clear previosuly occured list intterupts in the LIST_STAT register */
>> +void vpdma_clear_list_stat(struct vpdma_data *vpdma)
>> +{
>> +	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
>> +		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
>> +}
>> +
>> +/*
>> + * configures the output mode of the line buffer for the given client, the
>> + * line buffer content can either be mirrored(each line repeated twice) or
>> + * passed to the client as is
>> + */
>> +void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
>> +		enum vpdma_channel chan)
>> +{
>> +	int client_cstat = chan_info[chan].cstat_offset;
>> +
>> +	insert_field_reg(vpdma, client_cstat, line_mode,
>> +		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
>> +}
>> +
>> +/*
>> + * configures the event which should trigger VPDMA transfer for the given
>> + * client
>> + */
>> +void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
>> +		enum vpdma_frame_start_event fs_event,
>> +		enum vpdma_channel chan)
>> +{
>> +	int client_cstat = chan_info[chan].cstat_offset;
>> +
>> +	insert_field_reg(vpdma, client_cstat, fs_event,
>> +		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
>> +}
>> +
>> +static void vpdma_firmware_cb(const struct firmware *f, void *context)
>> +{
>> +	struct vpdma_data *vpdma = context;
>> +	struct vpdma_buf fw_dma_buf;
>> +	int i, r;
>> +
>> +	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
>> +
>> +	if (!f || !f->data) {
>> +		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
>> +		return;
>> +	}
>> +
>> +	/* already initialized */
>> +	if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
>> +			VPDMA_LIST_RDY_SHFT)) {
>> +		vpdma->ready = true;
>> +		return;
>> +	}
>> +
>> +	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
>> +	if (r) {
>> +		dev_err(&vpdma->pdev->dev,
>> +			"failed to allocate dma buffer for firmware\n");
>> +		goto rel_fw;
>> +	}
>> +
>> +	memcpy(fw_dma_buf.addr, f->data, f->size);
>> +
>> +	vpdma_buf_map(vpdma, &fw_dma_buf);
>> +
>> +	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
>> +
>> +	for (i = 0; i < 100; i++) {		/* max 1 second */
>> +		msleep_interruptible(10);
>> +
>> +		if (get_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
>> +				VPDMA_LIST_RDY_SHFT))
>> +			break;
>> +	}
>> +
>> +	if (i == 100) {
>> +		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
>> +		goto free_buf;
>> +	}
>> +
>> +	vpdma->ready = true;
>> +
>> +free_buf:
>> +	vpdma_buf_unmap(vpdma, &fw_dma_buf);
>> +
>> +	vpdma_buf_free(&fw_dma_buf);
>> +rel_fw:
>> +	release_firmware(f);
>> +}
>> +
>> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
>> +{
>> +	int r;
>> +	struct device *dev = &vpdma->pdev->dev;
>> +
>> +	r = request_firmware_nowait(THIS_MODULE, 1,
>> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
>> +		vpdma_firmware_cb);
>
> Is there a reason not to use the synchronous interface ? That would simplify
> both this code and the callers, as they won't have to check whether the
> firmware has been correctly loaded.

I'm not clear what you mean by that; the firmware would be stored in the 
filesystem. If the driver is built-in, then the synchronous interface 
wouldn't work unless the firmware is appended to the kernel image. Am I 
missing something here? I'm not very aware of the firmware api.


>> +	if (r) {
>> +		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
>> +		return r;
>> +	} else {
>> +		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +int vpdma_init(struct platform_device *pdev, struct vpdma_data **pvpdma)
>
> As the function allocates the vpdma instance, I would call it vpdma_create()
> and make it return a struct vpdma_data *. You can then return error codes using
> ERR_PTR().

Yes, that makes a lot more sense. I'll use your approach.

>
>> +{
>> +	struct resource *res;
>> +	struct vpdma_data *vpdma;
>> +	int r;
>> +
>> +	dev_dbg(&pdev->dev, "vpdma_init\n");
>> +
>> +	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
>> +	if (!vpdma) {
>> +		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	vpdma->pdev = pdev;
>> +
>> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
>> +	if (res == NULL) {
>> +		dev_err(&pdev->dev, "missing platform resources data\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
>
> You can use devm_ioremap_resource(). The function checks the res pointer and
> prints error messages, so you can remove the res == NULL check above and the
> dev_err() below.

Ah nice, I'll use that one.

Thanks a lot for the comments!

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v2 0/6] v4l: VPE mem to mem driver
  2013-08-02 14:03 ` Archit Taneja
@ 2013-08-20 11:00   ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

VPE(Video Processing Engine) is an IP found on DRA7xx. This series adds VPE as a
mem to mem v4l2 driver, and VPDMA as a helper library.

The previous revision of the series described VPE in detail, you can have a look
at it here:

http://www.spinics.net/lists/linux-media/msg66518.html

There were a lot of useful suggestions made by Hans, Tomi and Laurent. This
series incorporates those changes. There are a few comments which I haven't
been able to address; I've pointed those out below.

Changes in v2:
- Constify the structs that can be constified.
- Remove the use of wmb() from vpdma_submit_descs().
- Remove an unnecessary layer of helper macros used for creating or reading
  VPDMA descriptors.
- Create a CONFIG which enables/disables VPE debug prints.
- Remove CAPTURE/OUTPUT_MPLANE from device_caps.
- Fix the pix->field setting in TRY_FMT ioctl.
- Fix a bug in the v4l2 control handler registration and release.
- Move video_register_device() at the end of driver probe.
- Improve some of the function names, remove unnecessary BUG_ONs etc, use
  ERR_PTR() to return error codes correctly.

Things still open in v2:
- Tomi had a comment about the usage of msleep_interruptible in the first patch.
  I am not clear whether this is actually a problem and what's the right approach.
- Laurent suggested using the DMA allocation API for the VPDMA library; we
  currently use kzalloc to allocate, and the dma_map/unmap_single API to map
  buffers to VPDMA/CPU. The DMA pool API won't be a good replacement here.
- There was a suggestion to use the synchronous firmware interface. If I
  understand right, a synchronous interface forces us to have the firmware
  appended to the kernel, and that's not something we want.


Archit Taneja (6):
  v4l: ti-vpe: Create a vpdma helper library
  v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  v4l: ti-vpe: Add VPE mem to mem driver
  v4l: ti-vpe: Add de-interlacer support in VPE
  arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  experimental: arm: dts: dra7xx: Add a DT node for VPE

 arch/arm/boot/dts/dra7.dtsi                |   11 +
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c  |   42 +
 drivers/media/platform/Kconfig             |   16 +
 drivers/media/platform/Makefile            |    2 +
 drivers/media/platform/ti-vpe/Makefile     |    5 +
 drivers/media/platform/ti-vpe/vpdma.c      |  846 ++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  202 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h |  640 +++++++++
 drivers/media/platform/ti-vpe/vpe.c        | 2042 ++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h   |  496 +++++++
 10 files changed, 4302 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

-- 
1.8.1.2


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v2 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-20 11:00     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

The primary function of VPDMA is to move data between external memory and
internal processing modules(in our case, VPE) that source or sink data. VPDMA is
capable of buffering this data and then delivering the data as demanded to the
modules as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is set up inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and buffering required to allow all the clients to minimize the
effect of long latency times.

Add the following to the VPDMA helper:

- A data struct which describe VPDMA channels. For now, these channels are the
  ones used only by VPE, the list of channels will increase when VIP(Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA. This
  data type information will be used to populate fields required by data
  descriptors and used by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- Provide VPDMA register offset definitions, functions to read, write and modify
  VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of descriptors
  that makes up a set of DMA transfers that need to be completed. Each
  descriptor will either perform a DMA transaction to fetch input buffers and
  write to output buffers(data descriptors), or configure the MMRs of sub blocks
  of VPE(configuration descriptors), or provide control information to VPDMA
  (control descriptors).

- Functions to allocate, map and unmap buffers needed for the descriptor list,
  payloads containing MMR values and motion vector buffers. These use the
  DMA mapping APIs to ensure exclusive access to VPDMA.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Enable some VPDMA specific parameters: frame start event(when to start DMA for
  a client) and line mode(whether each line fetched should be mirrored or not).

- Function to load firmware required by VPDMA. VPDMA requires firmware for
  its internal list manager. We add the required request_firmware APIs to fetch
  this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize and create a VPDMA instance. This will be called by
  the VPE driver with its platform device pointer; it will take care of loading
  the VPDMA firmware and returning a vpdma_data instance back to the VPE driver.
  The VIP driver will also call the same init function to initialize its own
  VPDMA instance.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 851 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..84b8ee52
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,578 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+const struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+const struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+const struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static const struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int read_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return (read_reg(vpdma, offset) & (mask << shift)) >> shift;
+}
+
+static void write_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = false;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
+
+	return 0;
+}
+
+void vpdma_buf_free(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map a DMA buffer, enabling DMA access
+ */
+int vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, buf->dma_addr)) {
+		dev_err(dev, "failed to map buffer\n");
+		return -EINVAL;
+	}
+
+	buf->mapped = true;
+
+	return 0;
+}
+
+/*
+ * unmap a DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = false;
+}
+
+/*
+ * create a descriptor list; the user of this list will append configuration,
+ * control and data descriptors to it, and the list will be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_buf_alloc(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list; this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_buf_free(&list->buf);
+
+	list->next = NULL;
+}
+
+static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client; the
+ * line buffer content can either be mirrored (each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_buf_map(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_buf_unmap(vpdma, &fw_dma_buf);
+
+	vpdma_buf_free(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+struct vpdma_data *vpdma_create(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_create\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return ERR_PTR(r);
+	}
+
+	return vpdma;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..2e571a8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_MAX_DESC_SIZE		32	/* 8 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern const struct vpdma_data_format vpdma_yuv_fmts[];
+extern const struct vpdma_data_format vpdma_rgb_fmts[];
+extern const struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
+void vpdma_buf_free(struct vpdma_buf *buf);
+int vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+struct vpdma_data *vpdma_create(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + grp * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + i * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers (only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v2 1/6] v4l: ti-vpe: Create a vpdma helper library
@ 2013-08-20 11:00     ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

The primary function of VPDMA is to move data between external memory and
internal processing modules (in our case, VPE) that source or sink data. VPDMA
is capable of buffering this data and then delivering it to the modules on
demand, as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is set up inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and the buffering required to allow all the clients to minimize the
effect of long latency times.

Add the following to the VPDMA helper:

- A data struct which describes VPDMA channels. For now, these channels are
  only the ones used by VPE; the list of channels will grow when VIP (Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA. This
  data type information will be used to populate fields required by data
  descriptors and used by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- Provide VPDMA register offset definitions, functions to read, write and modify
  VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of descriptors
  that makes up a set of DMA transfers that need to be completed. Each
  descriptor will either perform a DMA transaction to fetch input buffers and
  write to output buffers (data descriptors), or configure the MMRs of sub
  blocks of VPE (configuration descriptors), or provide control information to
  VPDMA (control descriptors).

- Functions to allocate, map and unmap buffers needed for the descriptor list,
  payloads containing MMR values and motion vector buffers. These use the
  DMA mapping APIs to ensure exclusive access to VPDMA.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Enable some VPDMA-specific parameters: frame start event (when to start DMA
  for a client) and line mode (whether each line fetched should be mirrored or
  not).

- Function to load the firmware required by VPDMA. VPDMA requires firmware for
  its internal list manager. We add the required request_firmware APIs to fetch
  this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize and create a VPDMA instance. This will be called by
  the VPE driver with its platform device pointer; this function will take care
  of loading the VPDMA firmware and returning a vpdma_data instance back to the
  VPE driver. The VIP driver will also call the same init function to
  initialize its own VPDMA instance. A sketch of how a client driver might use
  this API follows below.
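
For orientation, here is a minimal sketch of how a client driver might drive
this helper API; the probe context, the list size and the error handling are
illustrative assumptions, not part of this patch:

static struct vpdma_data *my_client_init(struct platform_device *pdev,
		struct vpdma_desc_list *list)
{
	struct vpdma_data *vpdma;
	int ret;

	/* map the VPDMA registers and kick off the async firmware load */
	vpdma = vpdma_create(pdev);
	if (IS_ERR(vpdma))
		return vpdma;

	/* room for a handful of descriptors; the size is an assumption */
	ret = vpdma_create_desc_list(list, 1024, VPDMA_LIST_TYPE_NORMAL);
	if (ret)
		return ERR_PTR(ret);

	/* unmask the 'list 0 complete' interrupt for list 0 */
	vpdma_enable_list_complete_irq(vpdma, 0, true);

	return vpdma;
}

Descriptors would then be appended per job, the list buffer mapped with
vpdma_buf_map(), and the list handed to the hardware with vpdma_submit_descs()
once vpdma->ready indicates that the asynchronous firmware load has completed.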

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 851 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..84b8ee52
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,578 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+const struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+const struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+const struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static const struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int read_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return (read_reg(vpdma, offset) & (mask << shift)) >> shift;
+}
+
+static void write_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = false;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
+
+	return 0;
+}
+
+void vpdma_buf_free(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map a DMA buffer, enabling DMA access
+ */
+int vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, buf->dma_addr)) {
+		dev_err(dev, "failed to map buffer\n");
+		return -EINVAL;
+	}
+
+	buf->mapped = true;
+
+	return 0;
+}
+
+/*
+ * unmap a DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = false;
+}
+
+/*
+ * create a descriptor list; the user of this list will append configuration,
+ * control and data descriptors to it, and the list will be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_buf_alloc(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list; this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_buf_free(&list->buf);
+
+	list->next = NULL;
+}
+
+static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client; the
+ * line buffer content can either be mirrored (each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_buf_alloc(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_buf_map(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_buf_unmap(vpdma, &fw_dma_buf);
+
+	vpdma_buf_free(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+struct vpdma_data *vpdma_create(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_create\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return ERR_PTR(r);
+	}
+
+	return vpdma;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..2e571a8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_MAX_DESC_SIZE		32	/* 8 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern const struct vpdma_data_format vpdma_yuv_fmts[];
+extern const struct vpdma_data_format vpdma_rgb_fmts[];
+extern const struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
+void vpdma_buf_free(struct vpdma_buf *buf);
+int vpdma_buf_map(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_buf_unmap(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+struct vpdma_data *vpdma_create(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + grp * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + i * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers (only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v2 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-20 11:00     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an existing
list, and append the configuration/data/control descriptor header to the list.
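
As a sketch, a hypothetical per-job flow built from these helpers could look
as follows; the 1920x1080 geometry, the Y422 luma format and the buffer
handling are assumptions for illustration (the list and the MMR payload buffer
are presumed already allocated and mapped):

static int my_run_job(struct vpdma_data *vpdma, struct vpdma_desc_list *list,
		struct vpdma_buf *mmr_adb, dma_addr_t src, dma_addr_t dst)
{
	struct v4l2_rect rect = { 0, 0, 1920, 1080 };

	vpdma_reset_desc_list(list);

	/* program the VPE MMRs before any data is moved */
	vpdma_add_cfd_adb(list, CFD_MMR_CLIENT, mmr_adb);

	/* fetch one luma field, and write the processed luma back out */
	vpdma_add_in_dtd(list, 1920, 1080, &rect,
			&vpdma_yuv_fmts[VPDMA_DATA_FMT_Y422], src,
			VPE_CHAN_LUMA1_IN, 0, 0);
	vpdma_add_out_dtd(list, &rect,
			&vpdma_yuv_fmts[VPDMA_DATA_FMT_Y422], dst,
			VPE_CHAN_LUMA_OUT, 0);

	/* stall the list until the outbound DMA has completed */
	vpdma_add_sync_on_channel_ctd(list, VPE_CHAN_LUMA_OUT);

	return vpdma_submit_descs(vpdma, list);
}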

In the case of configuration descriptors, the creation of a payload block may
be required (the payloads can hold VPE MMR values or scaler coefficients). The
allocation of the payload buffer and its content are left to the VPE driver.
However, the VPDMA library provides helper macros to create the payload in the
correct format.
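
For instance (a sketch; the struct layout and the 0x5700 register offset are
hypothetical, not taken from this patch), a client could declare its MMR
payload and stamp the ADB header like this:

/* payload layout: ADB header plus a shadow of one sub-block's MMRs */
struct my_mmr_adb {
	struct vpdma_adb_hdr	csc_hdr;
	u32			csc_regs[4];	/* kept a multiple of 16 bytes */
};

static void my_init_adb(struct vpdma_buf *buf)
{
	/* point the header at the (made-up) CSC MMR block at offset 0x5700 */
	VPDMA_SET_MMR_ADB_HDR(*buf, my_mmr_adb, csc_hdr, csc_regs, 0x5700);
}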

Add debug functions to dump the descriptors so that it's easy to see the
values of the different fields in each descriptor.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 521 +++++++++++++++++++++++++++++
 3 files changed, 837 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index 84b8ee52..18b7c24 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -416,6 +417,273 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd->dest_addr_offset);
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word1: num_data_wrds = %d\n", cfd->block_len);
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd->payload_addr);
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header; this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->dest_addr_offset = dest_offset;
+	cfd->block_len = len;
+	cfd->payload_addr = (u32) blk->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_BLOCK,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format; this is used to configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	BUG_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->w0 = 0;
+	cfd->w1 = 0;
+	cfd->payload_addr = (u32) adb->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_ADB,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+};
+
+/*
+ * the control descriptor format changes based on the type of control
+ * descriptor; we only use 'sync on channel' control descriptors for now, so
+ * assume that type
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list; this descriptor stalls the VPDMA list until DMA is completed
+ * on the specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd->w0 = 0;
+	ctd->w1 = 0;
+	ctd->w2 = 0;
+	ctd->type_source_ctl = ctd_type_source_ctl(chan_info[chan].num,
+				CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd->start_addr);
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			"drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+	dtd->w1 = 0;
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_OUT, channel, priority, next_chan);
+	dtd->desc_write_addr = dtd_desc_write_addr(0, 0, 0, 0);
+	dtd->max_width_height = dtd_max_width_height(MAX_OUT_WIDTH_1920,
+					MAX_OUT_HEIGHT_1080);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+
+	dtd->xfer_length_height = dtd_xfer_length_height(c_rect->width, height);
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_IN, channel, priority, next_chan);
+	dtd->frame_width_height = dtd_frame_width_height(frame_width,
+					frame_height);
+	dtd->start_h_v = dtd_start_h_v(c_rect->left, c_rect->top);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 2e571a8..702ba45 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -123,6 +123,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		7
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors MMRs as client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
 void vpdma_buf_free(struct vpdma_buf *buf);
@@ -135,6 +168,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..da3976b 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,525 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the overall
+ * size of the payload also needs to be a multiple of 16 bytes. The user of
+ * VPDMA must ensure that the sub block lengths are aligned.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+/* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+/* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+static inline u32 dtd_type_ctl_stride(int type, bool notify, int field,
+			bool one_d, bool even_line_skip, bool odd_line_skip,
+			int line_stride)
+{
+	return (type << DTD_DATA_TYPE_SHFT) | (notify << DTD_NOTIFY_SHFT) |
+		(field << DTD_FIELD_SHFT) | (one_d << DTD_1D_SHFT) |
+		(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |
+		(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |
+		line_stride;
+}
+
+static inline u32 dtd_xfer_length_height(int line_length, int xfer_height)
+{
+	return (line_length << DTD_LINE_LENGTH_SHFT) | xfer_height;
+}
+
+static inline u32 dtd_pkt_ctl(bool mode, bool dir, int chan, int pri,
+			int next_chan)
+{
+	return (DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) | (mode << DTD_MODE_SHFT) |
+		(dir << DTD_DIR_SHFT) | (chan << DTD_CHAN_SHFT) |
+		(pri << DTD_PRI_SHFT) | next_chan;
+}
+
+static inline u32 dtd_frame_width_height(int width, int height)
+{
+	return (width << DTD_FRAME_WIDTH_SHFT) | height;
+}
+
+static inline u32 dtd_desc_write_addr(unsigned int addr, bool write_desc,
+			bool drop_data, bool use_desc)
+{
+	return (addr << DTD_DESC_START_SHIFT) |
+		(write_desc << DTD_WRITE_DESC_SHIFT) |
+		(drop_data << DTD_DROP_DATA_SHIFT) |
+		use_desc;
+}
+
+static inline u32 dtd_start_h_v(int h_start, int v_start)
+{
+	return (h_start << DTD_H_START_SHFT) | v_start;
+}
+
+static inline u32 dtd_max_width_height(int max_width, int max_height)
+{
+	return (max_width << DTD_MAX_WIDTH_SHFT) | max_height;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD__BLOCK_LEN_MASK	0xffff
+#define CFD__BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+static inline u32 cfd_pkt_payload_len(bool direct, int cls, int dest,
+		int payload_len)
+{
+	return (CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |
+		(direct << CFD_DIRECT_SHFT) |
+		(cls << CFD_CLASS_SHFT) |
+		(dest << CFD_DEST_SHFT) |
+		payload_len;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline int cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_CHANNEL	3
+#define CTD_TYPE_CHNG_CLIENT_IRQ	4
+#define CTD_TYPE_SEND_IRQ		5
+#define CTD_TYPE_RELOAD_LIST		6
+#define CTD_TYPE_ABORT_CHANNEL		7
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+static inline u32 ctd_pixel_line_count(int pixel_count, int line_count)
+{
+	return (pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count;
+}
+
+static inline u32 ctd_set_fid_ctl(int fid0, int fid1, int fid2)
+{
+	return (fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0;
+}
+
+static inline u32 ctd_type_source_ctl(int source, int control)
+{
+	return (CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |
+		(source << CTD_SOURCE_SHFT) | control;
+}
+
+static inline int ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v2 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
@ 2013-08-20 11:00     ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an existing
list and append the configuration/data/control descriptor to it.

In the case of configuration descriptors, the creation of a payload block may be
required (the payloads can hold VPE MMR values or scaler coefficients). The
allocation of the payload buffer and its contents is left to the VPE driver.
However, the VPDMA library provides helper macros to create the payload in the
correct format.
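
As an illustration, a driver can describe its MMR payload as a struct where
each register group is preceded by a vpdma_adb_hdr, and then point the headers
at the hardware register offsets with the helper macro. A minimal sketch (the
struct and the FOO_CSC_REG0 offset here are made up for illustration):

	struct foo_mmr_adb {
		struct vpdma_adb_hdr	csc_hdr;
		u32			csc_regs[6];
		u32			csc_pad[2];	/* align sub block to 16 bytes */
	};

	/* header describes 6 words of CSC registers at offset FOO_CSC_REG0 */
	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, foo_mmr_adb, csc_hdr, csc_regs,
		FOO_CSC_REG0);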

Add debug functions to dump the descriptors so that it's easy to see the
values of the different fields in the descriptors.
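
A rough usage sketch, assuming the list helpers added earlier in the series
(e.g. vpdma_create_desc_list() and VPDMA_LIST_TYPE_NORMAL) and eliding error
handling:

	struct vpdma_desc_list list;

	/* one time setup: allocate a buffer large enough for our descriptors */
	vpdma_create_desc_list(&list, SZ_2K, VPDMA_LIST_TYPE_NORMAL);

	/* per job: append descriptors, then hand the list over to VPDMA */
	vpdma_add_cfd_adb(&list, CFD_MMR_CLIENT, &mmr_adb_buf);
	vpdma_add_in_dtd(&list, width, height, &c_rect, fmt, src_dma_addr,
			VPE_CHAN_LUMA1_IN, 0, 0);
	vpdma_add_out_dtd(&list, &c_rect, fmt, dst_dma_addr,
			VPE_CHAN_LUMA_OUT, 0);
	vpdma_add_sync_on_channel_ctd(&list, VPE_CHAN_LUMA_OUT);
	vpdma_submit_descs(vpdma, &list);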

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 521 +++++++++++++++++++++++++++++
 3 files changed, 837 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index 84b8ee52..18b7c24 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -416,6 +417,273 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK) {
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd->dest_addr_offset);
+
+		pr_debug("word1: num_data_wrds = %d\n", cfd->block_len);
+	}
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd->payload_addr);
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header, this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->dest_addr_offset = dest_offset;
+	cfd->block_len = len;
+	cfd->payload_addr = (u32) blk->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_BLOCK,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format, this is used to configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->w0 = 0;
+	cfd->w1 = 0;
+	cfd->payload_addr = (u32) adb->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_ADB,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * the control descriptor format changes based on the type of control
+ * descriptor; we only use 'sync on channel' control descriptors for now, so
+ * assume that type
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list, this descriptor stalls the VPDMA list until DMA is completed on the
+ * specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd->w0 = 0;
+	ctd->w1 = 0;
+	ctd->w2 = 0;
+	ctd->type_source_ctl = ctd_type_source_ctl(chan_info[chan].num,
+				CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd->start_addr);
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			"drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+	dtd->w1 = 0;
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_OUT, channel, priority, next_chan);
+	dtd->desc_write_addr = dtd_desc_write_addr(0, 0, 0, 0);
+	dtd->max_width_height = dtd_max_width_height(MAX_OUT_WIDTH_1920,
+					MAX_OUT_HEIGHT_1080);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+
+	dtd->xfer_length_height = dtd_xfer_length_height(c_rect->width, height);
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_IN, channel, priority, next_chan);
+	dtd->frame_width_height = dtd_frame_width_height(frame_width,
+					frame_height);
+	dtd->start_h_v = dtd_start_h_v(c_rect->left, c_rect->top);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 2e571a8..702ba45 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -123,6 +123,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		7
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors with MMRs as the client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size);
 void vpdma_buf_free(struct vpdma_buf *buf);
@@ -135,6 +168,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..da3976b 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,525 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the overall
+ * size of the payload also needs to be a multiple of 16 bytes. The VPDMA user
+ * must ensure that the sub block lengths are aligned accordingly.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+/* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+/* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+static inline u32 dtd_type_ctl_stride(int type, bool notify, int field,
+			bool one_d, bool even_line_skip, bool odd_line_skip,
+			int line_stride)
+{
+	return (type << DTD_DATA_TYPE_SHFT) | (notify << DTD_NOTIFY_SHFT) |
+		(field << DTD_FIELD_SHFT) | (one_d << DTD_1D_SHFT) |
+		(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |
+		(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |
+		line_stride;
+}
+
+static inline u32 dtd_xfer_length_height(int line_length, int xfer_height)
+{
+	return (line_length << DTD_LINE_LENGTH_SHFT) | xfer_height;
+}
+
+static inline u32 dtd_pkt_ctl(bool mode, bool dir, int chan, int pri,
+			int next_chan)
+{
+	return (DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) | (mode << DTD_MODE_SHFT) |
+		(dir << DTD_DIR_SHFT) | (chan << DTD_CHAN_SHFT) |
+		(pri << DTD_PRI_SHFT) | next_chan;
+}
+
+static inline u32 dtd_frame_width_height(int width, int height)
+{
+	return (width << DTD_FRAME_WIDTH_SHFT) | height;
+}
+
+static inline u32 dtd_desc_write_addr(unsigned int addr, bool write_desc,
+			bool drop_data, bool use_desc)
+{
+	return (addr << DTD_DESC_START_SHIFT) |
+		(write_desc << DTD_WRITE_DESC_SHIFT) |
+		(drop_data << DTD_DROP_DATA_SHIFT) |
+		use_desc;
+}
+
+static inline u32 dtd_start_h_v(int h_start, int v_start)
+{
+	return (h_start << DTD_H_START_SHFT) | v_start;
+}
+
+static inline u32 dtd_max_width_height(int max_width, int max_height)
+{
+	return (max_width << DTD_MAX_WIDTH_SHFT) | max_height;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD__BLOCK_LEN_MASK	0xffff
+#define CFD__BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+static inline u32 cfd_pkt_payload_len(bool direct, int cls, int dest,
+		int payload_len)
+{
+	return (CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |
+		(direct << CFD_DIRECT_SHFT) |
+		(cls << CFD_CLASS_SHFT) |
+		(dest << CFD_DEST_SHFT) |
+		payload_len;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline int cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_CHANNEL	3
+#define CTD_TYPE_CHNG_CLIENT_IRQ	4
+#define CTD_TYPE_SEND_IRQ		5
+#define CTD_TYPE_RELOAD_LIST		6
+#define CTD_TYPE_ABORT_CHANNEL		7
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+static inline u32 ctd_pixel_line_count(int pixel_count, int line_count)
+{
+	return (pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count;
+}
+
+static inline u32 ctd_set_fid_ctl(int fid0, int fid1, int fid2)
+{
+	return (fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0;
+}
+
+static inline u32 ctd_type_source_ctl(int source, int control)
+{
+	return (CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |
+		(source << CTD_SOURCE_SHFT) | control;
+}
+
+static inline int ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v2 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-20 11:00     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

VPE is a block which consists of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters like frame start and
line mode which need to be configured; these are set by direct register writes
via the VPDMA helper functions.
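
For example, with the helpers used by set_cfg_and_line_modes() below:

	/* double the chroma lines to the line buffer for YUV420 sources */
	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);

	/* start DMA on the input luma channel as soon as it becomes active */
	vpdma_set_frame_start_event(ctx->dev->vpdma,
		VPDMA_FSEVENT_CHANNEL_ACTIVE, VPE_CHAN_LUMA1_IN);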

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; as VPDMA parses these descriptors, it
uploads the MMR values and starts DMA of video buffers on the various input
and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to signal that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
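
Schematically, a single job looks roughly like this (a simplified sketch of
device_run() below; add_in_dtd() is assumed to be the input-side counterpart
of the add_out_dtd() helper, and error handling is elided):

	/* upload the shadow MMRs only when this context's values have changed */
	if (ctx->load_mmrs) {
		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT,
				&ctx->mmr_adb);
		ctx->load_mmrs = false;
	}

	/* data descriptors for the input and output buffer planes */
	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);

	/* don't signal completion until the last output DMA is done */
	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);

	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);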

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 5 files changed, 2259 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 08de865..94eede7 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -210,6 +210,22 @@ config VIDEO_SH_VEU
 	    Support for the Video Engine Unit (VEU) on SuperH and
 	    SH-Mobile SoCs.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index eee28dd..d4614e7 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..5e1d80e
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1740 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits (16 bytes) */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context needs up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 3 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(16 * VPDMA_MAX_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* configure the line mode by direct register write for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to TRM, we should set DEI in progressive bypass mode when
+	 * the input content is progressive, however, DEI is bypassed correctly
+	 * for both progressive and interlace content in interlace bypass mode.
+	 * It has been recommended not to use progressive bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
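+	/* schedule only once a full batch of source buffers is queued */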
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
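+	/* de-interlacing is bypassed for now, so we always fetch field 0 */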
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
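+	/* list 0 completion on INT0; UV downsampler error on INT1 */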
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/*
+ * device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/*
+	 * add the shadow MMR config descriptor only if this context's MMR
+	 * values changed, or another context's MMRs are loaded in the HW
+	 */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_buf_map(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
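+	/* read both INT0 status registers and ack any bits that are set */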
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev,
+			"Unexpected interrupt: INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_buf_unmap(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_buf_unmap(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
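+	/* pass timestamp/timecode from the source to the destination buffer */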
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
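+	/*
+	 * the luma line stride must be 16 byte aligned for VPDMA; for the
+	 * chroma plane of coplanar formats, bytesperline is just the width
+	 */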
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
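+/*
+ * custom control: the number of input buffers processed per mem2mem
+ * transaction; batching several buffers per job can reduce scheduling
+ * overhead
+ */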
+#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_TRANS_NUM_BUFS:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_TRANS_NUM_BUFS,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
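+	/* default to 1080p YUYV; the same format is copied to the dst queue */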
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->bytesperline[VPE_LUMA] = (s_q_data->width *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * For now, just report the creation of the first instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance is released.
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_buf_free(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_buf_free(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * For now, just report the release of the last instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance is released.
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
+	dev->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(dev->base)) {
+		ret = PTR_ERR(dev->base);
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..be41a1f
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTH_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
-- 
1.8.1.2



* [PATCH v2 3/6] v4l: ti-vpe: Add VPE mem to mem driver
@ 2013-08-20 11:00     ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

VPE is a block which consists of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it uses when
it gets access to the VPE HW via the mem2mem queue. It also allocates a VPDMA
descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters like frame start and
line mode which need to be configured; these are configured by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it is submitted to VPDMA. When VPDMA parses these descriptors, it
uploads the MMR registers and starts DMA of video buffers on the various input
and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated which we use to notify that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
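
As an illustration only (not part of this patch), a minimal userspace
sequence for a single YUYV to NV12 conversion could look roughly like the
sketch below, assuming the VPE node is /dev/video0 and omitting buffer
allocation (VIDIOC_REQBUFS/mmap), the usual headers and error handling:

	int fd = open("/dev/video0", O_RDWR);
	enum v4l2_buf_type out = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	enum v4l2_buf_type cap = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	struct v4l2_format fmt = { .type = out };

	fmt.fmt.pix_mp.width = 1920;
	fmt.fmt.pix_mp.height = 1080;
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_YUYV;
	ioctl(fd, VIDIOC_S_FMT, &fmt);		/* source queue */

	fmt.type = cap;
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12;
	ioctl(fd, VIDIOC_S_FMT, &fmt);		/* destination queue */

	/* queue one buffer on each side, then start streaming */
	ioctl(fd, VIDIOC_STREAMON, &out);
	ioctl(fd, VIDIOC_STREAMON, &cap);
	/* VIDIOC_DQBUF on the capture side returns the converted frame */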

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 5 files changed, 2259 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 08de865..94eede7 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -210,6 +210,22 @@ config VIDEO_SH_VEU
 	    Support for the Video Engine Unit (VEU) on SuperH and
 	    SH-Mobile SoCs.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index eee28dd..d4614e7 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..5e1d80e
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1740 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits (16 bytes) */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context needs up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 3 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(15 * VPDMA_MAX_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
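+
+/*
+ * e.g. write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+ * VPE_VPDMA_CLK_RESET_SHIFT) updates only the VPDMA reset field
+ */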
+
+/*
+ * DMA address/data block for the shadow registers; the pad fields keep
+ * each block's register payload a multiple of 16 bytes
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
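+	/* pack two coefficients per shadow MMR; US2 and US3 mirror US1 */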
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* set the VPDMA line mode for the chroma input channel */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select the RGB path; color space conversion will be supported later */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to TRM, we should set DEI in progressive bypass mode when
+	 * the input content is progressive, however, DEI is bypassed correctly
+	 * for both progressive and interlace content in interlace bypass mode.
+	 * It has been recommended not to use progressive bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* load the shadow MMRs if ours changed or another ctx loaded its own */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_buf_map(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
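+	/* output descriptors: luma always, chroma only for co-planar formats */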
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev,
+			"Unexpected interrupt: INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_buf_unmap(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_buf_unmap(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
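+	/*
+	 * round the luma plane stride up to the required alignment; the chroma
+	 * plane of co-planar formats uses the frame width as its stride
+	 */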
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_TRANS_NUM_BUFS:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_TRANS_NUM_BUFS,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_buf_alloc(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
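+	/* pick sane defaults for the source queue; the destination mirrors it */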
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance is released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_buf_free(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_buf_free(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance is released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release_empty,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
+	dev->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(dev->base)) {
+		ret = PTR_ERR(dev->base);
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..be41a1f
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTH_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
-- 
1.8.1.2


* [PATCH v2 4/6] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-20 11:00     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja

Add support for the de-interlacer block in VPE.

For the de-interlacer to work, we need to enable 2 more sets of VPE input
ports which fetch data from the 'last' and 'last to last' fields of the
interlaced video. Apart from that, we need to enable the motion vector
output and input ports, and also allocate DMA buffers for them.

We need to make sure that the two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW (in device_run), it extracts the addresses of the 3 buffers and
provides them to the data descriptors for the 3 sets of input ports
((LUMA1, CHROMA1), (LUMA2, CHROMA2), and (LUMA3, CHROMA3)), one set for each
of the 3 consecutive fields. The motion vector and output port descriptors
are configured and the list is submitted to VPDMA.

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field (the 3rd one) is changed to the state 'DONE', and the buffers
corresponding to the 1st and 2nd fields become the 2nd and 3rd fields for the
next de-interlace operation. This way, each de-interlace operation works on
the 3 most recent fields. After each transaction, we also swap the motion
vector buffers: the new input motion vector buffer holds the accumulated
motion information of all the previous frames, and the new output motion
vector buffer will hold the updated motion vector that captures the motion
changes in the next field.
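
As a rough sketch (illustrative only; the exact handling is in the diff
below), the per-transaction rotation amounts to:

	ctx->src_vbs[2] = ctx->src_vbs[1];	/* field f-1 becomes f-2 */
	ctx->src_vbs[1] = ctx->src_vbs[0];	/* field f becomes f-1 */
	/* swap MV buffers: the old output becomes the next input */
	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;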

The de-interlacer is taken out of bypass mode; the extra default
configuration it requires is now added. The chrominance upsampler
coefficients are added for interlaced frames. Some VPDMA parameters, like the
frame start event and line mode, are configured for the 2 extra sets of input
ports.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 370 ++++++++++++++++++++++++++++++++----
 1 file changed, 336 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 5e1d80e..724ae65 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -110,6 +112,38 @@ static const struct vpe_us_coeffs us_coeffs[] = {
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * the following registers configure some of the parameters of the motion
+ * and edge detection blocks inside DEI; they generally remain the same,
+ * but could be passed in from userspace later if someone needs to tweak them.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static const struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -117,6 +151,7 @@ static const struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-planar formats */
 };
 
@@ -125,6 +160,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -132,12 +173,40 @@ struct vpe_port_data {
 static const struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -209,6 +278,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -218,6 +288,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -269,6 +340,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -276,13 +348,17 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	struct vpdma_buf	mv_buf[2];		/* motion vector in/out bufs */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -358,8 +434,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -385,6 +460,74 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct vpdma_data *vpdma = ctx->dev->vpdma;
+	int ret;
+
+	if (ctx->mv_buf[0].mapped) {
+		vpdma_buf_unmap(vpdma, &ctx->mv_buf[0]);
+		vpdma_buf_free(&ctx->mv_buf[0]);
+	}
+
+	if (ctx->mv_buf[1].mapped) {
+		vpdma_buf_unmap(vpdma, &ctx->mv_buf[1]);
+		vpdma_buf_free(&ctx->mv_buf[1]);
+	}
+
+	if (size == 0)
+		return 0;
+
+	ret = vpdma_buf_alloc(&ctx->mv_buf[0], size);
+	if (ret)
+		return ret;
+	ret = vpdma_buf_alloc(&ctx->mv_buf[1], size);
+	if (ret) {
+		vpdma_buf_free(&ctx->mv_buf[0]);
+		return ret;
+	}
+
+	vpdma_buf_map(vpdma, &ctx->mv_buf[0]);
+	vpdma_buf_map(vpdma, &ctx->mv_buf[1]);
+
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function frees those two buffers when we have
+ * finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -425,6 +568,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -432,6 +576,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -472,14 +619,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -523,13 +684,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -538,7 +700,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -577,10 +745,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		const struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -607,6 +800,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* need additional two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -734,17 +930,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf[mv_buf_selector].dma_addr;
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -760,23 +964,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	const struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf[mv_buf_selector].dma_addr;
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -794,7 +1006,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -817,8 +1030,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -830,24 +1050,49 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_buf_map(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -858,6 +1103,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -881,9 +1127,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -906,10 +1158,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	 /* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -919,16 +1174,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1007,6 +1281,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 	pix->colorspace = q_data->colorspace;
 	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
 
@@ -1031,7 +1306,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 		return -EINVAL;
 	}
 
-	pix->field = V4L2_FIELD_NONE;
+	if (pix->field != V4L2_FIELD_NONE && pix->field != V4L2_FIELD_ALTERNATE)
+		pix->field = V4L2_FIELD_NONE;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
 			      &pix->height, MIN_H, MAX_H, H_ALIGN,
@@ -1099,6 +1375,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1112,6 +1389,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1189,6 +1471,22 @@ static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
 	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	const struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 #define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
 
 static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
@@ -1422,6 +1720,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1430,6 +1729,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1484,6 +1784,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_buf_free(&ctx->mmr_adb);
 
-- 
1.8.1.2


* [PATCH v2 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-20 11:00     ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja, Rajendra Nayak, Sricharan R

Add hwmod data for the VPE IP; this is needed for the IP to be reset during
boot, and to control the functional clock when the driver needs it via the
pm_runtime APIs. Add the corresponding ocp_if struct and add it to DRA7XX's
ocp interface list.

Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c | 42 +++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
index f647998b..181365d 100644
--- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
@@ -1883,6 +1883,39 @@ static struct omap_hwmod dra7xx_wd_timer2_hwmod = {
 	},
 };
 
+/*
+ * 'vpe' class
+ *
+ */
+
+static struct omap_hwmod_class_sysconfig dra7xx_vpe_sysc = {
+	.sysc_offs	= 0x0010,
+	.sysc_flags	= (SYSC_HAS_MIDLEMODE | SYSC_HAS_SIDLEMODE),
+	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
+			   SIDLE_SMART_WKUP | MSTANDBY_FORCE | MSTANDBY_NO |
+			   MSTANDBY_SMART | MSTANDBY_SMART_WKUP),
+	.sysc_fields	= &omap_hwmod_sysc_type2,
+};
+
+static struct omap_hwmod_class dra7xx_vpe_hwmod_class = {
+	.name	= "vpe",
+	.sysc	= &dra7xx_vpe_sysc,
+};
+
+/* vpe */
+static struct omap_hwmod dra7xx_vpe_hwmod = {
+	.name		= "vpe",
+	.class		= &dra7xx_vpe_hwmod_class,
+	.clkdm_name	= "vpe_clkdm",
+	.main_clk	= "dpll_core_h23x2_ck",
+	.prcm = {
+		.omap4 = {
+			.clkctrl_offs = DRA7XX_CM_VPE_VPE_CLKCTRL_OFFSET,
+			.context_offs = DRA7XX_RM_VPE_VPE_CONTEXT_OFFSET,
+			.modulemode   = MODULEMODE_HWCTRL,
+		},
+	},
+};
 
 /*
  * Interfaces
@@ -2636,6 +2669,14 @@ static struct omap_hwmod_ocp_if dra7xx_l4_wkup__wd_timer2 = {
 	.user		= OCP_USER_MPU | OCP_USER_SDMA,
 };
 
+/* l4_per3 -> vpe */
+static struct omap_hwmod_ocp_if dra7xx_l4_per3__vpe = {
+	.master		= &dra7xx_l4_per3_hwmod,
+	.slave		= &dra7xx_vpe_hwmod,
+	.clk		= "l3_iclk_div",
+	.user		= OCP_USER_MPU | OCP_USER_SDMA,
+};
+
 static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
 	&dra7xx_l3_main_2__l3_instr,
 	&dra7xx_l4_cfg__l3_main_1,
@@ -2714,6 +2755,7 @@ static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
 	&dra7xx_l3_main_1__vcp2,
 	&dra7xx_l4_per2__vcp2,
 	&dra7xx_l4_wkup__wd_timer2,
+	&dra7xx_l4_per3__vpe,
 	NULL,
 };
 
-- 
1.8.1.2


* [PATCH v2 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-20 11:00     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 11:00 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen
  Cc: linux-omap, Archit Taneja, Rajendra Nayak, Sricharan R

Add a DT node for VPE in dra7.dtsi. This is experimental because we might need
to split the VPE address space a bit more, and also because the IRQ line
described is accessible only once the IRQ crossbar driver is added for DRA7XX.

Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 arch/arm/boot/dts/dra7.dtsi | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index ce9a0f0..7c1cbfe 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -484,6 +484,17 @@
 			dmas = <&sdma 70>, <&sdma 71>;
 			dma-names = "tx0", "rx0";
 		};
+
+		vpe {
+			compatible = "ti,vpe";
+			ti,hwmods = "vpe";
+			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
+			reg-names = "vpe", "vpdma";
+			interrupts = <0 158 0x4>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
+
 	};
 
 	clocks {
-- 
1.8.1.2


* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-14 10:57       ` Archit Taneja
  (?)
@ 2013-08-20 11:39       ` Laurent Pinchart
  2013-08-20 12:51           ` Archit Taneja
  2013-08-20 13:16           ` Archit Taneja
  1 sibling, 2 replies; 138+ messages in thread
From: Laurent Pinchart @ 2013-08-20 11:39 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen

Hi Archit,

On Wednesday 14 August 2013 16:27:57 Archit Taneja wrote:
> On Friday 09 August 2013 03:34 AM, Laurent Pinchart wrote:
> > On Friday 02 August 2013 19:33:38 Archit Taneja wrote:
> >> The primary function of VPDMA is to move data between external memory and
> >> internal processing modules(in our case, VPE) that source or sink data.
> >> VPDMA is capable of buffering this data and then delivering the data as
> >> demanded to the modules as programmed. The modules that source or sink
> >> data are referred to as clients or ports. A channel is setup inside the
> >> VPDMA to connect a specific memory buffer to a specific client. The VPDMA
> >> centralizes the DMA control functions and buffering required to allow all
> >> the clients to minimize the effect of long latency times.
> >> 
> >> Add the following to the VPDMA helper:
> >> 
> >> - A data struct which describe VPDMA channels. For now, these channels
> >> are the ones used only by VPE, the list of channels will increase when
> >> VIP(Video Input Port) also uses the VPDMA library. This channel
> >> information will be used to populate fields required by data descriptors.
> >> 
> >> - Data structs which describe the different data types supported by
> >> VPDMA. This data type information will be used to populate fields
> >> required by data descriptors and used by the VPE driver to map a V4L2
> >> format to the corresponding VPDMA data type.
> >> 
> >> - Provide VPDMA register offset definitions, functions to read, write and
> >> modify VPDMA registers.
> >> 
> >> - Functions to create and submit a VPDMA list. A list is a group of
> >> descriptors that makes up a set of DMA transfers that need to be
> >> completed. Each descriptor will either perform a DMA transaction to fetch
> >> input buffers and write to output buffers(data descriptors), or configure
> >> the MMRs of sub blocks of VPE(configuration descriptors), or provide
> >> control information to VPDMA (control descriptors).
> >> 
> >> - Functions to allocate, map and unmap buffers needed for the descriptor
> >> list, payloads containing MMR values and motion vector buffers. These use
> >> the DMA mapping APIs to ensure exclusive access to VPDMA.
> >> 
> >> - Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on
> >> the VPE interrupt line when a descriptor list is parsed completely and
> >> the DMA transactions are completed. This requires masking the events in
> >> VPDMA registers and configuring some top level VPE interrupt registers.
> >> 
> >> - Enable some VPDMA specific parameters: frame start event(when to start
> >> DMA for a client) and line mode(whether each line fetched should be
> >> mirrored or not).
> >> 
> >> - Function to load firmware required by VPDMA. VPDMA requires a firmware
> >> for it's internal list manager. We add the required request_firmware
> >> apis to fetch this firmware from user space.
> >> 
> >> - Function to dump VPDMA registers.
> >> 
> >> - A function to initialize VPDMA, this will be called by the VPE driver
> >> with it's platform device pointer, this function will take care of
> >> loading VPDMA firmware and returning a handle back to the VPE driver.
> >> The VIP driver will also call the same init function to initialize it's
> >> own VPDMA instance.
> >> 
> >> Signed-off-by: Archit Taneja <archit@ti.com>

[snip]

> >> +/*
> >> + * Allocate a DMA buffer
> >> + */
> >> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
> >> +{
> >> +	buf->size = size;
> >> +	buf->mapped = 0;
> >> +	buf->addr = kzalloc(size, GFP_KERNEL);
> > 
> > You should use the dma allocation API (depending on your needs,
> > dma_alloc_coherent for instance) to allocate DMA-able buffers.
> 
> I'm not sure about this, dma_map_single() api works fine on kzalloc'd
> buffers. The above function is used to allocate small contiguous buffers
> (never more than 1024 bytes) needed for descriptors for the DMA IP. I
> thought of using DMA pool, but that creates small buffers of the same size.
> So I finally went with kzalloc.

OK, I mistakenly thought it would allocate larger buffers as well. If it's 
used to allocate descriptors only, would it be better to rename it to 
vpdma_desc_alloc() (or something similar)?
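
To make sure we're talking about the same pattern, a rough sketch of the
helper under that name, with the map step folded in (untested, and the
signature is hypothetical):

	#include <linux/slab.h>
	#include <linux/dma-mapping.h>

	/* small kzalloc'd descriptor buffer handed over via the streaming DMA API */
	static void *vpdma_desc_alloc(struct device *dev, size_t size,
				      dma_addr_t *dma_addr)
	{
		void *vaddr = kzalloc(size, GFP_KERNEL);

		if (!vaddr)
			return NULL;

		*dma_addr = dma_map_single(dev, vaddr, size, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, *dma_addr)) {
			kfree(vaddr);
			return NULL;
		}

		return vaddr;
	}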

> >> +	if (!buf->addr)
> >> +		return -ENOMEM;
> >> +
> >> +	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
> >> +
> >> +	return 0;
> >> +}

[snip]

> >> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
> >> +{
> >> +	int r;
> >> +	struct device *dev = &vpdma->pdev->dev;
> >> +
> >> +	r = request_firmware_nowait(THIS_MODULE, 1,
> >> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
> >> +		vpdma_firmware_cb);
> > 
> > Is there a reason not to use the synchronous interface ? That would
> > simplify both this code and the callers, as they won't have to check
> > whether the firmware has been correctly loaded.
> 
> I'm not clear what you mean by that, the firmware would be stored in the
> filesystem. If the driver is built-in, then the synchronous interface
> wouldn't work unless the firmware is appended to the kernel image. Am I
> missing something here? I'm not very aware of the firmware api.

request_firmware() would just sleep (with a 30-second timeout if I'm not 
mistaken) until userspace provides the firmware. As devices are probed 
asynchronously (in kernel threads), the system will just boot normally, and the 
request_firmware() call will return when the firmware is available.
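
Something along these lines (just a sketch, reusing your VPDMA_FIRMWARE
define and the local dev pointer):

	const struct firmware *fw;
	int r;

	r = request_firmware(&fw, VPDMA_FIRMWARE, dev);
	if (r) {
		dev_err(dev, "firmware %s not available\n", VPDMA_FIRMWARE);
		return r;
	}

	/* load fw->data (fw->size bytes) into the list manager, then */
	release_firmware(fw);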

> >> +	if (r) {
> >> +		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
> >> +		return r;
> >> +	} else {
> >> +		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
> >> +	}
> >> +
> >> +	return 0;
> >> +}

-- 
Regards,

Laurent Pinchart



* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-20 11:39       ` Laurent Pinchart
@ 2013-08-20 12:51           ` Archit Taneja
  2013-08-20 13:16           ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 12:51 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen

On Tuesday 20 August 2013 05:09 PM, Laurent Pinchart wrote:
> Hi Archit,
>
> On Wednesday 14 August 2013 16:27:57 Archit Taneja wrote:
>> On Friday 09 August 2013 03:34 AM, Laurent Pinchart wrote:
>>> On Friday 02 August 2013 19:33:38 Archit Taneja wrote:
>>>> The primary function of VPDMA is to move data between external memory and
>>>> internal processing modules(in our case, VPE) that source or sink data.
>>>> VPDMA is capable of buffering this data and then delivering the data as
>>>> demanded to the modules as programmed. The modules that source or sink
>>>> data are referred to as clients or ports. A channel is setup inside the
>>>> VPDMA to connect a specific memory buffer to a specific client. The VPDMA
>>>> centralizes the DMA control functions and buffering required to allow all
>>>> the clients to minimize the effect of long latency times.
>>>>
>>>> Add the following to the VPDMA helper:
>>>>
>>>> - A data struct which describes VPDMA channels. For now, these channels
>>>> are the ones used only by VPE, the list of channels will increase when
>>>> VIP(Video Input Port) also uses the VPDMA library. This channel
>>>> information will be used to populate fields required by data descriptors.
>>>>
>>>> - Data structs which describe the different data types supported by
>>>> VPDMA. This data type information will be used to populate fields
>>>> required by data descriptors and used by the VPE driver to map a V4L2
>>>> format to the corresponding VPDMA data type.
>>>>
>>>> - Provide VPDMA register offset definitions, functions to read, write and
>>>> modify VPDMA registers.
>>>>
>>>> - Functions to create and submit a VPDMA list. A list is a group of
>>>> descriptors that makes up a set of DMA transfers that need to be
>>>> completed. Each descriptor will either perform a DMA transaction to fetch
>>>> input buffers and write to output buffers(data descriptors), or configure
>>>> the MMRs of sub blocks of VPE(configuration descriptors), or provide
>>>> control information to VPDMA (control descriptors).
>>>>
>>>> - Functions to allocate, map and unmap buffers needed for the descriptor
>>>> list, payloads containing MMR values and motion vector buffers. These use
>>>> the DMA mapping APIs to ensure exclusive access to VPDMA.
>>>>
>>>> - Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on
>>>> the VPE interrupt line when a descriptor list is parsed completely and
>>>> the DMA transactions are completed. This requires masking the events in
>>>> VPDMA registers and configuring some top level VPE interrupt registers.
>>>>
>>>> - Enable some VPDMA specific parameters: frame start event(when to start
>>>> DMA for a client) and line mode(whether each line fetched should be
>>>> mirrored or not).
>>>>
>>>> - Function to load firmware required by VPDMA. VPDMA requires a firmware
>>>> its internal list manager. We add the required request_firmware
>>>> apis to fetch this firmware from user space.
>>>>
>>>> - Function to dump VPDMA registers.
>>>>
>>>> - A function to initialize VPDMA; this will be called by the VPE driver
>>>> with its platform device pointer, and it will take care of loading
>>>> VPDMA firmware and returning a handle back to the VPE driver.
>>>> The VIP driver will also call the same init function to initialize its
>>>> own VPDMA instance.
>>>>
>>>> Signed-off-by: Archit Taneja <archit@ti.com>
>
> [snip]
>
>>>> +/*
>>>> + * Allocate a DMA buffer
>>>> + */
>>>> +int vpdma_buf_alloc(struct vpdma_buf *buf, size_t size)
>>>> +{
>>>> +	buf->size = size;
>>>> +	buf->mapped = 0;
>>>> +	buf->addr = kzalloc(size, GFP_KERNEL);
>>>
>>> You should use the dma allocation API (depending on your needs,
>>> dma_alloc_coherent for instance) to allocate DMA-able buffers.
>>
>> I'm not sure about this; the dma_map_single() api works fine on kzalloc'd
>> buffers. The above function is used to allocate small contiguous buffers
>> (never more than 1024 bytes) needed for descriptors for the DMA IP. I
>> thought of using a DMA pool, but that creates small buffers of the same
>> size, so I finally went with kzalloc.
>
> OK, I mistakenly thought it would allocate larger buffers as well. If it's
> used to allocate descriptors only, would it be better to rename it to
> vpdma_desc_alloc() (or something similar) ?

Actually, I just thought about this again. We use this api to allocate a 
motion vector buffer for the de-interlacer; that's a buffer whose size is 
proportional to the frame size, taking up 4 bits per pixel. So for a 1080i 
frame (our limit), it would come to around 51 kilobytes. 
I should probably use dma_alloc_coherent there.
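
Roughly like this, I think (just a sketch, assuming the allocation helper 
gets access to the struct device; not tested):

	buf->size = size;
	buf->addr = dma_alloc_coherent(&vpdma->pdev->dev, size,
				       &buf->dma_addr, GFP_KERNEL);
	if (!buf->addr)
		return -ENOMEM;

with a matching dma_free_coherent() on the free path.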

>
>>>> +	if (!buf->addr)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
>>>> +
>>>> +	return 0;
>>>> +}
>
> [snip]
>
>>>> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
>>>> +{
>>>> +	int r;
>>>> +	struct device *dev = &vpdma->pdev->dev;
>>>> +
>>>> +	r = request_firmware_nowait(THIS_MODULE, 1,
>>>> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
>>>> +		vpdma_firmware_cb);
>>>
>>> Is there a reason not to use the synchronous interface ? That would
>>> simplify both this code and the callers, as they won't have to check
>>> whether the firmware has been correctly loaded.
>>
>> I'm not clear what you mean by that, the firmware would be stored in the
>> filesystem. If the driver is built-in, then the synchronous interface
>> wouldn't work unless the firmware is appended to the kernel image. Am I
>> missing something here? I'm not very aware of the firmware api.
>
> request_firmware() would just sleep (with a 30-second timeout if I'm not
> mistaken) until userspace provides the firmware. As devices are probed
> asynchronously (in kernel threads), the system will just boot normally, and the
> request_firmware() call will return when the firmware is available.
>
>>>> +	if (r) {
>>>> +		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
>>>> +		return r;
>>>> +	} else {
>>>> +		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-20 11:39       ` Laurent Pinchart
@ 2013-08-20 13:16           ` Archit Taneja
  2013-08-20 13:16           ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-20 13:16 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Archit Taneja, linux-media, linux-omap, dagriego, dale, pawel,
	m.szyprowski, hverkuil, tomi.valkeinen

Hi Laurent,

On Tuesday 20 August 2013 05:09 PM, Laurent Pinchart wrote:

<snip>

>>>> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
>>>> +{
>>>> +	int r;
>>>> +	struct device *dev = &vpdma->pdev->dev;
>>>> +
>>>> +	r = request_firmware_nowait(THIS_MODULE, 1,
>>>> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
>>>> +		vpdma_firmware_cb);
>>>
>>> Is there a reason not to use the synchronous interface ? That would
>>> simplify both this code and the callers, as they won't have to check
>>> whether the firmware has been correctly loaded.
>>
>> I'm not clear what you mean by that, the firmware would be stored in the
>> filesystem. If the driver is built-in, then the synchronous interface
>> wouldn't work unless the firmware is appended to the kernel image. Am I
>> missing something here? I'm not very aware of the firmware api.
>
> request_firmware() would just sleep (with a 30-second timeout if I'm not
> mistaken) until userspace provides the firmware. As devices are probed
> asynchronously (in kernel threads), the system will just boot normally, and
> the request_firmware() call will return when the firmware is available.

Sorry, I sent the previous mail a bit too early.

With request_firmware() and the driver built-in, I see that the kernel 
stalls for 10 seconds at the driver's probe, and the firmware loading 
fails since we haven't yet entered userspace, where the file is.

The probing of devices asynchronously with kernel threads makes sense, 
so it's possible that I'm doing something wrong here. I'll give it 
another try.

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-20 13:16           ` Archit Taneja
  (?)
@ 2013-08-20 13:56           ` Laurent Pinchart
  2013-08-21  6:47               ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2013-08-20 13:56 UTC (permalink / raw)
  To: Archit Taneja
  Cc: Archit Taneja, linux-media, linux-omap, dagriego, dale, pawel,
	m.szyprowski, hverkuil, tomi.valkeinen

Hi Archit,

On Tuesday 20 August 2013 18:46:38 Archit Taneja wrote:
> On Tuesday 20 August 2013 05:09 PM, Laurent Pinchart wrote:
> 
> <snip>
> 
> >>>> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
> >>>> +{
> >>>> +	int r;
> >>>> +	struct device *dev = &vpdma->pdev->dev;
> >>>> +
> >>>> +	r = request_firmware_nowait(THIS_MODULE, 1,
> >>>> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
> >>>> +		vpdma_firmware_cb);
> >>> 
> >>> Is there a reason not to use the synchronous interface ? That would
> >>> simplify both this code and the callers, as they won't have to check
> >>> whether the firmware has been correctly loaded.
> >> 
> >> I'm not clear what you mean by that, the firmware would be stored in the
> >> filesystem. If the driver is built-in, then the synchronous interface
> >> wouldn't work unless the firmware is appended to the kernel image. Am I
> >> missing something here? I'm not very aware of the firmware api.
> > 
> > request_firmware() would just sleep (with a 30-second timeout if I'm not
> > mistaken) until userspace provides the firmware. As devices are probed
> > asynchronously (in kernel threads), the system will just boot normally, and
> > the request_firmware() call will return when the firmware is available.
>
> Sorry, I sent the previous mail a bit too early.
> 
> With request_firmware() and the driver built-in, I see that the kernel
> stalls for 10 seconds at the driver's probe, and the firmware loading fails
> since we haven't yet entered userspace, where the file is.
> 
> The probing of devices asynchronously with kernel threads makes sense, so
> it's possible that I'm doing something wrong here. I'll give it another try.

I might have spoken too fast. It looks like module initcalls are not run in 
threads; I most probably confused that with asynchronous probing of hot-
pluggable devices.

If your driver is built-in, then it looks like the correct solution is to build 
the firmware into the kernel image as well, or to use the asynchronous API as 
you did.
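
For the built-in case, that would mean something along these lines in the 
kernel configuration (assuming the blob is placed in the in-tree firmware/ 
directory):

	CONFIG_FW_LOADER=y
	CONFIG_EXTRA_FIRMWARE="vpdma-1b8.bin"
	CONFIG_EXTRA_FIRMWARE_DIR="firmware"

With that, request_firmware() resolves the blob directly from the kernel 
image, without any userspace involvement.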

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-20 13:56           ` Laurent Pinchart
@ 2013-08-21  6:47               ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-21  6:47 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen

Hi,

On Tuesday 20 August 2013 07:26 PM, Laurent Pinchart wrote:
> Hi Archit,
>
> On Tuesday 20 August 2013 18:46:38 Archit Taneja wrote:
>> On Tuesday 20 August 2013 05:09 PM, Laurent Pinchart wrote:
>>
>> <snip>
>>
>>>>>> +static int vpdma_load_firmware(struct vpdma_data *vpdma)
>>>>>> +{
>>>>>> +	int r;
>>>>>> +	struct device *dev = &vpdma->pdev->dev;
>>>>>> +
>>>>>> +	r = request_firmware_nowait(THIS_MODULE, 1,
>>>>>> +		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
>>>>>> +		vpdma_firmware_cb);
>>>>>
>>>>> Is there a reason not to use the synchronous interface ? That would
>>>>> simplify both this code and the callers, as they won't have to check
>>>>> whether the firmware has been correctly loaded.
>>>>
>>>> I'm not clear what you mean by that, the firmware would be stored in the
>>>> filesystem. If the driver is built-in, then the synchronous interface
>>>> wouldn't work unless the firmware is appended to the kernel image. Am I
>>>> missing something here? I'm not very aware of the firmware api.
>>>
>>> request_firmware() would just sleep (with a 30-second timeout if I'm not
>>> mistaken) until userspace provides the firmware. As devices are probed
>>> asynchronously (in kernel threads), the system will just boot normally, and
>>> the request_firmware() call will return when the firmware is available.
>>
>> Sorry, I sent the previous mail a bit too early.
>>
>> With request_firmware() and the driver built-in, I see that the kernel
>> stalls for 10 seconds at the driver's probe, and the firmware loading fails
>> since we haven't yet entered userspace, where the file is.
>>
>> The probing of devices asynchronously with kernel threads makes sense, so
>> it's possible that I'm doing something wrong here. I'll give it another try.
>
> I might have spoken too fast. It looks like module initcalls are not run in
> threads; I most probably confused that with asynchronous probing of hot-
> pluggable devices.
>
> If your driver is built-in, then it looks like the correct solution is to
> build the firmware into the kernel image as well, or to use the asynchronous
> API as you did.

Okay, thanks for clarifying that.

We could use the synchronous request_firmware() version if we call it in 
the v4l2 open file op.

Maybe I could load the firmware when the device is opened for the first 
time (one instance).

I'll have to see whether it slows things down, and if I'd need to load 
firmware more often. But I'd probably leave this experiment for later.
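
Something along these lines (purely a sketch; vpe_open(), struct vpe_dev and 
vpdma_load_firmware_sync() are hypothetical names, not code from this 
series):

	static int vpe_open(struct file *file)
	{
		struct vpe_dev *dev = video_drvdata(file);

		/* hypothetical helper: synchronous request_firmware() + upload */
		if (!dev->vpdma->ready) {
			int r = vpdma_load_firmware_sync(dev->vpdma);

			if (r)
				return r;
		}

		return v4l2_fh_open(file);
	}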

Archit




^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v3 0/6] v4l: VPE mem to mem driver
  2013-08-20 11:00   ` Archit Taneja
@ 2013-08-29 12:32     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

VPE(Video Processing Engine) is an IP found on DRA7xx, this series adds VPE as a
mem to mem v4l2 driver, and VPDMA as a helper library.

The first version of the patch series described VPE in detail, you can have a
look at it here:

http://www.spinics.net/lists/linux-media/msg66518.html

The only change in v3 is that the DMA allocation APIs are used for motion
vector buffers instead of kzalloc, as these buffers can take up to 100Kb of
memory. The descriptors used by VPDMA are still allocated via kzalloc. The
allocation/mapping API for VPDMA was renamed to make it clear that it's for
allocating descriptor lists and descriptor payloads.
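
In short, the two paths now look like this (vpdma_alloc_desc_buf() and 
vpdma_map_desc_buf() are from patch 1; the motion vector line is only 
illustrative, with dev and mv_size standing in for the driver context):

	/* small descriptor buffers: kzalloc + dma_map_single() */
	vpdma_alloc_desc_buf(&buf, VPDMA_MAX_DESC_SIZE);
	vpdma_map_desc_buf(vpdma, &buf);

	/* large motion vector buffers: coherent DMA allocation */
	mv_buf = dma_alloc_coherent(dev, mv_size, &mv_dma_addr, GFP_KERNEL);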

Archit Taneja (6):
  v4l: ti-vpe: Create a vpdma helper library
  v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  v4l: ti-vpe: Add VPE mem to mem driver
  v4l: ti-vpe: Add de-interlacer support in VPE
  arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  experimental: arm: dts: dra7xx: Add a DT node for VPE

 arch/arm/boot/dts/dra7.dtsi                |   11 +
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c  |   42 +
 drivers/media/platform/Kconfig             |   16 +
 drivers/media/platform/Makefile            |    2 +
 drivers/media/platform/ti-vpe/Makefile     |    5 +
 drivers/media/platform/ti-vpe/vpdma.c      |  846 ++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  202 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h |  640 +++++++++
 drivers/media/platform/ti-vpe/vpe.c        | 2050 ++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h   |  496 +++++++
 10 files changed, 4310 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

-- 
1.8.1.2


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v3 1/6] v4l: ti-vpe: Create a vpdma helper library
  2013-08-29 12:32     ` Archit Taneja
@ 2013-08-29 12:32       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

The primary function of VPDMA is to move data between external memory and
internal processing modules(in our case, VPE) that source or sink data. VPDMA is
capable of buffering this data and then delivering the data as demanded to the
modules as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is setup inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and buffering required to allow all the clients to minimize the
effect of long latency times.

Add the following to the VPDMA helper:

- A data struct which describes VPDMA channels. For now, these channels are the
  ones used only by VPE, the list of channels will increase when VIP(Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA. This
  data type information will be used to populate fields required by data
  descriptors and used by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- Provide VPDMA register offset definitions, functions to read, write and modify
  VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of descriptors
  that makes up a set of DMA transfers that need to be completed. Each
  descriptor will either perform a DMA transaction to fetch input buffers and
  write to output buffers(data descriptors), or configure the MMRs of sub blocks
  of VPE(configuration descriptors), or provide control information to VPDMA
  (control descriptors).

- Functions to allocate, map and unmap buffers needed for the descriptor list,
  payloads containing MMR values and scaler coefficients. These use the DMA
  mapping APIs to ensure exclusive access to VPDMA.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Enable some VPDMA specific parameters: frame start event(when to start DMA for
  a client) and line mode(whether each line fetched should be mirrored or not).

- Function to load firmware required by VPDMA. VPDMA requires a firmware for
  its internal list manager. We add the required request_firmware apis to fetch
  this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize and create a VPDMA instance; this will be called by
  the VPE driver with its platform device pointer, and it will take care of
  loading VPDMA firmware and returning a vpdma_data instance back to the VPE
  driver. The VIP driver will also call the same init function to initialize its
  own VPDMA instance.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 851 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..42db12c
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,578 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+const struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+const struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+const struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static const struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int read_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return (read_reg(vpdma, offset) & (mask << shift)) >> shift;
+}
+
+static void write_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = false;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & (VPDMA_DESC_ALIGN - 1));
+
+	return 0;
+}
+
+void vpdma_free_desc_buf(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map descriptor/payload DMA buffer, enabling DMA access
+ */
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, buf->dma_addr)) {
+		dev_err(dev, "failed to map buffer\n");
+		return -EINVAL;
+	}
+
+	buf->mapped = true;
+
+	return 0;
+}
+
+/*
+ * unmap descriptor/payload DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = false;
+}
+
+/*
+ * create a descriptor list, the user of this list will append configuration,
+ * control and data descriptors to this list, this list will be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_alloc_desc_buf(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list, this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_free_desc_buf(&list->buf);
+
+	list->next = NULL;
+}
+
+static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client, the
+ * line buffer content can either be mirrored(each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_alloc_desc_buf(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_map_desc_buf(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_unmap_desc_buf(vpdma, &fw_dma_buf);
+
+	vpdma_free_desc_buf(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+struct vpdma_data *vpdma_create(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_create\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return ERR_PTR(r);
+	}
+
+	return vpdma;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..9710f57
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_MAX_DESC_SIZE		32	/* 8 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern const struct vpdma_data_format vpdma_yuv_fmts[];
+extern const struct vpdma_data_format vpdma_rgb_fmts[];
+extern const struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
+void vpdma_free_desc_buf(struct vpdma_buf *buf);
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+struct vpdma_data *vpdma_create(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + grp * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + i * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers(only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v3 1/6] v4l: ti-vpe: Create a vpdma helper library
@ 2013-08-29 12:32       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

The primary function of VPDMA is to move data between external memory and
internal processing modules(in our case, VPE) that source or sink data. VPDMA is
capable of buffering this data and then delivering the data as demanded to the
modules as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is setup inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and buffering required to allow all the clients to minimize the
effect of long latency times.

Add the following to the VPDMA helper:

- A data struct which describe VPDMA channels. For now, these channels are the
  ones used only by VPE, the list of channels will increase when VIP(Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA.
  This data type information is used to populate fields required by data
  descriptors, and by the VPE driver to map a V4L2 format to the corresponding
  VPDMA data type.

- Provide VPDMA register offset definitions, and functions to read, write and
  modify VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of
  descriptors that make up a set of DMA transfers that need to be completed.
  Each descriptor either performs a DMA transaction to fetch input buffers and
  write to output buffers (data descriptors), configures the MMRs of VPE's sub
  blocks (configuration descriptors), or provides control information to VPDMA
  (control descriptors).

- Functions to allocate, map and unmap the buffers needed for the descriptor
  list and for payloads containing MMR values and scaler coefficients. These
  use the DMA mapping APIs to give VPDMA exclusive access to the buffers while
  they are mapped.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Configure some VPDMA specific parameters: the frame start event (when to
  start DMA for a client) and the line mode (whether each line fetched should
  be mirrored or not).

- Function to load the firmware required by VPDMA. VPDMA requires a firmware
  image for its internal list manager. We add the required request_firmware
  APIs to fetch this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize and create a VPDMA instance. It is called by the
  VPE driver with its platform device pointer, takes care of loading the VPDMA
  firmware, and returns a vpdma_data instance back to the VPE driver. The VIP
  driver will later call the same init function to initialize its own VPDMA
  instance. A rough usage sketch follows below.
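
A minimal sketch of the intended call flow from a client driver. The function
name, the SZ_4K list size, the error handling and the interrupt hand-off are
illustrative assumptions; only the vpdma_* calls below are part of this patch:

	/* illustrative only; needs linux/err.h, linux/sizes.h and vpdma.h */
	static int client_run_vpdma(struct platform_device *pdev)
	{
		struct vpdma_data *vpdma;
		struct vpdma_desc_list list;
		int ret;

		/* this kicks off an asynchronous firmware load */
		vpdma = vpdma_create(pdev);
		if (IS_ERR(vpdma))
			return PTR_ERR(vpdma);

		/* the size must hold all descriptors appended later */
		ret = vpdma_create_desc_list(&list, SZ_4K,
					     VPDMA_LIST_TYPE_NORMAL);
		if (ret)
			return ret;

		/* ... append data/config/control descriptors here ... */

		/* real code must wait for vpdma->ready before this point */
		ret = vpdma_map_desc_buf(vpdma, &list.buf);
		if (ret)
			goto free_list;

		vpdma_enable_list_complete_irq(vpdma, 0, true);
		ret = vpdma_submit_descs(vpdma, &list);

		/* once the list complete interrupt fires: */
		vpdma_unmap_desc_buf(vpdma, &list.buf);
		vpdma_reset_desc_list(&list);
	free_list:
		vpdma_free_desc_list(&list);
		return ret;
	}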

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 154 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 851 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..42db12c
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,578 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+const struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+const struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+const struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static const struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int read_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return (read_reg(vpdma, offset) & (mask << shift)) >> shift;
+}
+
+static void write_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dump only the group0 and group3 interrupt registers, since the VPE
+	 * channels lie within these two groups
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients; this function could be
+	 * made to dump either VPE or VIP client registers based on who is
+	 * using the library
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = false;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & (VPDMA_DESC_ALIGN - 1));
+
+	return 0;
+}
+
+void vpdma_free_desc_buf(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map descriptor/payload DMA buffer, enabling DMA access
+ */
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, buf->dma_addr)) {
+		dev_err(dev, "failed to map buffer\n");
+		return -EINVAL;
+	}
+
+	buf->mapped = true;
+
+	return 0;
+}
+
+/*
+ * unmap descriptor/payload DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = false;
+}
+
+/*
+ * create a descriptor list; the user of the list appends configuration,
+ * control and data descriptors to it and then submits it to VPDMA. VPDMA's
+ * list parser goes through each descriptor and performs the required DMA
+ * operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_alloc_desc_buf(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list; this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_free_desc_buf(&list->buf);
+
+	list->next = NULL;
+}
+
+static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
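+	/* each list's busy flag sits at bit (16 + list_num) of LIST_STAT_SYNC */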
+	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA; do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client: the
+ * line buffer content can either be mirrored (each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_alloc_desc_buf(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_map_desc_buf(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_unmap_desc_buf(vpdma, &fw_dma_buf);
+
+	vpdma_free_desc_buf(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+struct vpdma_data *vpdma_create(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_create\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return ERR_PTR(r);
+	}
+
+	return vpdma;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..9710f57
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_MAX_DESC_SIZE		32	/* 8 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern const struct vpdma_data_format vpdma_yuv_fmts[];
+extern const struct vpdma_data_format vpdma_rgb_fmts[];
+extern const struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
+void vpdma_free_desc_buf(struct vpdma_buf *buf);
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+struct vpdma_data *vpdma_create(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + grp * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + i * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers (only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2



* [PATCH v3 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-08-29 12:32     ` Archit Taneja
@ 2013-08-29 12:32       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an
existing list and append the configuration/data/control descriptor header to
it. A rough sketch of how they compose follows below.
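
(The function name, frame size, format and channel choices below are arbitrary
examples; the vpdma_add_* calls are the ones added by this patch.)

	/* illustrative only: one memory->client->memory luma transfer */
	static void client_build_list(struct vpdma_desc_list *list,
				      struct v4l2_rect *c_rect,
				      dma_addr_t src, dma_addr_t dst)
	{
		const struct vpdma_data_format *fmt =
				&vpdma_yuv_fmts[VPDMA_DATA_FMT_Y422];

		/* memory to client: fetch a field of a 1920x1080 frame */
		vpdma_add_in_dtd(list, 1920, 1080, c_rect, fmt, src,
				 VPE_CHAN_LUMA1_IN, 0, 0);

		/* client to memory: write back the processed luma */
		vpdma_add_out_dtd(list, c_rect, fmt, dst,
				  VPE_CHAN_LUMA_OUT, 0);

		/* stall the list until the output DMA has completed */
		vpdma_add_sync_on_channel_ctd(list, VPE_CHAN_LUMA_OUT);
	}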

In the case of configuration descriptors, the creation of a payload block may
be required (the payloads can hold VPE MMR values or scaler coefficients). The
allocation of the payload buffer and its contents is left to the VPE driver.
However, the VPDMA library provides helper macros to create payloads in the
correct format, as sketched below.
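
(The my_mmr_adb struct, its field names and the 0x100 register offset are
made-up placeholders; the macro and the vpdma_* calls are the ones added by
this patch and the previous one.)

	/* illustrative only: an ADB payload with one sub-block of MMRs */
	struct my_mmr_adb {
		struct vpdma_adb_hdr	sub0_hdr;
		u32			sub0_regs[4];	/* 16 bytes */
	};

	static void client_add_mmr_cfd(struct vpdma_data *vpdma,
				       struct vpdma_desc_list *list,
				       struct vpdma_buf *buf)
	{
		/* buf must stay allocated and mapped until the list completes */
		if (vpdma_alloc_desc_buf(buf, sizeof(struct my_mmr_adb)))
			return;

		/* point the sub-block at a (made-up) MMR offset of 0x100 */
		VPDMA_SET_MMR_ADB_HDR(*buf, my_mmr_adb, sub0_hdr,
				      sub0_regs, 0x100);

		/* ... fill sub0_regs with the register values ... */

		if (vpdma_map_desc_buf(vpdma, buf))
			return;

		vpdma_add_cfd_adb(list, CFD_MMR_CLIENT, buf);
	}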

Add debug functions to dump the descriptors in a way that makes it easy to
see the values of the different fields.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 521 +++++++++++++++++++++++++++++
 3 files changed, 837 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index 42db12c..af0a5ff 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -416,6 +417,273 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK) {
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd->dest_addr_offset);
+		pr_debug("word1: num_data_wrds = %d\n", cfd->block_len);
+	}
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd->payload_addr);
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header; this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & (VPDMA_DESC_ALIGN - 1));
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->dest_addr_offset = dest_offset;
+	cfd->block_len = len;
+	cfd->payload_addr = (u32) blk->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_BLOCK,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format; this is used to configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & (VPDMA_DESC_ALIGN - 1));
+
+	cfd = list->next;
+	BUG_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->w0 = 0;
+	cfd->w1 = 0;
+	cfd->payload_addr = (u32) adb->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_ADB,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+};
+
+/*
+ * the control descriptor format changes based on the type of control
+ * descriptor; we only use 'sync on channel' control descriptors for now, so
+ * assume that type
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list; this descriptor stalls the VPDMA list until DMA is completed on the
+ * specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd->w0 = 0;
+	ctd->w1 = 0;
+	ctd->w2 = 0;
+	ctd->type_source_ctl = ctd_type_source_ctl(chan_info[chan].num,
+				CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd->start_addr);
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			"drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list;
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
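+	/* C420 averages 4 bpp, but each chroma line still carries 8 bpp */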
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+	dtd->w1 = 0;
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_OUT, channel, priority, next_chan);
+	dtd->desc_write_addr = dtd_desc_write_addr(0, 0, 0, 0);
+	dtd->max_width_height = dtd_max_width_height(MAX_OUT_WIDTH_1920,
+					MAX_OUT_HEIGHT_1080);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list;
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
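+	/* the C420 chroma plane has half the lines, still 8 bpp per line */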
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+
+	dtd->xfer_length_height = dtd_xfer_length_height(c_rect->width, height);
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_IN, channel, priority, next_chan);
+	dtd->frame_width_height = dtd_frame_width_height(frame_width,
+					frame_height);
+	dtd->start_h_v = dtd_start_h_v(c_rect->left, c_rect->top);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 9710f57..75f78fb 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -123,6 +123,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		7
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors MMRs as client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
 void vpdma_free_desc_buf(struct vpdma_buf *buf);
@@ -135,6 +168,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..da3976b 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,525 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the
+ * overall size of the payload also needs to be a multiple of 16 bytes. The
+ * VPDMA user must ensure that the sub block lengths are aligned this way.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+/* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+/* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+static inline u32 dtd_type_ctl_stride(int type, bool notify, int field,
+			bool one_d, bool even_line_skip, bool odd_line_skip,
+			int line_stride)
+{
+	return (type << DTD_DATA_TYPE_SHFT) | (notify << DTD_NOTIFY_SHFT) |
+		(field << DTD_FIELD_SHFT) | (one_d << DTD_1D_SHFT) |
+		(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |
+		(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |
+		line_stride;
+}
+
+static inline u32 dtd_xfer_length_height(int line_length, int xfer_height)
+{
+	return (line_length << DTD_LINE_LENGTH_SHFT) | xfer_height;
+}
+
+static inline u32 dtd_pkt_ctl(bool mode, bool dir, int chan, int pri,
+			int next_chan)
+{
+	return (DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) | (mode << DTD_MODE_SHFT) |
+		(dir << DTD_DIR_SHFT) | (chan << DTD_CHAN_SHFT) |
+		(pri << DTD_PRI_SHFT) | next_chan;
+}
+
+static inline u32 dtd_frame_width_height(int width, int height)
+{
+	return (width << DTD_FRAME_WIDTH_SHFT) | height;
+}
+
+static inline u32 dtd_desc_write_addr(unsigned int addr, bool write_desc,
+			bool drop_data, bool use_desc)
+{
+	return (addr << DTD_DESC_START_SHIFT) |
+		(write_desc << DTD_WRITE_DESC_SHIFT) |
+		(drop_data << DTD_DROP_DATA_SHIFT) |
+		use_desc;
+}
+
+static inline u32 dtd_start_h_v(int h_start, int v_start)
+{
+	return (h_start << DTD_H_START_SHFT) | v_start;
+}
+
+static inline u32 dtd_max_width_height(int max_width, int max_height)
+{
+	return (max_width << DTD_MAX_WIDTH_SHFT) | max_height;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD_BLOCK_LEN_MASK	0xffff
+#define CFD_BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+static inline u32 cfd_pkt_payload_len(bool direct, int cls, int dest,
+		int payload_len)
+{
+	return (CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |
+		(direct << CFD_DIRECT_SHFT) |
+		(cls << CFD_CLASS_SHFT) |
+		(dest << CFD_DEST_SHFT) |
+		payload_len;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline int cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_CHANNEL	3
+#define CTD_TYPE_CHNG_CLIENT_IRQ	4
+#define CTD_TYPE_SEND_IRQ		5
+#define CTD_TYPE_RELOAD_LIST		6
+#define CTD_TYPE_ABORT_CHANNEL		7
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+static inline u32 ctd_pixel_line_count(int pixel_count, int line_count)
+{
+	return (pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count;
+}
+
+static inline u32 ctd_set_fid_ctl(int fid0, int fid1, int fid2)
+{
+	return (fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0;
+}
+
+static inline u32 ctd_type_source_ctl(int source, int control)
+{
+	return (CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |
+		(source << CTD_SOURCE_SHFT) | control;
+}
+
+static inline u32 ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2



* [PATCH v3 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
@ 2013-08-29 12:32       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an
existing list and append the configuration/data/control descriptor header to
it.

In the case of configuration descriptors, the creation of a payload block may
be required (the payloads can hold VPE MMR values or scaler coefficients). The
allocation of the payload buffer and its contents is left to the VPE driver.
However, the VPDMA library provides helper macros to create payloads in the
correct format.

Add debug functions to dump the descriptors in a way that makes it easy to
see the values of the different fields.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 521 +++++++++++++++++++++++++++++
 3 files changed, 837 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index 42db12c..af0a5ff 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -416,6 +417,273 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK) {
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd->dest_addr_offset);
+		pr_debug("word1: num_data_wrds = %d\n", cfd->block_len);
+	}
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd->payload_addr);
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header; this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & (VPDMA_DESC_ALIGN - 1));
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->dest_addr_offset = dest_offset;
+	cfd->block_len = len;
+	cfd->payload_addr = (u32) blk->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_BLOCK,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format; this is used to configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & (VPDMA_DESC_ALIGN - 1));
+
+	cfd = list->next;
+	BUG_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->w0 = 0;
+	cfd->w1 = 0;
+	cfd->payload_addr = (u32) adb->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_ADB,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+};
+
+/*
+ * the control descriptor format changes based on the type of control
+ * descriptor; we only use 'sync on channel' control descriptors for now, so
+ * assume that type
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list; this descriptor stalls the VPDMA list until DMA is completed on the
+ * specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd->w0 = 0;
+	ctd->w1 = 0;
+	ctd->w2 = 0;
+	ctd->type_source_ctl = ctd_type_source_ctl(chan_info[chan].num,
+				CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd->start_addr);
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			"drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list;
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
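+	/* stride and start address are in bytes; step to the crop's left edge */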
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+	dtd->w1 = 0;
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_OUT, channel, priority, next_chan);
+	dtd->desc_write_addr = dtd_desc_write_addr(0, 0, 0, 0);
+	dtd->max_width_height = dtd_max_width_height(MAX_OUT_WIDTH_1920,
+					MAX_OUT_HEIGHT_1080);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list;
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+
+	dtd->xfer_length_height = dtd_xfer_length_height(c_rect->width, height);
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_IN, channel, priority, next_chan);
+	dtd->frame_width_height = dtd_frame_width_height(frame_width,
+					frame_height);
+	dtd->start_h_v = dtd_start_h_v(c_rect->left, c_rect->top);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 9710f57..75f78fb 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -123,6 +123,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		7
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/*
+ * helpers for creating ADB headers for config descriptors with the MMRs
+ * as the client
+ */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
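+
+/*
+ * for illustration: a user with 'struct my_adb { struct vpdma_adb_hdr hdr;
+ * u32 regs[4]; };' (hypothetical) would call
+ * VPDMA_SET_MMR_ADB_HDR(buf, my_adb, hdr, regs, REG_OFFSET) to point 'hdr'
+ * at the MMRs starting at REG_OFFSET, with a length of 4 words
+ */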
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
 void vpdma_free_desc_buf(struct vpdma_buf *buf);
@@ -135,6 +168,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..da3976b 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,525 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the overall
+ * payload size also needs to be a multiple of 16 bytes. It's up to the VPDMA
+ * user to pad each sub block to this alignment (e.g. a group of 6 registers,
+ * 24 bytes, needs 8 bytes of padding).
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+/* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+/* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+static inline u32 dtd_type_ctl_stride(int type, bool notify, int field,
+			bool one_d, bool even_line_skip, bool odd_line_skip,
+			int line_stride)
+{
+	return (type << DTD_DATA_TYPE_SHFT) | (notify << DTD_NOTIFY_SHFT) |
+		(field << DTD_FIELD_SHFT) | (one_d << DTD_1D_SHFT) |
+		(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |
+		(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |
+		line_stride;
+}
+
+static inline u32 dtd_xfer_length_height(int line_length, int xfer_height)
+{
+	return (line_length << DTD_LINE_LENGTH_SHFT) | xfer_height;
+}
+
+static inline u32 dtd_pkt_ctl(bool mode, bool dir, int chan, int pri,
+			int next_chan)
+{
+	return (DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) | (mode << DTD_MODE_SHFT) |
+		(dir << DTD_DIR_SHFT) | (chan << DTD_CHAN_SHFT) |
+		(pri << DTD_PRI_SHFT) | next_chan;
+}
+
+static inline u32 dtd_frame_width_height(int width, int height)
+{
+	return (width << DTD_FRAME_WIDTH_SHFT) | height;
+}
+
+static inline u32 dtd_desc_write_addr(unsigned int addr, bool write_desc,
+			bool drop_data, bool use_desc)
+{
+	return (addr << DTD_DESC_START_SHIFT) |
+		(write_desc << DTD_WRITE_DESC_SHIFT) |
+		(drop_data << DTD_DROP_DATA_SHIFT) |
+		use_desc;
+}
+
+static inline u32 dtd_start_h_v(int h_start, int v_start)
+{
+	return (h_start << DTD_H_START_SHFT) | v_start;
+}
+
+static inline u32 dtd_max_width_height(int max_width, int max_height)
+{
+	return (max_width << DTD_MAX_WIDTH_SHFT) | max_height;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD__BLOCK_LEN_MASK	0xffff
+#define CFD__BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+static inline u32 cfd_pkt_payload_len(bool direct, int cls, int dest,
+		int payload_len)
+{
+	return (CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |
+		(direct << CFD_DIRECT_SHFT) |
+		(cls << CFD_CLASS_SHFT) |
+		(dest << CFD_DEST_SHFT) |
+		payload_len;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline bool cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_CHANNEL	3
+#define CTD_TYPE_CHNG_CLIENT_IRQ	4
+#define CTD_TYPE_SEND_IRQ		5
+#define CTD_TYPE_RELOAD_LIST		6
+#define CTD_TYPE_ABORT_CHANNEL		7
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+static inline u32 ctd_pixel_line_count(int pixel_count, int line_count)
+{
+	return (pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count;
+}
+
+static inline u32 ctd_set_fid_ctl(int fid0, int fid1, int fid2)
+{
+	return (fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0;
+}
+
+static inline u32 ctd_type_source_ctl(int source, int control)
+{
+	return (CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |
+		(source << CTD_SOURCE_SHFT) | control;
+}
+
+static inline u32 ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-29 12:32     ` Archit Taneja
@ 2013-08-29 12:32       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

VPE is a block which consists of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. The chroma up/down sampler blocks are implemented,
so conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set up through direct
register writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; as VPDMA parses these descriptors, it
uploads the MMR registers and starts DMA of the video buffers on the various
input and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated which we use to notify that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
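
As a rough userspace sketch (not part of this patch), a minimal sequence for
converting YUYV frames to NV12 on the resulting video node could look like
this; the /dev/video0 path and the 720x576 frame size are only illustrative,
and buffer setup, queueing and error handling are elided:

  #include <fcntl.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  static void set_fmt(int fd, enum v4l2_buf_type type, __u32 fourcc)
  {
  	struct v4l2_format f;

  	memset(&f, 0, sizeof(f));
  	f.type = type;
  	f.fmt.pix_mp.width = 720;
  	f.fmt.pix_mp.height = 576;
  	f.fmt.pix_mp.pixelformat = fourcc;
  	ioctl(fd, VIDIOC_S_FMT, &f);
  }

  int main(void)
  {
  	int fd = open("/dev/video0", O_RDWR);

  	/* source frames are fed in on the OUTPUT queue ... */
  	set_fmt(fd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, V4L2_PIX_FMT_YUYV);
  	/* ... converted frames come back on the CAPTURE queue */
  	set_fmt(fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, V4L2_PIX_FMT_NV12);

  	/*
  	 * VIDIOC_REQBUFS, VIDIOC_QBUF and VIDIOC_STREAMON would follow on
  	 * both queues, with VIDIOC_DQBUF collecting each converted frame
  	 */
  	return 0;
  }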

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 5 files changed, 2259 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 08de865..94eede7 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -210,6 +210,22 @@ config VIDEO_SH_VEU
 	    Support for the Video Engine Unit (VEU) on SuperH and
 	    SH-Mobile SoCs.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index eee28dd..d4614e7 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..85b0880
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1740 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits, i.e. 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context needs up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 3 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(16 * VPDMA_MAX_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
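+
+/*
+ * the *_pad[] members above keep each register group a multiple of
+ * 16 bytes, as VPDMA requires for ADB sub block payloads
+ */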
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
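+	/* pack two coefficients per US1 register and mirror them into US2/US3 */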
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* set the VPDMA line mode for the chroma input channel */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI to progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode, and it has been recommended not to use progressive
+	 * bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
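+	/* VPDMA is done with the descriptors; hand the buffers back to the CPU */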
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_TRANS_NUM_BUFS:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_TRANS_NUM_BUFS,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
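+	/* initialize the source queue data with a default format and size */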
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
+	dev->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(dev->base)) {
+		ret = PTR_ERR(dev->base);
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
-- 
1.8.1.2



* [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
@ 2013-08-29 12:32       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

VPE is a block which consists of a single memory to memory path that can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. The chroma up/down sampler blocks are implemented,
so conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it is submitted to VPDMA; as VPDMA parses the descriptors, it
uploads the MMR values and starts DMA of the video buffers on the various
input and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to mark the source and
destination buffers as done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
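
To illustrate the ioctl flow described above, a minimal sketch of the
userspace side follows. The /dev/video0 node, the buffer count and the
NV12-to-YUYV conversion are assumptions made up for the example, not
anything mandated by this patch:

	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	static int setup_queue(int fd, enum v4l2_buf_type type, __u32 fourcc)
	{
		struct v4l2_format fmt;
		struct v4l2_requestbuffers reqbufs;

		/* negotiate a 1080p multiplanar format on this queue */
		memset(&fmt, 0, sizeof(fmt));
		fmt.type = type;
		fmt.fmt.pix_mp.width = 1920;
		fmt.fmt.pix_mp.height = 1080;
		fmt.fmt.pix_mp.pixelformat = fourcc;
		fmt.fmt.pix_mp.field = V4L2_FIELD_NONE;
		if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
			return -1;

		/* allocate driver-owned (MMAP) buffers for the queue */
		memset(&reqbufs, 0, sizeof(reqbufs));
		reqbufs.type = type;
		reqbufs.memory = V4L2_MEMORY_MMAP;
		reqbufs.count = 4;
		return ioctl(fd, VIDIOC_REQBUFS, &reqbufs);
	}

	int main(void)
	{
		int fd = open("/dev/video0", O_RDWR);

		if (fd < 0)
			return 1;

		/* OUTPUT feeds frames to VPE, CAPTURE receives the result */
		setup_queue(fd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
			    V4L2_PIX_FMT_NV12);
		setup_queue(fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
			    V4L2_PIX_FMT_YUYV);

		/*
		 * from here: mmap() and fill the OUTPUT buffers, VIDIOC_QBUF
		 * them together with empty CAPTURE buffers, VIDIOC_STREAMON
		 * both queues, then VIDIOC_DQBUF the converted frames
		 */
		return 0;
	}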

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 5 files changed, 2259 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 08de865..94eede7 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -210,6 +210,22 @@ config VIDEO_SH_VEU
 	    Support for the Video Engine Unit (VEU) on SuperH and
 	    SH-Mobile SoCs.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7xx SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages on VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index eee28dd..d4614e7 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..85b0880
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1740 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* multiple of 128 bits, line stride, 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context needs up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 3 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(15 * VPDMA_MAX_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
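+
+/*
+ * each *_hdr above describes the block of shadow register values that
+ * follows it; when the config descriptor pointing at this ADB is
+ * submitted, VPDMA parses the headers and writes the values to the
+ * corresponding VPE MMRs on the driver's behalf
+ */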
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+};
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
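+	/*
+	 * each register holds a C0/C1 coefficient pair; US2 and US3 mirror
+	 * the US1 values
+	 */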
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* line mode is set via a direct VPDMA register write for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to TRM, we should set DEI in progressive bypass mode when
+	 * the input content is progressive; however, DEI is bypassed correctly
+	 * for both progressive and interlaced content in interlace bypass
+	 * mode. It has been recommended not to use progressive bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/*
+	 * add a config descriptor only when this context's shadow MMR values
+	 * have changed, or when the MMRs loaded in the hardware belong to a
+	 * different context
+	 */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev,
+			"Unexpected interrupt: INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
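+	/* run the next buffer of this batch, or finish the transaction */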
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
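+/*
+ * custom control: sets how many buffers a context processes in one mem2mem
+ * job; the context keeps the VPE hardware until that many buffers have
+ * completed (see the batching logic at the end of vpe_irq())
+ */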
+#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_TRANS_NUM_BUFS:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_TRANS_NUM_BUFS,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
+	dev->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(dev->base)) {
+		ret = PTR_ERR(dev->base);
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev =
+		(struct vpe_dev *) platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	8
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	16
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v3 4/6] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-08-29 12:32     ` Archit Taneja
@ 2013-08-29 12:32       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

Add support for the de-interlacer block in VPE.

For the de-interlacer to work, we need to enable 2 more sets of VPE input ports
which fetch data from the 'last' and 'last to last' (i.e., the two previous)
fields of the interlaced video. Apart from that, we need to enable the motion
vector output and input ports, and also allocate DMA buffers for them.

We need to make sure that the two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW (in device_run), it extracts the addresses of the 3 buffers and provides
them to the data descriptors for the 3 sets of input ports ((LUMA1, CHROMA1),
(LUMA2, CHROMA2), and (LUMA3, CHROMA3)), one set for each of the 3 consecutive
fields. The motion vector and output port descriptors are then configured and
the list is submitted to VPDMA, as sketched below.
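
The mapping from fields to input ports then looks roughly like this (a
condensed sketch of the device_run() flow from the diff below, built only
from the driver's own helpers, not additional code):

	/* input data descriptors: one luma/chroma pair per field */
	add_in_dtd(ctx, VPE_PORT_LUMA3_IN);	/* ctx->src_vbs[2], field f-2 */
	add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
	add_in_dtd(ctx, VPE_PORT_LUMA2_IN);	/* ctx->src_vbs[1], field f-1 */
	add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);	/* ctx->src_vbs[0], field f   */
	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
	add_in_dtd(ctx, VPE_PORT_MV_IN);	/* current motion vector data */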

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field (the 3rd one) is changed to the state 'DONE', and the buffers corresponding
to the 1st and 2nd fields become the 2nd and 3rd fields for the next de-interlace
operation. This way, for each de-interlace operation, we have the 3 most recent
fields. After each transaction, we also swap the motion vector buffers: the new
input motion vector buffer contains the resultant motion information of all the
previous frames, and the new output motion vector buffer will be used to hold
the updated motion vector to capture the motion changes in the next field. The
motion vector buffers are allocated using the DMA allocation API.
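
The per-transaction rotation can be summarized as follows (a minimal
sketch of the vpe_irq() logic from the diff below; the names mirror the
driver code):

	/* the previous dst mv buffer becomes the next src mv buffer */
	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;

	/* the oldest field is fully consumed; mark it DONE for userspace */
	v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);

	/* f-1 becomes f-2, and f becomes f-1, for the next operation */
	ctx->src_vbs[2] = ctx->src_vbs[1];
	ctx->src_vbs[1] = ctx->src_vbs[0];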

The de-interlacer is taken out of bypass mode; it requires some extra default
configuration, which is now added. The chrominance upsampler coefficients are
added for interlaced frames. Some VPDMA parameters like the frame start event
and line mode are configured for the 2 extra sets of input ports.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 378 ++++++++++++++++++++++++++++++++----
 1 file changed, 344 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 85b0880..c2bc0f4 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -110,6 +112,38 @@ static const struct vpe_us_coeffs us_coeffs[] = {
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * The following registers configure some of the parameters of the motion and
+ * edge detection blocks inside DEI. These generally remain the same, but they
+ * could be exposed to userspace later if someone needs to tweak them.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static const struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -117,6 +151,7 @@ static const struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-planar formats */
 };
 
@@ -125,6 +160,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -132,12 +173,40 @@ struct vpe_port_data {
 static const struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -209,6 +278,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -218,6 +288,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -269,6 +340,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -276,13 +348,19 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	dma_addr_t		mv_buf_dma[2];		/* dma addrs of motion vector in/out bufs */
+	void			*mv_buf[2];		/* virtual addrs of motion vector bufs */
+	size_t			mv_buf_size;		/* current motion vector buffer size */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -358,8 +436,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -385,6 +462,80 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct device *dev = ctx->dev->v4l2_dev.dev;
+
+	if (ctx->mv_buf_size == size)
+		return 0;
+
+	if (ctx->mv_buf[0])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+	if (ctx->mv_buf[1])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[1],
+			ctx->mv_buf_dma[1]);
+
+	if (size == 0)
+		return 0;
+
+	ctx->mv_buf[0] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[0],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[0]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf[1] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[1],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[1]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		dma_free_coherent(dev, size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf_size = size;
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function marks those two buffers as DONE once we have
+ * finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -425,6 +576,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -432,6 +584,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -472,14 +627,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -523,13 +692,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -538,7 +708,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -577,10 +753,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		const struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -607,6 +808,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* need additional two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -734,17 +938,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -760,23 +972,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	const struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -794,7 +1014,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -817,8 +1038,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -830,24 +1058,49 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -858,6 +1111,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -881,9 +1135,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -906,10 +1166,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	/* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -919,16 +1182,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1007,6 +1289,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 	pix->colorspace = q_data->colorspace;
 	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
 
@@ -1031,7 +1314,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 		return -EINVAL;
 	}
 
-	pix->field = V4L2_FIELD_NONE;
+	if (pix->field != V4L2_FIELD_NONE && pix->field != V4L2_FIELD_ALTERNATE)
+		pix->field = V4L2_FIELD_NONE;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
 			      &pix->height, MIN_H, MAX_H, H_ALIGN,
@@ -1099,6 +1383,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1112,6 +1397,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1189,6 +1479,22 @@ static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
 	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	const struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 #define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
 
 static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
@@ -1422,6 +1728,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1430,6 +1737,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1484,6 +1792,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_free_desc_buf(&ctx->mmr_adb);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v3 4/6] v4l: ti-vpe: Add de-interlacer support in VPE
@ 2013-08-29 12:32       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap, Archit Taneja

Add support for the de-interlacer block in VPE.

For the de-interlacer to work, we need to enable 2 more sets of VPE input
ports which fetch data from the 'last' and 'last to last' fields of the
interlaced video. Apart from that, we need to enable the motion vector output
and input ports, and also allocate DMA buffers for them.

We need to make sure that the two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW (in device_run), it extracts the addresses of the 3 buffers, and
provides them to the data descriptors for the 3 sets of input ports ((LUMA1,
CHROMA1), (LUMA2, CHROMA2), and (LUMA3, CHROMA3)) for the 3 consecutive
fields respectively. The motion vector and output port descriptors are
configured and the list is submitted to VPDMA.

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field (the 3rd one) is changed to the state 'DONE', and the buffers
corresponding to the 1st and 2nd fields become the 2nd and 3rd fields for the
next de-interlace operation. This way, each de-interlace operation always has
the 3 most recent fields to work with. After each transaction, we also swap
the motion vector buffers: the new input motion vector buffer contains the
accumulated motion information of all the previous frames, and the new output
motion vector buffer will be used to hold the updated motion vector capturing
the motion changes in the next field. The motion vector buffers are allocated
using the DMA allocation API.
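
Roughly, the rotation done at the end of each transaction boils down to the
following (a minimal sketch; src_vbs[] and src_mv_buf_selector are the
driver's context fields, the helper name rotate_fields() is made up for
illustration):

	static void rotate_fields(struct vpe_ctx *ctx)
	{
		/* the oldest field (src_vbs[2]) was just returned as DONE;
		 * the remaining two fields slide down one slot, and the
		 * newest field (src_vbs[0]) is refilled in the next job */
		ctx->src_vbs[2] = ctx->src_vbs[1];
		ctx->src_vbs[1] = ctx->src_vbs[0];

		/* the previous dst mv buffer becomes the next src mv buffer */
		ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
	}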

The de-interlacer is taken out of bypass mode; it requires some extra default
configurations, which are now added. The chrominance upsampler coefficients
are added for interlaced frames. Some VPDMA parameters like frame start event
and line mode are configured for the 2 extra sets of input ports.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 378 ++++++++++++++++++++++++++++++++----
 1 file changed, 344 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 85b0880..c2bc0f4 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -110,6 +112,38 @@ static const struct vpe_us_coeffs us_coeffs[] = {
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * the following registers configure some of the parameters of the motion
+ * and edge detection blocks inside DEI. These generally remain the same,
+ * but could be exposed via userspace later if someone needs to tweak them.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static const struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -117,6 +151,7 @@ static const struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-planar formats */
 };
 
@@ -125,6 +160,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -132,12 +173,40 @@ struct vpe_port_data {
 static const struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -209,6 +278,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -218,6 +288,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -269,6 +340,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -276,13 +348,19 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	dma_addr_t		mv_buf_dma[2];		/* dma addrs of motion vector in/out bufs */
+	void			*mv_buf[2];		/* virtual addrs of motion vector bufs */
+	size_t			mv_buf_size;		/* current motion vector buffer size */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -358,8 +436,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -385,6 +462,80 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct device *dev = ctx->dev->v4l2_dev.dev;
+
+	if (ctx->mv_buf_size == size)
+		return 0;
+
+	if (ctx->mv_buf[0])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+	if (ctx->mv_buf[1])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[1],
+			ctx->mv_buf_dma[1]);
+
+	if (size == 0)
+		return 0;
+
+	ctx->mv_buf[0] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[0],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[0]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf[1] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[1],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[1]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		dma_free_coherent(dev, size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf_size = size;
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function frees those two buffers when we have
+ * finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -425,6 +576,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -432,6 +584,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -472,14 +627,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -523,13 +692,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -538,7 +708,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -577,10 +753,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		const struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -607,6 +808,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* need additional two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -734,17 +938,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -760,23 +972,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	const struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -794,7 +1014,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -817,8 +1038,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -830,24 +1058,49 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -858,6 +1111,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -881,9 +1135,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -906,10 +1166,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	 /* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -919,16 +1182,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1007,6 +1289,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 	pix->colorspace = q_data->colorspace;
 	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
 
@@ -1031,7 +1314,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 		return -EINVAL;
 	}
 
-	pix->field = V4L2_FIELD_NONE;
+	if (pix->field != V4L2_FIELD_NONE && pix->field != V4L2_FIELD_ALTERNATE)
+		pix->field = V4L2_FIELD_NONE;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
 			      &pix->height, MIN_H, MAX_H, H_ALIGN,
@@ -1099,6 +1383,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1112,6 +1397,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1189,6 +1479,22 @@ static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
 	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	const struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 #define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
 
 static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
@@ -1422,6 +1728,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1430,6 +1737,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1484,6 +1792,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_free_desc_buf(&ctx->mmr_adb);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v3 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  2013-08-29 12:32     ` Archit Taneja
@ 2013-08-29 12:32       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap,
	Archit Taneja, Rajendra Nayak, Sricharan R

Add hwmod data for the VPE IP; this is needed for the IP to be reset during
boot, and to control the functional clock when the driver needs it via the
pm_runtime APIs. Add the corresponding ocp_if struct and add it to DRA7XX's
ocp interface list.
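
With the hwmod in place, the driver side reduces to runtime PM calls (a
minimal sketch, assuming the standard pm_runtime API; the hwmod layer does
the actual clock and reset handling underneath):

	/* in probe */
	pm_runtime_enable(&pdev->dev);

	/* before touching VPE registers: enables the functional clock */
	ret = pm_runtime_get_sync(&pdev->dev);
	if (ret < 0)
		return ret;

	/* when done: allows the clock to be gated again */
	pm_runtime_put_sync(&pdev->dev);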

Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c | 42 +++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
index f647998b..181365d 100644
--- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
@@ -1883,6 +1883,39 @@ static struct omap_hwmod dra7xx_wd_timer2_hwmod = {
 	},
 };
 
+/*
+ * 'vpe' class
+ *
+ */
+
+static struct omap_hwmod_class_sysconfig dra7xx_vpe_sysc = {
+	.sysc_offs	= 0x0010,
+	.sysc_flags	= (SYSC_HAS_MIDLEMODE | SYSC_HAS_SIDLEMODE),
+	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
+			   SIDLE_SMART_WKUP | MSTANDBY_FORCE | MSTANDBY_NO |
+			   MSTANDBY_SMART | MSTANDBY_SMART_WKUP),
+	.sysc_fields	= &omap_hwmod_sysc_type2,
+};
+
+static struct omap_hwmod_class dra7xx_vpe_hwmod_class = {
+	.name	= "vpe",
+	.sysc	= &dra7xx_vpe_sysc,
+};
+
+/* vpe */
+static struct omap_hwmod dra7xx_vpe_hwmod = {
+	.name		= "vpe",
+	.class		= &dra7xx_vpe_hwmod_class,
+	.clkdm_name	= "vpe_clkdm",
+	.main_clk	= "dpll_core_h23x2_ck",
+	.prcm = {
+		.omap4 = {
+			.clkctrl_offs = DRA7XX_CM_VPE_VPE_CLKCTRL_OFFSET,
+			.context_offs = DRA7XX_RM_VPE_VPE_CONTEXT_OFFSET,
+			.modulemode   = MODULEMODE_HWCTRL,
+		},
+	},
+};
 
 /*
  * Interfaces
@@ -2636,6 +2669,14 @@ static struct omap_hwmod_ocp_if dra7xx_l4_wkup__wd_timer2 = {
 	.user		= OCP_USER_MPU | OCP_USER_SDMA,
 };
 
+/* l4_per3 -> vpe */
+static struct omap_hwmod_ocp_if dra7xx_l4_per3__vpe = {
+	.master		= &dra7xx_l4_per3_hwmod,
+	.slave		= &dra7xx_vpe_hwmod,
+	.clk		= "l3_iclk_div",
+	.user		= OCP_USER_MPU | OCP_USER_SDMA,
+};
+
 static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
 	&dra7xx_l3_main_2__l3_instr,
 	&dra7xx_l4_cfg__l3_main_1,
@@ -2714,6 +2755,7 @@ static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
 	&dra7xx_l3_main_1__vcp2,
 	&dra7xx_l4_per2__vcp2,
 	&dra7xx_l4_wkup__wd_timer2,
+	&dra7xx_l4_per3__vpe,
 	NULL,
 };
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v3 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE
  2013-08-29 12:32     ` Archit Taneja
@ 2013-08-29 12:32       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 12:32 UTC (permalink / raw)
  To: linux-media
  Cc: hverkuil, laurent.pinchart, tomi.valkeinen, linux-omap,
	Archit Taneja, Rajendra Nayak, Sricharan R

Add a DT node for VPE in dra7.dtsi. This is experimental because we might need
to split the VPE address space a bit more, and also because the IRQ line
described is accessible only after the IRQ crossbar driver is added for DRA7XX.

Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 arch/arm/boot/dts/dra7.dtsi | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index ce9a0f0..7c1cbfe 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -484,6 +484,17 @@
 			dmas = <&sdma 70>, <&sdma 71>;
 			dma-names = "tx0", "rx0";
 		};
+
+		vpe {
+			compatible = "ti,vpe";
+			ti,hwmods = "vpe";
+			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
+			reg-names = "vpe", "vpdma";
+			interrupts = <0 158 0x4>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
+
 	};
 
 	clocks {
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  2013-08-29 12:32       ` Archit Taneja
@ 2013-08-29 12:42         ` Rajendra Nayak
  -1 siblings, 0 replies; 138+ messages in thread
From: Rajendra Nayak @ 2013-08-29 12:42 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen,
	linux-omap, Sricharan R

Archit,

On Thursday 29 August 2013 06:02 PM, Archit Taneja wrote:
> Add hwmod data for the VPE IP; this is needed for the IP to be reset during
> boot, and to control the functional clock when the driver needs it via the
> pm_runtime APIs. Add the corresponding ocp_if struct and add it to DRA7XX's
> ocp interface list.

You need to swap patches 5/6 and 6/6 to keep the series git-bisectable.
That's needed because after the $subject patch, hwmod wouldn't find the
register iospace and would crash; the iospace is added only in patch 6/6.
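
For example (illustrative only), with an interactive rebase over the last
two patches:

	$ git rebase -i HEAD~2
	# in the editor, move the "pick" line of the DT patch (6/6)
	# above that of the hwmod patch (5/6), then save and quit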

regards,
Rajendra

> 
> Cc: Rajendra Nayak <rnayak@ti.com>
> Cc: Sricharan R <r.sricharan@ti.com>
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  arch/arm/mach-omap2/omap_hwmod_7xx_data.c | 42 +++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
> index f647998b..181365d 100644
> --- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
> +++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
> @@ -1883,6 +1883,39 @@ static struct omap_hwmod dra7xx_wd_timer2_hwmod = {
>  	},
>  };
>  
> +/*
> + * 'vpe' class
> + *
> + */
> +
> +static struct omap_hwmod_class_sysconfig dra7xx_vpe_sysc = {
> +	.sysc_offs	= 0x0010,
> +	.sysc_flags	= (SYSC_HAS_MIDLEMODE | SYSC_HAS_SIDLEMODE),
> +	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
> +			   SIDLE_SMART_WKUP | MSTANDBY_FORCE | MSTANDBY_NO |
> +			   MSTANDBY_SMART | MSTANDBY_SMART_WKUP),
> +	.sysc_fields	= &omap_hwmod_sysc_type2,
> +};
> +
> +static struct omap_hwmod_class dra7xx_vpe_hwmod_class = {
> +	.name	= "vpe",
> +	.sysc	= &dra7xx_vpe_sysc,
> +};
> +
> +/* vpe */
> +static struct omap_hwmod dra7xx_vpe_hwmod = {
> +	.name		= "vpe",
> +	.class		= &dra7xx_vpe_hwmod_class,
> +	.clkdm_name	= "vpe_clkdm",
> +	.main_clk	= "dpll_core_h23x2_ck",
> +	.prcm = {
> +		.omap4 = {
> +			.clkctrl_offs = DRA7XX_CM_VPE_VPE_CLKCTRL_OFFSET,
> +			.context_offs = DRA7XX_RM_VPE_VPE_CONTEXT_OFFSET,
> +			.modulemode   = MODULEMODE_HWCTRL,
> +		},
> +	},
> +};
>  
>  /*
>   * Interfaces
> @@ -2636,6 +2669,14 @@ static struct omap_hwmod_ocp_if dra7xx_l4_wkup__wd_timer2 = {
>  	.user		= OCP_USER_MPU | OCP_USER_SDMA,
>  };
>  
> +/* l4_per3 -> vpe */
> +static struct omap_hwmod_ocp_if dra7xx_l4_per3__vpe = {
> +	.master		= &dra7xx_l4_per3_hwmod,
> +	.slave		= &dra7xx_vpe_hwmod,
> +	.clk		= "l3_iclk_div",
> +	.user		= OCP_USER_MPU | OCP_USER_SDMA,
> +};
> +
>  static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
>  	&dra7xx_l3_main_2__l3_instr,
>  	&dra7xx_l4_cfg__l3_main_1,
> @@ -2714,6 +2755,7 @@ static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
>  	&dra7xx_l3_main_1__vcp2,
>  	&dra7xx_l4_per2__vcp2,
>  	&dra7xx_l4_wkup__wd_timer2,
> +	&dra7xx_l4_per3__vpe,
>  	NULL,
>  };
>  
> 


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-29 12:32       ` Archit Taneja
  (?)
@ 2013-08-29 13:28       ` Hans Verkuil
  2013-08-30  6:47           ` Archit Taneja
  2013-09-05  5:56           ` Archit Taneja
  -1 siblings, 2 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-08-29 13:28 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, tomi.valkeinen, linux-omap

On Thu 29 August 2013 14:32:49 Archit Taneja wrote:
> VPE is a block which consists of a single memory to memory path which can
> perform chrominance up/down sampling, de-interlacing, scaling, and color space
> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
> interleaved video formats.
> 
> We create a mem2mem driver based primarily on the mem2mem-testdev example.
> The de-interlacer, scaler and color space converter are all bypassed for now
> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
> conversion between different YUV formats is possible.
> 
> Each mem2mem context allocates a buffer for VPE MMR values which it will use
> when it gets access to the VPE HW via the mem2mem queue; it also allocates
> a VPDMA descriptor list to which configuration and data descriptors are added.
> 
> Based on the information received via v4l2 ioctls for the source and
> destination queues, the driver configures the values for the MMRs, and stores
> them in the buffer. There are also some VPDMA parameters like frame start and
> line mode which need to be configured; these are set up by direct register
> writes via the VPDMA helper functions.
> 
> The driver's device_run() mem2mem op will add each descriptor based on how the
> source and destination queues are set up for the given ctx; once the list is
> prepared, it's submitted to VPDMA. These descriptors, when parsed by VPDMA,
> will upload MMR registers and start DMA of video buffers on the various input
> and output clients/ports.
> 
> When the list is parsed completely (and the DMAs on all the output ports are
> done), an interrupt is generated, which we use to signal that the source and
> destination buffers are done.
> 
> The rest of the driver is quite similar to other mem2mem drivers; we use the
> multiplane v4l2 ioctls as the HW supports coplanar formats.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>

Thanks for the patch. Just a few small comments below...

> ---
>  drivers/media/platform/Kconfig           |   16 +
>  drivers/media/platform/Makefile          |    2 +
>  drivers/media/platform/ti-vpe/Makefile   |    5 +
>  drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
>  drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>  5 files changed, 2259 insertions(+)
>  create mode 100644 drivers/media/platform/ti-vpe/Makefile
>  create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>  create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
> 
> diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
> new file mode 100644
> index 0000000..85b0880
> --- /dev/null
> +++ b/drivers/media/platform/ti-vpe/vpe.c

<snip>

> +static int vpe_enum_fmt(struct file *file, void *priv,
> +				struct v4l2_fmtdesc *f)
> +{
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
> +	else

The line above isn't necessary.
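
I.e. just:

	if (V4L2_TYPE_IS_OUTPUT(f->type))
		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);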

> +		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
> +}
> +
> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vb2_queue *vq;
> +	struct vpe_q_data *q_data;
> +	int i;
> +
> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> +	if (!vq)
> +		return -EINVAL;
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	pix->width = q_data->width;
> +	pix->height = q_data->height;
> +	pix->pixelformat = q_data->fmt->fourcc;
> +	pix->colorspace = q_data->colorspace;
> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
> +	}
> +
> +	return 0;
> +}
> +
> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
> +		       struct vpe_fmt *fmt, int type)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct v4l2_plane_pix_format *plane_fmt;
> +	int i;
> +
> +	if (!fmt || !(fmt->types & type)) {
> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
> +			pix->pixelformat);
> +		return -EINVAL;
> +	}
> +
> +	pix->field = V4L2_FIELD_NONE;
> +
> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
> +			      S_ALIGN);
> +
> +	pix->num_planes = fmt->coplanar ? 2 : 1;
> +	pix->pixelformat = fmt->fourcc;
> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;

pix->priv should be set to NULL as well.

> +
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		int depth;
> +
> +		plane_fmt = &pix->plane_fmt[i];
> +		depth = fmt->vpdma_fmt[i]->depth;
> +
> +		if (i == VPE_LUMA)
> +			plane_fmt->bytesperline =
> +					round_up((pix->width * depth) >> 3,
> +						1 << L_ALIGN);
> +		else
> +			plane_fmt->bytesperline = pix->width;
> +
> +		plane_fmt->sizeimage =
> +				(pix->height * pix->width * depth) >> 3;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_fmt *fmt = find_format(f);
> +
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
> +	else
> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
> +}
> +
> +static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct v4l2_plane_pix_format *plane_fmt;
> +	struct vpe_q_data *q_data;
> +	struct vb2_queue *vq;
> +	int i;
> +
> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> +	if (!vq)
> +		return -EINVAL;
> +
> +	if (vb2_is_busy(vq)) {
> +		vpe_err(ctx->dev, "queue busy\n");
> +		return -EBUSY;
> +	}
> +
> +	q_data = get_q_data(ctx, f->type);
> +	if (!q_data)
> +		return -EINVAL;
> +
> +	q_data->fmt		= find_format(f);
> +	q_data->width		= pix->width;
> +	q_data->height		= pix->height;
> +	q_data->colorspace	= pix->colorspace;
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		plane_fmt = &pix->plane_fmt[i];
> +
> +		q_data->bytesperline[i]	= plane_fmt->bytesperline;
> +		q_data->sizeimage[i]	= plane_fmt->sizeimage;
> +	}
> +
> +	q_data->c_rect.left	= 0;
> +	q_data->c_rect.top	= 0;
> +	q_data->c_rect.width	= q_data->width;
> +	q_data->c_rect.height	= q_data->height;
> +
> +	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
> +		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
> +		q_data->bytesperline[VPE_LUMA]);
> +	if (q_data->fmt->coplanar)
> +		vpe_dbg(ctx->dev, " bpl_uv %d\n",
> +			q_data->bytesperline[VPE_CHROMA]);
> +
> +	return 0;
> +}
> +
> +static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	int ret;
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	ret = vpe_try_fmt(file, priv, f);
> +	if (ret)
> +		return ret;
> +
> +	ret = __vpe_s_fmt(ctx, f);
> +	if (ret)
> +		return ret;
> +
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		set_src_registers(ctx);
> +	else
> +		set_dst_registers(ctx);
> +
> +	return set_srcdst_params(ctx);
> +}
> +
> +static int vpe_reqbufs(struct file *file, void *priv,
> +		       struct v4l2_requestbuffers *reqbufs)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
> +}
> +
> +static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
> +}
> +
> +static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	vpe_dump_regs(ctx->dev);
> +	vpdma_dump_regs(ctx->dev->vpdma);
> +
> +	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
> +}
> +
> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)

Reserve a control range for this driver in include/uapi/linux/v4l2-controls.h.
Similar to the ones already defined there.

That will ensure that controls for this driver have unique IDs.
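
Something like this (the base offset below is just an example, pick a free
one that doesn't clash with the ranges already reserved there):

	/* in include/uapi/linux/v4l2-controls.h */
	#define V4L2_CID_USER_TI_VPE_BASE	(V4L2_CID_USER_BASE + 0x1050)

	/* in the driver */
	#define V4L2_CID_VPE_BUFS_PER_JOB	(V4L2_CID_USER_TI_VPE_BASE + 0)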

> +
> +static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
> +{
> +	struct vpe_ctx *ctx =
> +		container_of(ctrl->handler, struct vpe_ctx, hdl);
> +
> +	switch (ctrl->id) {
> +	case V4L2_CID_TRANS_NUM_BUFS:
> +		ctx->bufs_per_job = ctrl->val;
> +		break;
> +
> +	default:
> +		vpe_err(ctx->dev, "Invalid control\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
> +	.s_ctrl = vpe_s_ctrl,
> +};
> +
> +static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
> +	.vidioc_querycap	= vpe_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
> +	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
> +	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
> +	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
> +	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
> +	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
> +	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
> +
> +	.vidioc_reqbufs		= vpe_reqbufs,
> +	.vidioc_querybuf	= vpe_querybuf,
> +
> +	.vidioc_qbuf		= vpe_qbuf,
> +	.vidioc_dqbuf		= vpe_dqbuf,
> +
> +	.vidioc_streamon	= vpe_streamon,
> +	.vidioc_streamoff	= vpe_streamoff,
> +	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
> +	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int vpe_queue_setup(struct vb2_queue *vq,
> +			   const struct v4l2_format *fmt,
> +			   unsigned int *nbuffers, unsigned int *nplanes,
> +			   unsigned int sizes[], void *alloc_ctxs[])
> +{
> +	int i;
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct vpe_q_data *q_data;
> +
> +	q_data = get_q_data(ctx, vq->type);
> +
> +	*nplanes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < *nplanes; i++) {
> +		sizes[i] = q_data->sizeimage[i];
> +		alloc_ctxs[i] = ctx->dev->alloc_ctx;
> +	}
> +
> +	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
> +		sizes[VPE_LUMA]);
> +	if (q_data->fmt->coplanar)
> +		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
> +
> +	return 0;
> +}
> +
> +static int vpe_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct vpe_q_data *q_data;
> +	int i, num_planes;
> +
> +	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	q_data = get_q_data(ctx, vb->vb2_queue->type);
> +	num_planes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < num_planes; i++) {
> +		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
> +			vpe_err(ctx->dev,
> +				"data will not fit into plane (%lu < %lu)\n",
> +				vb2_plane_size(vb, i),
> +				(long) q_data->sizeimage[i]);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	for (i = 0; i < num_planes; i++)
> +		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
> +
> +	return 0;
> +}
> +
> +static void vpe_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
> +}
> +
> +static void vpe_wait_prepare(struct vb2_queue *q)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
> +	vpe_unlock(ctx);
> +}
> +
> +static void vpe_wait_finish(struct vb2_queue *q)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
> +	vpe_lock(ctx);
> +}
> +
> +static struct vb2_ops vpe_qops = {
> +	.queue_setup	 = vpe_queue_setup,
> +	.buf_prepare	 = vpe_buf_prepare,
> +	.buf_queue	 = vpe_buf_queue,
> +	.wait_prepare	 = vpe_wait_prepare,
> +	.wait_finish	 = vpe_wait_finish,
> +};
> +
> +static int queue_init(void *priv, struct vb2_queue *src_vq,
> +		      struct vb2_queue *dst_vq)
> +{
> +	struct vpe_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> +	src_vq->io_modes = VB2_MMAP;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &vpe_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;

Shouldn't this be TIMESTAMP_COPY?
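
For an m2m device the timestamp is normally copied from the source buffer to
the destination buffer, i.e. (untested):

	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;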

> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
> +	dst_vq->io_modes = VB2_MMAP;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &vpe_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;

Ditto.

> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +static const struct v4l2_ctrl_config vpe_bufs_per_job = {
> +	.ops = &vpe_ctrl_ops,
> +	.id = V4L2_CID_TRANS_NUM_BUFS,
> +	.name = "Buffers Per Transaction",
> +	.type = V4L2_CTRL_TYPE_INTEGER,
> +	.def = VPE_DEF_BUFS_PER_JOB,
> +	.min = 1,
> +	.max = VIDEO_MAX_FRAME,
> +	.step = 1,
> +};
> +
> +/*
> + * File operations
> + */
> +static int vpe_open(struct file *file)
> +{
> +	struct vpe_dev *dev = video_drvdata(file);
> +	struct vpe_ctx *ctx = NULL;
> +	struct vpe_q_data *s_q_data;
> +	struct v4l2_ctrl_handler *hdl;
> +	int ret;
> +
> +	vpe_dbg(dev, "vpe_open\n");
> +
> +	if (!dev->vpdma->ready) {
> +		vpe_err(dev, "vpdma firmware not loaded\n");
> +		return -ENODEV;
> +	}
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	ctx->dev = dev;
> +
> +	if (mutex_lock_interruptible(&dev->dev_mutex)) {
> +		ret = -ERESTARTSYS;
> +		goto free_ctx;
> +	}
> +
> +	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
> +			VPDMA_LIST_TYPE_NORMAL);
> +	if (ret != 0)
> +		goto unlock;
> +
> +	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
> +	if (ret != 0)
> +		goto free_desc_list;
> +
> +	init_adb_hdrs(ctx);
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +
> +	hdl = &ctx->hdl;
> +	v4l2_ctrl_handler_init(hdl, 1);
> +	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
> +	if (hdl->error) {
> +		ret = hdl->error;
> +		goto exit_fh;
> +	}
> +	ctx->fh.ctrl_handler = hdl;
> +	v4l2_ctrl_handler_setup(hdl);
> +
> +	s_q_data = &ctx->q_data[Q_DATA_SRC];
> +	s_q_data->fmt = &vpe_formats[2];
> +	s_q_data->width = 1920;
> +	s_q_data->height = 1080;
> +	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
> +			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
> +	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
> +	s_q_data->c_rect.left = 0;
> +	s_q_data->c_rect.top = 0;
> +	s_q_data->c_rect.width = s_q_data->width;
> +	s_q_data->c_rect.height = s_q_data->height;
> +	s_q_data->flags = 0;
> +
> +	ctx->q_data[Q_DATA_DST] = *s_q_data;
> +
> +	set_src_registers(ctx);
> +	set_dst_registers(ctx);
> +	ret = set_srcdst_params(ctx);
> +	if (ret)
> +		goto exit_fh;
> +
> +	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
> +
> +	if (IS_ERR(ctx->m2m_ctx)) {
> +		ret = PTR_ERR(ctx->m2m_ctx);
> +		goto exit_fh;
> +	}
> +
> +	v4l2_fh_add(&ctx->fh);
> +
> +	/*
> +	 * for now, just report the creation of the first instance, we can later
> +	 * optimize the driver to enable or disable clocks when the first
> +	 * instance is created or the last instance released
> +	 */
> +	if (atomic_inc_return(&dev->num_instances) == 1)
> +		vpe_dbg(dev, "first instance created\n");
> +
> +	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
> +
> +	ctx->load_mmrs = true;
> +
> +	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
> +		ctx, ctx->m2m_ctx);
> +
> +	mutex_unlock(&dev->dev_mutex);
> +
> +	return 0;
> +exit_fh:
> +	v4l2_ctrl_handler_free(hdl);
> +	v4l2_fh_exit(&ctx->fh);
> +	vpdma_free_desc_buf(&ctx->mmr_adb);
> +free_desc_list:
> +	vpdma_free_desc_list(&ctx->desc_list);
> +unlock:
> +	mutex_unlock(&dev->dev_mutex);
> +free_ctx:
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int vpe_release(struct file *file)
> +{
> +	struct vpe_dev *dev = video_drvdata(file);
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	vpe_dbg(dev, "releasing instance %p\n", ctx);
> +
> +	mutex_lock(&dev->dev_mutex);
> +	vpdma_free_desc_list(&ctx->desc_list);
> +	vpdma_free_desc_buf(&ctx->mmr_adb);
> +
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	v4l2_ctrl_handler_free(&ctx->hdl);
> +	v4l2_m2m_ctx_release(ctx->m2m_ctx);
> +
> +	kfree(ctx);
> +
> +	/*
> +	 * for now, just report the release of the last instance, we can later
> +	 * optimize the driver to enable or disable clocks when the first
> +	 * instance is created or the last instance released
> +	 */
> +	if (atomic_dec_return(&dev->num_instances) == 0)
> +		vpe_dbg(dev, "last instance released\n");
> +
> +	mutex_unlock(&dev->dev_mutex);
> +
> +	return 0;
> +}
> +
> +static unsigned int vpe_poll(struct file *file,
> +			     struct poll_table_struct *wait)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_dev *dev = ctx->dev;
> +	int ret;
> +
> +	mutex_lock(&dev->dev_mutex);
> +	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
> +	mutex_unlock(&dev->dev_mutex);
> +	return ret;
> +}
> +
> +static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_dev *dev = ctx->dev;
> +	int ret;
> +
> +	if (mutex_lock_interruptible(&dev->dev_mutex))
> +		return -ERESTARTSYS;
> +	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
> +	mutex_unlock(&dev->dev_mutex);
> +	return ret;
> +}
> +
> +static const struct v4l2_file_operations vpe_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= vpe_open,
> +	.release	= vpe_release,
> +	.poll		= vpe_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= vpe_mmap,
> +};
> +
> +static struct video_device vpe_videodev = {
> +	.name		= VPE_MODULE_NAME,
> +	.fops		= &vpe_fops,
> +	.ioctl_ops	= &vpe_ioctl_ops,
> +	.minor		= -1,
> +	.release	= video_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= device_run,
> +	.job_ready	= job_ready,
> +	.job_abort	= job_abort,
> +	.lock		= vpe_lock,
> +	.unlock		= vpe_unlock,
> +};
> +
> +static int vpe_runtime_get(struct platform_device *pdev)
> +{
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
> +
> +	r = pm_runtime_get_sync(&pdev->dev);
> +	WARN_ON(r < 0);
> +	return r < 0 ? r : 0;
> +}
> +
> +static void vpe_runtime_put(struct platform_device *pdev)
> +{
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
> +
> +	r = pm_runtime_put_sync(&pdev->dev);
> +	WARN_ON(r < 0 && r != -ENOSYS);
> +}
> +
> +static int vpe_probe(struct platform_device *pdev)
> +{
> +	struct vpe_dev *dev;
> +	struct video_device *vfd;
> +	struct resource *res;
> +	int ret, irq, func;
> +
> +	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
> +	if (!dev)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&dev->lock);
> +
> +	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> +	if (ret)
> +		return ret;
> +
> +	atomic_set(&dev->num_instances, 0);
> +	mutex_init(&dev->dev_mutex);
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe");
> +	dev->base = devm_ioremap_resource(&pdev->dev, res);
> +	if (IS_ERR(dev->base)) {
> +		ret = PTR_ERR(dev->base);
> +		goto v4l2_dev_unreg;
> +	}
> +
> +	irq = platform_get_irq(pdev, 0);
> +	if (irq < 0) {
> +		ret = irq;
> +		goto v4l2_dev_unreg;
> +	}
> +
> +	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
> +			dev);
> +	if (ret)
> +		goto v4l2_dev_unreg;
> +
> +	platform_set_drvdata(pdev, dev);
> +
> +	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
> +	if (IS_ERR(dev->alloc_ctx)) {
> +		vpe_err(dev, "Failed to alloc vb2 context\n");
> +		ret = PTR_ERR(dev->alloc_ctx);
> +		goto v4l2_dev_unreg;
> +	}
> +
> +	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(dev->m2m_dev)) {
> +		vpe_err(dev, "Failed to init mem2mem device\n");
> +		ret = PTR_ERR(dev->m2m_dev);
> +		goto rel_ctx;
> +	}
> +
> +	pm_runtime_enable(&pdev->dev);
> +
> +	ret = vpe_runtime_get(pdev);
> +	if (ret)
> +		goto rel_m2m;
> +
> +	/* Perform clk enable followed by reset */
> +	vpe_set_clock_enable(dev, 1);
> +
> +	vpe_top_reset(dev);
> +
> +	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
> +		VPE_PID_FUNC_SHIFT);
> +	vpe_dbg(dev, "VPE PID function %x\n", func);
> +
> +	vpe_top_vpdma_reset(dev);
> +
> +	dev->vpdma = vpdma_create(pdev);
> +	if (IS_ERR(dev->vpdma)) {
> +		ret = PTR_ERR(dev->vpdma);
> +		goto runtime_put;
> +	}
> +
> +	vfd = &dev->vfd;
> +	*vfd = vpe_videodev;
> +	vfd->lock = &dev->dev_mutex;
> +	vfd->v4l2_dev = &dev->v4l2_dev;
> +
> +	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
> +	if (ret) {
> +		vpe_err(dev, "Failed to register video device\n");
> +		goto runtime_put;
> +	}
> +
> +	video_set_drvdata(vfd, dev);
> +	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
> +	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
> +		vfd->num);
> +
> +	return 0;
> +
> +runtime_put:
> +	vpe_runtime_put(pdev);
> +rel_m2m:
> +	pm_runtime_disable(&pdev->dev);
> +	v4l2_m2m_release(dev->m2m_dev);
> +rel_ctx:
> +	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
> +v4l2_dev_unreg:
> +	v4l2_device_unregister(&dev->v4l2_dev);
> +
> +	return ret;
> +}
> +
> +static int vpe_remove(struct platform_device *pdev)
> +{
> +	struct vpe_dev *dev = platform_get_drvdata(pdev);
> +
> +	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
> +
> +	v4l2_m2m_release(dev->m2m_dev);
> +	video_unregister_device(&dev->vfd);
> +	v4l2_device_unregister(&dev->v4l2_dev);
> +	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
> +
> +	vpe_set_clock_enable(dev, 0);
> +	vpe_runtime_put(pdev);
> +	pm_runtime_disable(&pdev->dev);
> +
> +	return 0;
> +}
> +
> +#if defined(CONFIG_OF)
> +static const struct of_device_id vpe_of_match[] = {
> +	{
> +		.compatible = "ti,vpe",
> +	},
> +	{},
> +};
> +#else
> +#define vpe_of_match NULL
> +#endif
> +
> +static struct platform_driver vpe_pdrv = {
> +	.probe		= vpe_probe,
> +	.remove		= vpe_remove,
> +	.driver		= {
> +		.name	= VPE_MODULE_NAME,
> +		.owner	= THIS_MODULE,
> +		.of_match_table = vpe_of_match,
> +	},
> +};
> +
> +static void __exit vpe_exit(void)
> +{
> +	platform_driver_unregister(&vpe_pdrv);
> +}
> +
> +static int __init vpe_init(void)
> +{
> +	return platform_driver_register(&vpe_pdrv);
> +}
> +
> +module_init(vpe_init);
> +module_exit(vpe_exit);
> +
> +MODULE_DESCRIPTION("TI VPE driver");
> +MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
> +MODULE_LICENSE("GPL");

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info
  2013-08-29 12:42         ` Rajendra Nayak
@ 2013-08-29 13:42           ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-29 13:42 UTC (permalink / raw)
  To: Rajendra Nayak
  Cc: linux-media, hverkuil, laurent.pinchart, tomi.valkeinen,
	linux-omap, Sricharan R

On Thursday 29 August 2013 06:12 PM, Rajendra Nayak wrote:
> Archit,
>
> On Thursday 29 August 2013 06:02 PM, Archit Taneja wrote:
>> Add hwmod data for the VPE IP. This is needed for the IP to be reset during
>> boot, and to control the functional clock when the driver needs it via
>> pm_runtime APIs. Add the corresponding ocp_if struct and add it to DRA7XX's
>> ocp interface list.
>
> You need to swap patches 5/6 and 6/6 to maintain git-bisect.
> That's needed because after the $subject patch, hwmod wouldn't find
> the register iospace and would crash; that's only added in patch 6/6.

That's a good point, I'll take care of this.

Thanks,
Archit

>
> regards,
> Rajendra
>
>>
>> Cc: Rajendra Nayak <rnayak@ti.com>
>> Cc: Sricharan R <r.sricharan@ti.com>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   arch/arm/mach-omap2/omap_hwmod_7xx_data.c | 42 +++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
>> index f647998b..181365d 100644
>> --- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
>> +++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
>> @@ -1883,6 +1883,39 @@ static struct omap_hwmod dra7xx_wd_timer2_hwmod = {
>>   	},
>>   };
>>
>> +/*
>> + * 'vpe' class
>> + *
>> + */
>> +
>> +static struct omap_hwmod_class_sysconfig dra7xx_vpe_sysc = {
>> +	.sysc_offs	= 0x0010,
>> +	.sysc_flags	= (SYSC_HAS_MIDLEMODE | SYSC_HAS_SIDLEMODE),
>> +	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
>> +			   SIDLE_SMART_WKUP | MSTANDBY_FORCE | MSTANDBY_NO |
>> +			   MSTANDBY_SMART | MSTANDBY_SMART_WKUP),
>> +	.sysc_fields	= &omap_hwmod_sysc_type2,
>> +};
>> +
>> +static struct omap_hwmod_class dra7xx_vpe_hwmod_class = {
>> +	.name	= "vpe",
>> +	.sysc	= &dra7xx_vpe_sysc,
>> +};
>> +
>> +/* vpe */
>> +static struct omap_hwmod dra7xx_vpe_hwmod = {
>> +	.name		= "vpe",
>> +	.class		= &dra7xx_vpe_hwmod_class,
>> +	.clkdm_name	= "vpe_clkdm",
>> +	.main_clk	= "dpll_core_h23x2_ck",
>> +	.prcm = {
>> +		.omap4 = {
>> +			.clkctrl_offs = DRA7XX_CM_VPE_VPE_CLKCTRL_OFFSET,
>> +			.context_offs = DRA7XX_RM_VPE_VPE_CONTEXT_OFFSET,
>> +			.modulemode   = MODULEMODE_HWCTRL,
>> +		},
>> +	},
>> +};
>>
>>   /*
>>    * Interfaces
>> @@ -2636,6 +2669,14 @@ static struct omap_hwmod_ocp_if dra7xx_l4_wkup__wd_timer2 = {
>>   	.user		= OCP_USER_MPU | OCP_USER_SDMA,
>>   };
>>
>> +/* l4_per3 -> vpe */
>> +static struct omap_hwmod_ocp_if dra7xx_l4_per3__vpe = {
>> +	.master		= &dra7xx_l4_per3_hwmod,
>> +	.slave		= &dra7xx_vpe_hwmod,
>> +	.clk		= "l3_iclk_div",
>> +	.user		= OCP_USER_MPU | OCP_USER_SDMA,
>> +};
>> +
>>   static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
>>   	&dra7xx_l3_main_2__l3_instr,
>>   	&dra7xx_l4_cfg__l3_main_1,
>> @@ -2714,6 +2755,7 @@ static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
>>   	&dra7xx_l3_main_1__vcp2,
>>   	&dra7xx_l4_per2__vcp2,
>>   	&dra7xx_l4_wkup__wd_timer2,
>> +	&dra7xx_l4_per3__vpe,
>>   	NULL,
>>   };
>>
>>
>
>
>


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-29 13:28       ` Hans Verkuil
@ 2013-08-30  6:47           ` Archit Taneja
  2013-09-05  5:56           ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-30  6:47 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, tomi.valkeinen, linux-omap

On Thursday 29 August 2013 06:58 PM, Hans Verkuil wrote:
> On Thu 29 August 2013 14:32:49 Archit Taneja wrote:
>> VPE is a block which consists of a single memory to memory path which can
>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>> interleaved video formats.
>>
>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>> The de-interlacer, scaler and color space converter are all bypassed for now
>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>> conversion between different YUV formats is possible.
>>
>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>
>> Based on the information received via v4l2 ioctls for the source and
>> destination queues, the driver configures the values for the MMRs, and stores
>> them in the buffer. There are also some VPDMA parameters like frame start and
>> line mode which need to be configured; these are configured by direct register
>> writes via the VPDMA helper functions.
>>
>> The driver's device_run() mem2mem op will add each descriptor based on how the
>> source and destination queues are set up for the given ctx, once the list is
>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
>> upload MMR registers, start DMA of video buffers on the various input and output
>> clients/ports.
>>
>> When the list is parsed completely (and the DMAs on all the output ports done),
>> an interrupt is generated which we use to notify that the source and destination
>> buffers are done.
>>
>> The rest of the driver is quite similar to other mem2mem drivers, we use the
>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>
> Thanks for the patch. Just a few small comments below...
>
>> ---
>>   drivers/media/platform/Kconfig           |   16 +
>>   drivers/media/platform/Makefile          |    2 +
>>   drivers/media/platform/ti-vpe/Makefile   |    5 +
>>   drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
>>   drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>   5 files changed, 2259 insertions(+)
>>   create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>
>> diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
>> new file mode 100644
>> index 0000000..85b0880
>> --- /dev/null
>> +++ b/drivers/media/platform/ti-vpe/vpe.c
>
> <snip>
>
>> +static int vpe_enum_fmt(struct file *file, void *priv,
>> +				struct v4l2_fmtdesc *f)
>> +{
>> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
>> +		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
>> +	else
>
> The line above isn't necessary.

Oh right, thanks for spotting that.

>
>> +		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
>> +}
>> +
<snip>

>> +
>> +	pix->field = V4L2_FIELD_NONE;
>> +
>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>> +			      S_ALIGN);
>> +
>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>> +	pix->pixelformat = fmt->fourcc;
>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>
> pix->priv should be set to NULL as well.

I'll fix this.

<snip>

>> +}
>> +
>> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
>
> Reserve a control range for this driver in include/uapi/linux/v4l2-controls.h.
> Similar to the ones already defined there.
>
> That will ensure that controls for this driver have unique IDs.

Thanks, I took this from the mem2mem-testdev driver; a test driver
doesn't need to worry about this, I suppose.

I had a query regarding this. I am planning to add a capture driver in 
the future for a similar IP which can share some of the control IDs with 
VPE. Is it possible for 2 different drivers to share the IDs?

Also, I noticed in the header that most drivers reserve space for 16 
IDs. The current driver just has one, but there will be more custom ones 
in the future. Is it fine if I reserve 16 for this driver too?
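
Such a reservation would presumably look something like this, mirroring
the existing per-driver ranges in v4l2-controls.h (the base offset below
is illustrative, not necessarily what gets merged):

	/* in include/uapi/linux/v4l2-controls.h: reserve a range for VPE */
	#define V4L2_CID_USER_TI_VPE_BASE	(V4L2_CID_USER_BASE + 0x1050)

	/* in the driver, replacing the ad-hoc define: */
	#define V4L2_CID_VPE_BUFS_PER_JOB	(V4L2_CID_USER_TI_VPE_BASE + 0)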

<snip>

>> +
>> +static int queue_init(void *priv, struct vb2_queue *src_vq,
>> +		      struct vb2_queue *dst_vq)
>> +{
>> +	struct vpe_ctx *ctx = priv;
>> +	int ret;
>> +
>> +	memset(src_vq, 0, sizeof(*src_vq));
>> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
>> +	src_vq->io_modes = VB2_MMAP;
>> +	src_vq->drv_priv = ctx;
>> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
>> +	src_vq->ops = &vpe_qops;
>> +	src_vq->mem_ops = &vb2_dma_contig_memops;
>> +	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
>
> Shouldn't this be TIMESTAMP_COPY?

Right, it should be, I'll fix it.

>
>> +
>> +	ret = vb2_queue_init(src_vq);
>> +	if (ret)
>> +		return ret;
>> +
>> +	memset(dst_vq, 0, sizeof(*dst_vq));
>> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
>> +	dst_vq->io_modes = VB2_MMAP;
>> +	dst_vq->drv_priv = ctx;
>> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
>> +	dst_vq->ops = &vpe_qops;
>> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
>> +	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
>
> Ditto.
>

Thanks for the review.

Archit



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-30  6:47           ` Archit Taneja
  (?)
@ 2013-08-30  7:07           ` Hans Verkuil
  2013-08-30 10:05               ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Hans Verkuil @ 2013-08-30  7:07 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, tomi.valkeinen, linux-omap

On 08/30/2013 08:47 AM, Archit Taneja wrote:
> On Thursday 29 August 2013 06:58 PM, Hans Verkuil wrote:
>> On Thu 29 August 2013 14:32:49 Archit Taneja wrote:
>>> VPE is a block which consists of a single memory to memory path which can
>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>> interleaved video formats.
>>>
>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>> conversion between different YUV formats is possible.
>>>
>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>
>>> Based on the information received via v4l2 ioctls for the source and
>>> destination queues, the driver configures the values for the MMRs, and stores
>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>> line mode which need to be configured; these are configured by direct register
>>> writes via the VPDMA helper functions.
>>>
>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>> source and destination queues are set up for the given ctx, once the list is
>>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
>>> upload MMR registers, start DMA of video buffers on the various input and output
>>> clients/ports.
>>>
>>> When the list is parsed completely (and the DMAs on all the output ports done),
>>> an interrupt is generated which we use to notify that the source and destination
>>> buffers are done.
>>>
>>> The rest of the driver is quite similar to other mem2mem drivers, we use the
>>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>
>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>
>> Thanks for the patch. Just a few small comments below...
>>
>>> ---
>>>   drivers/media/platform/Kconfig           |   16 +
>>>   drivers/media/platform/Makefile          |    2 +
>>>   drivers/media/platform/ti-vpe/Makefile   |    5 +
>>>   drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
>>>   drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>>   5 files changed, 2259 insertions(+)
>>>   create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>>   create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>>   create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>>
>>> diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
>>> new file mode 100644
>>> index 0000000..85b0880
>>> --- /dev/null
>>> +++ b/drivers/media/platform/ti-vpe/vpe.c
>>
>> <snip>
>>
>>> +static int vpe_enum_fmt(struct file *file, void *priv,
>>> +				struct v4l2_fmtdesc *f)
>>> +{
>>> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
>>> +		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
>>> +	else
>>
>> The line above isn't necessary.
> 
> Oh right, thanks for spotting that.
> 
>>
>>> +		return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
>>> +}
>>> +
> <snip>
> 
>>> +
>>> +	pix->field = V4L2_FIELD_NONE;
>>> +
>>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>>> +			      S_ALIGN);
>>> +
>>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>>> +	pix->pixelformat = fmt->fourcc;
>>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>
>> pix->priv should be set to NULL as well.
> 
> I'll fix this.
> 
> <snip>
> 
>>> +}
>>> +
>>> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
>>
>> Reserve a control range for this driver in include/uapi/linux/v4l2-controls.h.
>> Similar to the ones already defined there.
>>
>> That will ensure that controls for this driver have unique IDs.
> 
> Thanks, I took this from the mem2mem-testdev driver; a test driver
> doesn't need to worry about this, I suppose.
> 
> I had a query regarding this. I am planning to add a capture driver in 
> the future for a similar IP which can share some of the control IDs with 
> VPE. Is it possible for 2 different drivers to share the IDs?

Certainly. There are three levels of controls:

1) Standard controls: can be used by any driver and are documented in the spec.
2) IP-specific controls: controls specific for a commonly used IP.
   These can be used by any driver containing that IP and are documented as well
   in the spec. Good examples are the MFC and CX2341x MPEG controls.
3) Driver-specific controls: these are specific to a driver and do not have to be
   documented in the spec, only in the header/source specifying them. A range
   of controls needs to be assigned to such a driver in v4l2-controls.h.

In your case it looks like the controls would fall into category 2.
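
As a concrete example of category 2, the CX2341x controls in
v4l2-controls.h are shared by every driver handling that IP. The entries
look roughly like this (exact values are best checked against the
header):

	#define V4L2_CID_MPEG_CX2341X_BASE \
			(V4L2_CTRL_CLASS_MPEG | 0x1000)
	#define V4L2_CID_MPEG_CX2341X_VIDEO_SPATIAL_FILTER_MODE \
			(V4L2_CID_MPEG_CX2341X_BASE + 0)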

> Also, I noticed in the header that most drivers reserve space for 16 
> IDs. The current driver just has one, but there will be more custom ones 
> in the future. Is it fine if I reserve 16 for this driver too?

Sure, that's no problem. Make sure you reserve enough space for future
expansion, i.e. IDs are cheap, so no need to be conservative when defining
the range.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-30  7:07           ` Hans Verkuil
@ 2013-08-30 10:05               ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-08-30 10:05 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, tomi.valkeinen, linux-omap

Hi,

On Friday 30 August 2013 12:37 PM, Hans Verkuil wrote:
> On 08/30/2013 08:47 AM, Archit Taneja wrote:
>> On Thursday 29 August 2013 06:58 PM, Hans Verkuil wrote:
>>> On Thu 29 August 2013 14:32:49 Archit Taneja wrote:
>>>> VPE is a block which consists of a single memory to memory path which can
>>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>>> interleaved video formats.
>>>>
>>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>>> conversion between different YUV formats is possible.
>>>>
>>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>>
>>>> Based on the information received via v4l2 ioctls for the source and
>>>> destination queues, the driver configures the values for the MMRs, and stores
>>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>>> line mode which need to be configured; these are configured by direct register
>>>> writes via the VPDMA helper functions.
>>>>
>>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>>> source and destination queues are set up for the given ctx, once the list is
>>>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
>>>> upload MMR registers, start DMA of video buffers on the various input and output
>>>> clients/ports.
>>>>
<snip>

>>
>>>> +}
>>>> +
>>>> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
>>>
>>> Reserve a control range for this driver in include/uapi/linux/v4l2-controls.h.
>>> Similar to the ones already defined there.
>>>
>>> That will ensure that controls for this driver have unique IDs.
>>
>> Thanks, I took this from the mem2mem-testdev driver; a test driver
>> doesn't need to worry about this, I suppose.
>>
>> I had a query regarding this. I am planning to add a capture driver in
>> the future for a similar IP which can share some of the control IDs with
>> VPE. Is it possible for 2 different drivers to share the IDs?
>
> Certainly. There are three levels of controls:
>
> 1) Standard controls: can be used by any driver and are documented in the spec.
> 2) IP-specific controls: controls specific for a commonly used IP.
>     These can be used by any driver containing that IP and are documented as well
>     in the spec. Good examples are the MFC and CX2341x MPEG controls.
> 3) Driver-specific controls: these are specific to a driver and do not have to be
>     documented in the spec, only in the header/source specifying them. A range
>     of controls needs to be assigned to such a driver in v4l2-controls.h.
>
> In your case it looks like the controls would fall into category 2.

For 2), by commonly used IP, do you mean a commonly used class of IPs 
like MPEG decoder, FM and camera? Or do you mean a specific vendor IP 
like, say, a camera subsystem found on different SoCs?

I think the controls in my case are very specific to the VPE and VIP 
IPs. These 2 IPs have some components like scaler, color space 
converter, chrominance up/downsampler in common. The controls will be 
specific to how these components behave. For example, a control can
specify the luminance peaking frequency the scaler should apply. I don't
think all scalers would provide luma peaking. This
holds for other controls too.

So if I understood your explanation correctly, I think 3) might make 
more sense.

>
>> Also, I noticed in the header that most drivers reserve space for 16
>> IDs. The current driver just has one, but there will be more custom ones
>> in the future. Is it fine if I reserve 16 for this driver too?
>
> Sure, that's no problem. Make sure you reserve enough space for future
> expansion, i.e. IDs are cheap, so no need to be conservative when defining
> the range.

Thanks for the clarification.

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-30 10:05               ` Archit Taneja
  (?)
@ 2013-08-30 10:44               ` Hans Verkuil
  -1 siblings, 0 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-08-30 10:44 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, tomi.valkeinen, linux-omap

On Fri 30 August 2013 12:05:11 Archit Taneja wrote:
> Hi,
> 
> On Friday 30 August 2013 12:37 PM, Hans Verkuil wrote:
> > On 08/30/2013 08:47 AM, Archit Taneja wrote:
> >> On Thursday 29 August 2013 06:58 PM, Hans Verkuil wrote:
> >>> On Thu 29 August 2013 14:32:49 Archit Taneja wrote:
> >>>> VPE is a block which consists of a single memory to memory path which can
> >>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
> >>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
> >>>> interleaved video formats.
> >>>>
> >>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
> >>>> The de-interlacer, scaler and color space converter are all bypassed for now
> >>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
> >>>> conversion between different YUV formats is possible.
> >>>>
> >>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
> >>>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
> >>>> a VPDMA descriptor list to which configuration and data descriptors are added.
> >>>>
> >>>> Based on the information received via v4l2 ioctls for the source and
> >>>> destination queues, the driver configures the values for the MMRs, and stores
> >>>> them in the buffer. There are also some VPDMA parameters like frame start and
> >>>> line mode which need to be configured; these are configured by direct register
> >>>> writes via the VPDMA helper functions.
> >>>>
> >>>> The driver's device_run() mem2mem op will add each descriptor based on how the
> >>>> source and destination queues are set up for the given ctx, once the list is
> >>>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
> >>>> upload MMR registers, start DMA of video buffers on the various input and output
> >>>> clients/ports.
> >>>>
> <snip>
> 
> >>
> >>>> +}
> >>>> +
> >>>> +#define V4L2_CID_TRANS_NUM_BUFS		(V4L2_CID_USER_BASE + 0x1000)
> >>>
> >>> Reserve a control range for this driver in include/uapi/linux/v4l2-controls.h.
> >>> Similar to the ones already defined there.
> >>>
> >>> That will ensure that controls for this driver have unique IDs.
> >>
> >> Thanks, I took this from the mem2mem-testdev driver, a test driver
> >> doesn't need to worry about this I suppose.
> >>
> >> I had a query regarding this. I am planning to add a capture driver in
> >> the future for a similar IP which can share some of the control IDs with
> >> VPE. Is it possible for 2 different drivers to share the IDs?
> >
> > Certainly. There are three levels of controls:
> >
> > 1) Standard controls: can be used by any driver and are documented in the spec.
> > 2) IP-specific controls: controls specific for a commonly used IP.
> >     These can be used by any driver containing that IP and are documented as well
> >     in the spec. Good examples are the MFC and CX2341x MPEG controls.
> > 3) Driver-specific controls: these are specific to a driver and do not have to be
> >     documented in the spec, only in the header/source specifying them. A range
> >     of controls needs to be assigned to such a driver in v4l2-controls.h.
> >
> > In your case it looks like the controls would fall into category 2.
> 
> For 2), by commonly used IP, do you mean a commonly used class of IPs 
> like MPEG decoder, FM and camera? Or do you mean a specific vendor IP 
> like, say, a camera subsystem found on different SoCs?

I mean a specific vendor IP found on different SoCs. So different drivers
would have to support the same IP.

> I think the controls in my case are very specific to the VPE and VIP 
> IPs. These 2 IPs have some components like scaler, color space 
> converter, chrominance up/downsampler in common. The controls will be 
> specific to how these components behave. For example, a control can
> specify the luminance peaking frequency the scaler should apply. I don't
> think all scalers would provide luma peaking. This
> holds for other controls too.
> 
> So if I understood your explanation correctly, I think 3) might make 
> more sense.

That might be a good starting point. It is not uncommon that controls
migrate from being custom controls to more standardized controls when
other devices appear using the same IP. Or sometimes what seemed like a
HW specific feature turns out to be available on other hardware from
other vendors as well.

> 
> >
> >> Also, I noticed in the header that most drivers reserve space for 16
> >> IDs. The current driver just has one, but there will be more custom ones
> >> in the future. Is it fine if I reserve 16 for this driver too?
> >
> > Sure, that's no problem. Make sure you reserve enough space for future
> > expansion, i.e. IDs are cheap, so no need to be conservative when defining
> > the range.
> 
> Thanks for the clarification.
> 
> Archit
> 
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver
  2013-08-29 13:28       ` Hans Verkuil
@ 2013-09-05  5:56           ` Archit Taneja
  2013-09-05  5:56           ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-05  5:56 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, tomi.valkeinen, linux-omap

Hi Hans,

On Thursday 29 August 2013 06:58 PM, Hans Verkuil wrote:
> On Thu 29 August 2013 14:32:49 Archit Taneja wrote:
>> VPE is a block which consists of a single memory to memory path which can
>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>> interleaved video formats.
>>
>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>> The de-interlacer, scaler and color space converter are all bypassed for now
>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>> conversion between different YUV formats is possible.
>>
>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>
>> Based on the information received via v4l2 ioctls for the source and
>> destination queues, the driver configures the values for the MMRs, and stores
>> them in the buffer. There are also some VPDMA parameters like frame start and
>> line mode which need to be configured; these are configured by direct register
>> writes via the VPDMA helper functions.
>>
>> The driver's device_run() mem2mem op will add each descriptor based on how the
>> source and destination queues are set up for the given ctx, once the list is
>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
>> upload MMR registers, start DMA of video buffers on the various input and output
>> clients/ports.
>>
>> When the list is parsed completely (and the DMAs on all the output ports done),
>> an interrupt is generated which we use to notify that the source and destination
>> buffers are done.
>>
>> The rest of the driver is quite similar to other mem2mem drivers, we use the
>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>
> Thanks for the patch. Just a few small comments below...
>
>> ---
>>   drivers/media/platform/Kconfig           |   16 +
>>   drivers/media/platform/Makefile          |    2 +
>>   drivers/media/platform/ti-vpe/Makefile   |    5 +
>>   drivers/media/platform/ti-vpe/vpe.c      | 1740 ++++++++++++++++++++++++++++++
>>   drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>   5 files changed, 2259 insertions(+)
>>   create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>
>> diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
>> new file mode 100644
>> index 0000000..85b0880
>> --- /dev/null
>> +++ b/drivers/media/platform/ti-vpe/vpe.c
>

<snip>

>> +
>> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>> +		       struct vpe_fmt *fmt, int type)
>> +{
>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>> +	struct v4l2_plane_pix_format *plane_fmt;
>> +	int i;
>> +
>> +	if (!fmt || !(fmt->types & type)) {
>> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
>> +			pix->pixelformat);
>> +		return -EINVAL;
>> +	}
>> +
>> +	pix->field = V4L2_FIELD_NONE;
>> +
>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>> +			      S_ALIGN);
>> +
>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>> +	pix->pixelformat = fmt->fourcc;
>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>
> pix->priv should be set to NULL as well.

I wanted to point out that we use v4l2_pix_format_mplane in the 
v4l2_format fmt union. So I suppose we don't have a priv field in the 
pix structure here.
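
For reference, abbreviated from include/uapi/linux/videodev2.h (only the
single-planar struct carries a priv field):

	struct v4l2_format {
		__u32	 type;
		union {
			struct v4l2_pix_format		pix;	/* has .priv */
			struct v4l2_pix_format_mplane	pix_mp;	/* no .priv */
			/* ... other members elided ... */
		} fmt;
	};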

Thanks,
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v4 0/4] v4l: VPE mem to mem driver
  2013-08-20 11:00   ` Archit Taneja
@ 2013-09-06 10:12     ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-06 10:12 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart
  Cc: linux-omap, tomi.valkeinen, Archit Taneja

VPE (Video Processing Engine) is an IP found on DRA7xx. This series adds VPE as a
mem to mem v4l2 driver, and VPDMA as a helper library.

The first version of the patch series described VPE in detail; you can have a
look at it here:

http://www.spinics.net/lists/linux-media/msg66518.html

Changes in v4:
- Control ID for the driver reserved in v4l2-controls.h
- Some fixes/clean ups suggested by Hans.
- A small hack done in VPE's probe to use a fixed 64K resource size. This
  is needed as the DT bindings will split the addresses across VPE
  submodules, while the driver currently works with register offsets from
  the top level VPE base. The driver can be modified later to support
  multiple ioremaps of the sub modules.
- Addition of sync on channel descriptors for input DMA channels; this
  ensures the VPDMA list is stalled in the rare case of an input channel's
  DMA getting completed after all the output channel DMAs (see the sketch
  after this list).
- Removed the DT and hwmod patches from this series. DRA7xx support did
  not make it into the 3.12 merge window; will deal with those separately.
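
A rough shape of the sync-on-channel addition in device_run(), assuming
the helper keeps the name used in this series (the loop bound and channel
array here are illustrative, not the actual driver code):

	/* stall list completion until every input channel's DMA is done */
	for (i = 0; i < num_in_channels; i++)
		vpdma_add_sync_on_channel_ctd(ctx->desc_list, in_chans[i]);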

Archit Taneja (4):
  v4l: ti-vpe: Create a vpdma helper library
  v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  v4l: ti-vpe: Add VPE mem to mem driver
  v4l: ti-vpe: Add de-interlacer support in VPE

 drivers/media/platform/Kconfig             |   16 +
 drivers/media/platform/Makefile            |    2 +
 drivers/media/platform/ti-vpe/Makefile     |    5 +
 drivers/media/platform/ti-vpe/vpdma.c      |  846 ++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  203 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h |  641 +++++++++
 drivers/media/platform/ti-vpe/vpe.c        | 2074 ++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h   |  496 +++++++
 include/uapi/linux/v4l2-controls.h         |    4 +
 9 files changed, 4287 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

-- 
1.8.1.2


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v4 1/4] v4l: ti-vpe: Create a vpdma helper library
  2013-09-06 10:12     ` Archit Taneja
@ 2013-09-06 10:12       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-06 10:12 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart
  Cc: linux-omap, tomi.valkeinen, Archit Taneja

The primary function of VPDMA is to move data between external memory and the
internal processing modules (in our case, VPE) that source or sink data. VPDMA
is capable of buffering this data and then delivering it to the modules on
demand, as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is set up inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and the buffering required to minimize the effect of long latency
times on all the clients.

Add the following to the VPDMA helper:

- A data struct which describes the VPDMA channels. For now, these are only
  the channels used by VPE; the list will grow when VIP (Video Input Port)
  also uses the VPDMA library. This channel information will be used to
  populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA.
  This data type information will be used to populate fields required by data
  descriptors, and by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- Provide VPDMA register offset definitions, and functions to read, write and
  modify VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of
  descriptors that makes up a set of DMA transfers that need to be completed.
  Each descriptor either performs a DMA transaction to fetch input buffers
  and write to output buffers (data descriptors), configures the MMRs of the
  VPE sub-blocks (configuration descriptors), or provides control information
  to VPDMA (control descriptors).

- Functions to allocate, map and unmap the buffers needed for the descriptor
  list, and for the payloads containing MMR values and scaler coefficients.
  These use the DMA mapping APIs to ensure exclusive access to VPDMA.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Enable some VPDMA specific parameters: frame start event (when to start DMA
  for a client) and line mode (whether each line fetched should be mirrored
  or not).

- Function to load the firmware required by VPDMA. VPDMA requires firmware
  for its internal list manager. We add the required request_firmware calls
  to fetch this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize and create a VPDMA instance. This will be called
  by the VPE driver with its platform device pointer; it takes care of
  loading the VPDMA firmware and returns a vpdma_data instance back to the
  VPE driver. The VIP driver will also call the same init function to
  initialize its own VPDMA instance. A usage sketch of the resulting API
  follows this list.
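
As a rough illustration (not part of the patch), this is how a client such as
VPE might drive the resulting API, once vpdma_create() has returned an
instance and descriptors have been appended to a list created with
vpdma_create_desc_list(). vpe_run_job() and vpe_job_done() are hypothetical
names, and error handling is trimmed:

	/* submit one job; 'vpdma' comes from vpdma_create() at probe time */
	static int vpe_run_job(struct vpdma_data *vpdma,
			       struct vpdma_desc_list *list)
	{
		int ret;

		/* make the descriptor memory visible to the VPDMA */
		ret = vpdma_map_desc_buf(vpdma, &list->buf);
		if (ret)
			return ret;

		vpdma_enable_list_complete_irq(vpdma, 0, true);

		/* returns -EBUSY if list 0 is still being parsed */
		return vpdma_submit_descs(vpdma, list);
	}

	/* called from the client's list-complete interrupt handler */
	static void vpe_job_done(struct vpdma_data *vpdma,
				 struct vpdma_desc_list *list)
	{
		vpdma_clear_list_stat(vpdma);
		vpdma_unmap_desc_buf(vpdma, &list->buf);
		vpdma_reset_desc_list(list);	/* empty it for the next job */
	}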

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 155 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 852 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..42db12c
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,578 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+const struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+const struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+const struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static const struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int read_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return (read_reg(vpdma, offset) & (mask << shift)) >> shift;
+}
+
+static void write_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = false;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
+
+	return 0;
+}
+
+void vpdma_free_desc_buf(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map descriptor/payload DMA buffer, enabling DMA access
+ */
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, buf->dma_addr)) {
+		dev_err(dev, "failed to map buffer\n");
+		return -EINVAL;
+	}
+
+	buf->mapped = true;
+
+	return 0;
+}
+
+/*
+ * unmap descriptor/payload DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = false;
+}
+
+/*
+ * create a descriptor list, the user of this list will append configuration,
+ * control and data descriptors to this list, this list will be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_alloc_desc_buf(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list, this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_free_desc_buf(&list->buf);
+
+	list->next = NULL;
+}
+
+static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client, the
+ * line buffer content can either be mirrored(each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_alloc_desc_buf(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_map_desc_buf(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_unmap_desc_buf(vpdma, &fw_dma_buf);
+
+	vpdma_free_desc_buf(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+struct vpdma_data *vpdma_create(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_create\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return ERR_PTR(r);
+	}
+
+	return vpdma;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..8056689
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,155 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_DTD_DESC_SIZE		32	/* 8 words */
+#define VPDMA_CFD_CTD_DESC_SIZE		16	/* 4 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern const struct vpdma_data_format vpdma_yuv_fmts[];
+extern const struct vpdma_data_format vpdma_rgb_fmts[];
+extern const struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
+void vpdma_free_desc_buf(struct vpdma_buf *buf);
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+struct vpdma_data *vpdma_create(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + grp * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + i * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers(only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 2/4] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-09-06 10:12     ` Archit Taneja
@ 2013-09-06 10:12       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-06 10:12 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart
  Cc: linux-omap, tomi.valkeinen, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an existing
list, and append the configuration/data/control descriptor header to the list.

In the case of configuration descriptors, the creation of a payload block may
be required (the payloads can hold VPE MMR values, or scaler coefficients).
The allocation of the payload buffer and its content is left to the VPE
driver. However, the VPDMA library provides helper macros to create payloads
in the correct format.

Add debug functions to dump the descriptors in a way that makes it easy to
see the values of the different fields in the descriptors.
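
As a hedged sketch (not part of the patch), appending a configuration
descriptor carrying MMR values could look like the following. CFD_MMR_CLIENT
is a placeholder for the real client id, and the payload contents would be
built with the helper macros mentioned above:

	static int vpe_add_mmr_cfd(struct vpdma_data *vpdma,
				   struct vpdma_desc_list *list)
	{
		struct vpdma_buf adb;	/* address data block payload */
		int ret;

		ret = vpdma_alloc_desc_buf(&adb, 1024);
		if (ret)
			return ret;

		/* ... fill adb.addr with register/value pairs using the
		 * payload helper macros ... */

		ret = vpdma_map_desc_buf(vpdma, &adb);
		if (ret) {
			vpdma_free_desc_buf(&adb);
			return ret;
		}

		/* append a config descriptor that points at the payload */
		vpdma_add_cfd_adb(list, CFD_MMR_CLIENT, &adb);

		return 0;
	}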

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 522 +++++++++++++++++++++++++++++
 3 files changed, 838 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index 42db12c..af0a5ff 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -416,6 +417,273 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK) {
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd->dest_addr_offset);
+		pr_debug("word1: num_data_wrds = %d\n", cfd->block_len);
+	}
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd->payload_addr);
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header. This is used to upload scaler coefficients to the scaler module.
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->dest_addr_offset = dest_offset;
+	cfd->block_len = len;
+	cfd->payload_addr = (u32) blk->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_BLOCK,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format. This is used to configure a
+ * discontiguous set of MMRs.
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->w0 = 0;
+	cfd->w1 = 0;
+	cfd->payload_addr = (u32) adb->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_ADB,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * control descriptor formats change based on the type of control descriptor;
+ * we only use 'sync on channel' control descriptors for now, so assume that
+ * type here
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list, this descriptor stalls the VPDMA list until DMA is completed
+ * on the specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd->w0 = 0;
+	ctd->w1 = 0;
+	ctd->w2 = 0;
+	ctd->type_source_ctl = ctd_type_source_ctl(chan_info[chan].num,
+				CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd->start_addr);
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			"drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specific attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specific attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
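+	/*
+	 * a C420 chroma line carries as many bytes as the luma line (Cb/Cr
+	 * interleaved at half the horizontal rate), so treat it as 8 bpp
+	 * for the stride and offset math below
+	 */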
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+	dtd->w1 = 0;
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_OUT, channel, priority, next_chan);
+	dtd->desc_write_addr = dtd_desc_write_addr(0, 0, 0, 0);
+	dtd->max_width_height = dtd_max_width_height(MAX_OUT_WIDTH_1920,
+					MAX_OUT_HEIGHT_1080);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+
+	dtd->xfer_length_height = dtd_xfer_length_height(c_rect->width, height);
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_IN, channel, priority, next_chan);
+	dtd->frame_width_height = dtd_frame_width_height(frame_width,
+					frame_height);
+	dtd->start_h_v = dtd_start_h_v(c_rect->left, c_rect->top);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 8056689..eaa2a71 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -124,6 +124,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		4
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors, MMRs as client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
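+/*
+ * initialize one ADB header within 'buf': 'adb' below is never
+ * dereferenced, it only provides sizeof() access to the register
+ * block inside 'str'
+ */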
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
 void vpdma_free_desc_buf(struct vpdma_buf *buf);
@@ -136,6 +169,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..f0e9a80 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,526 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * a VPDMA address data block payload for a configuration descriptor needs to
+ * have each sub block length as a multiple of 16 bytes. Therefore, the overall
+ * size of the payload also needs to be a multiple of 16 bytes. The VPDMA user
+ * must ensure that the sub block lengths are aligned accordingly.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+ /* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+ /* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+static inline u32 dtd_type_ctl_stride(int type, bool notify, int field,
+			bool one_d, bool even_line_skip, bool odd_line_skip,
+			int line_stride)
+{
+	return (type << DTD_DATA_TYPE_SHFT) | (notify << DTD_NOTIFY_SHFT) |
+		(field << DTD_FIELD_SHFT) | (one_d << DTD_1D_SHFT) |
+		(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |
+		(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |
+		line_stride;
+}
+
+static inline u32 dtd_xfer_length_height(int line_length, int xfer_height)
+{
+	return (line_length << DTD_LINE_LENGTH_SHFT) | xfer_height;
+}
+
+static inline u32 dtd_pkt_ctl(bool mode, bool dir, int chan, int pri,
+			int next_chan)
+{
+	return (DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) | (mode << DTD_MODE_SHFT) |
+		(dir << DTD_DIR_SHFT) | (chan << DTD_CHAN_SHFT) |
+		(pri << DTD_PRI_SHFT) | next_chan;
+}
+
+static inline u32 dtd_frame_width_height(int width, int height)
+{
+	return (width << DTD_FRAME_WIDTH_SHFT) | height;
+}
+
+static inline u32 dtd_desc_write_addr(unsigned int addr, bool write_desc,
+			bool drop_data, bool use_desc)
+{
+	return (addr << DTD_DESC_START_SHIFT) |
+		(write_desc << DTD_WRITE_DESC_SHIFT) |
+		(drop_data << DTD_DROP_DATA_SHIFT) |
+		use_desc;
+}
+
+static inline u32 dtd_start_h_v(int h_start, int v_start)
+{
+	return (h_start << DTD_H_START_SHFT) | v_start;
+}
+
+static inline u32 dtd_max_width_height(int max_width, int max_height)
+{
+	return (max_width << DTD_MAX_WIDTH_SHFT) | max_height;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD_BLOCK_LEN_MASK	0xffff
+#define CFD_BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+static inline u32 cfd_pkt_payload_len(bool direct, int cls, int dest,
+		int payload_len)
+{
+	return (CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |
+		(direct << CFD_DIRECT_SHFT) |
+		(cls << CFD_CLASS_SHFT) |
+		(dest << CFD_DEST_SHFT) |
+		payload_len;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline bool cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_LM_TIMER	3
+#define CTD_TYPE_SYNC_ON_CHANNEL	4
+#define CTD_TYPE_CHNG_CLIENT_IRQ	5
+#define CTD_TYPE_SEND_IRQ		6
+#define CTD_TYPE_RELOAD_LIST		7
+#define CTD_TYPE_ABORT_CHANNEL		8
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+static inline u32 ctd_pixel_line_count(int pixel_count, int line_count)
+{
+	return (pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count;
+}
+
+static inline u32 ctd_set_fid_ctl(int fid0, int fid1, int fid2)
+{
+	return (fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0;
+}
+
+static inline u32 ctd_type_source_ctl(int source, int control)
+{
+	return (CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |
+		(source << CTD_SOURCE_SHFT) | control;
+}
+
+static inline u32 ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID0_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-09-06 10:12     ` Archit Taneja
@ 2013-09-06 10:12       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-06 10:12 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart
  Cc: linux-omap, tomi.valkeinen, Archit Taneja

VPE is a block which consists of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. The chroma up/down sampler blocks are implemented,
so conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it uses
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.
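
A sketch of that per-context setup (error handling trimmed; the buffer name
and the MMR struct are placeholders):

	/* shadow buffer that VPDMA will read the MMR values from */
	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
	if (ret)
		return ret;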

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op will add each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; as VPDMA parses these descriptors, it
uploads the MMR values and starts DMA of the video buffers on the various
input and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to signal that the source and
destination buffers are done.
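
The tail of that interrupt handler would look roughly as below (illustrative
only; the vpe_dev/vpe_ctx fields are hypothetical, the mem2mem calls are the
standard v4l2 ones):

	static irqreturn_t vpe_irq(int irq_vpe, void *data)
	{
		struct vpe_dev *dev = (struct vpe_dev *)data;
		struct vpe_ctx *ctx;
		struct vb2_buffer *s_vb, *d_vb;

		/* ... ack the list complete interrupt, find the active ctx ... */

		ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
		s_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
		d_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);

		/* give both buffers back and let the next queued job run */
		vb2_buffer_done(s_vb, VB2_BUF_STATE_DONE);
		vb2_buffer_done(d_vb, VB2_BUF_STATE_DONE);
		v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);

		return IRQ_HANDLED;
	}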

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
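
From user space this means the multiplanar buffer types; a minimal S_FMT call
for the NV12 case might look like this (illustrative only):

	struct v4l2_format fmt;

	memset(&fmt, 0, sizeof(fmt));
	fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	fmt.fmt.pix_mp.width = 1920;
	fmt.fmt.pix_mp.height = 1080;
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12;
	fmt.fmt.pix_mp.num_planes = 2;	/* separate luma and chroma planes */
	if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
		perror("VIDIOC_S_FMT");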

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 include/uapi/linux/v4l2-controls.h       |    4 +
 6 files changed, 2273 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 8068d7b..f622943 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -220,6 +220,22 @@ config VIDEO_RENESAS_VSP1
 	  To compile this driver as a module, choose M here: the module
 	  will be called vsp1.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 4e4da48..1348ba1 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..549681e
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1750 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits, i.e. 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context can need up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 10 control descriptors: in all, 10 data transfer
+ * descriptors and 13 config/control descriptors, as sized below
+ */
+#define VPE_DESC_LIST_SIZE	(10 * VPDMA_DTD_DESC_SIZE +	\
+					13 * VPDMA_CFD_CTD_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
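+ *
+ * only the progressive set is filled in for now, since the de-interlacer
+ * is programmed in bypass mode (see set_dei_regs_bypass())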
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
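+ *
+ * Each header/register-block pair below ends up as the payload of a VPDMA
+ * configuration descriptor (the CFD_MMR_CLIENT descriptor added in
+ * device_run()); VPDMA copies the shadow values into the real MMRs at the
+ * offsets recorded by init_adb_hdrs().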
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* set the VPDMA line mode and frame start events directly for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/*
+	 * the RGB path will be selected here once color space conversion
+	 * is supported in the future
+	 */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * According to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode, and it has been recommended not to use progressive
+	 * bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
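+	/* the driver submits all descriptor work on VPDMA list 0 */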
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
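+ *
+ * A single transaction can span several runs: vpe_irq() re-invokes
+ * device_run() until ctx->bufs_per_job buffers have been processed.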
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for input ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+
+	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
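+		/*
+		 * the luma line stride must be a multiple of 16 bytes for
+		 * VPDMA; a line of the interleaved CbCr plane of the
+		 * co-planar formats is always exactly 'width' bytes
+		 * (width / 2 samples each of Cb and Cr)
+		 */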
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_VPE_BUFS_PER_JOB:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_VPE_BUFS_PER_JOB,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
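+
+/*
+ * From user space this appears as an integer control named after the
+ * string above; e.g., with v4l2-ctl (device node hypothetical):
+ *   v4l2-ctl -d /dev/video0 --set-ctrl=buffers_per_transaction=4
+ */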
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
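+	/* default both queues to 1080p YUYV until user space sets a format */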
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
+	/*
+	 * HACK: we get resource info from device tree in the form of a list of
+	 * VPE sub blocks, the driver currently uses only the base of vpe_top
+	 * for register access, the driver should be changed later to access
+	 * registers based on the sub block base addresses
+	 */
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME);
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	8
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	16
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 083bb5a..1666aab 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -160,6 +160,10 @@ enum v4l2_colorfx {
  * of controls. Total of 16 controls is reserved for this driver */
 #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
 
+/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
+ * this driver */
+#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
+
 /* MPEG-class control IDs */
 /* The MPEG controls are applicable to all codec controls
  * and the 'MPEG' part of the define is historical */
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
@ 2013-09-06 10:12       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-06 10:12 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart
  Cc: linux-omap, tomi.valkeinen, Archit Taneja

VPE is a block consisting of a single memory to memory path that can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 co-planar, YUV422 co-planar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it uses
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; as VPDMA parses the descriptors, it uploads
the MMR values and starts DMA of video buffers on the various input and output
clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to mark the source and
destination buffers as done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multi-planar v4l2 ioctls as the HW supports co-planar formats.
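
For reference, a minimal sketch of how userspace would negotiate formats with
the driver through the multi-planar ioctls (the /dev/video0 node, the 1080p
size and the YUYV-to-NV12 conversion are illustrative assumptions; error
handling is omitted):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int set_fmt(int fd, enum v4l2_buf_type type, unsigned int fourcc)
{
	struct v4l2_format fmt;

	memset(&fmt, 0, sizeof(fmt));
	fmt.type = type;
	fmt.fmt.pix_mp.width = 1920;
	fmt.fmt.pix_mp.height = 1080;
	fmt.fmt.pix_mp.pixelformat = fourcc;
	fmt.fmt.pix_mp.field = V4L2_FIELD_NONE;

	return ioctl(fd, VIDIOC_S_FMT, &fmt);
}

int main(void)
{
	int fd = open("/dev/video0", O_RDWR);

	/* the source (OUTPUT) queue describes what userspace feeds in */
	set_fmt(fd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, V4L2_PIX_FMT_YUYV);

	/* the destination (CAPTURE) queue describes what VPE writes out */
	set_fmt(fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, V4L2_PIX_FMT_NV12);

	/* VIDIOC_REQBUFS, QBUF and STREAMON on both queues then follow the
	 * standard mem2mem pattern */
	return 0;
}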

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 include/uapi/linux/v4l2-controls.h       |    4 +
 6 files changed, 2273 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 8068d7b..f622943 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -220,6 +220,22 @@ config VIDEO_RENESAS_VSP1
 	  To compile this driver as a module, choose M here: the module
 	  will be called vsp1.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 4e4da48..1348ba1 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..549681e
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1750 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits, i.e. 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context can need up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 10 control descriptors: that's 7 + 3 data
+ * transfer descriptors and 3 + 10 config/control descriptors, which gives
+ * the list size below
+ */
+#define VPE_DESC_LIST_SIZE	(10 * VPDMA_DTD_DESC_SIZE +	\
+					13 * VPDMA_CFD_CTD_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
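+
+/*
+ * For example, programming the 2-bit upsampler mode field of a shadow
+ * register value looks like:
+ *
+ *	write_field(&val, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+ *
+ * which clears bits 17:16 of val and ORs in the new mode (see
+ * set_cfg_and_line_modes() below).
+ */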
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
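+
+/*
+ * note: the *_pad arrays above pad each register payload out to a multiple
+ * of four words; keeping each vpdma_adb_hdr aligned this way is an
+ * assumption based on the layout, the authoritative rule lives with the
+ * VPDMA ADB definitions in vpdma.h
+ */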
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
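+	/*
+	 * each 32-bit US register packs two 14-bit coefficients: C0 in the
+	 * upper field (bits 31:18) and C1 in the lower one (bits 15:2); the
+	 * same values are mirrored into the US2 and US3 register blocks
+	 */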
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode in the shadow MMRs, and the VPDMA line
+ * mode and frame start events via direct register writes.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* line mode is set via direct register writes for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode. It has been recommended not to use progressive bypass
+	 * mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for input ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+
+	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+	pix->colorspace = q_data->colorspace;
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
+			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)
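+
+/*
+ * userspace can tune how many buffers are processed per mem2mem transaction
+ * through the standard control interface, e.g. (illustrative snippet):
+ *
+ *	struct v4l2_control ctrl = {
+ *		.id	= V4L2_CID_VPE_BUFS_PER_JOB,
+ *		.value	= 4,
+ *	};
+ *	ioctl(fd, VIDIOC_S_CTRL, &ctrl);
+ */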
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_VPE_BUFS_PER_JOB:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_VPE_BUFS_PER_JOB,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
+	if (!res) {
+		ret = -ENODEV;
+		goto v4l2_dev_unreg;
+	}
+
+	/*
+	 * HACK: we get resource info from device tree in the form of a list of
+	 * VPE sub blocks, the driver currently uses only the base of vpe_top
+	 * for register access, the driver should be changed later to access
+	 * registers based on the sub block base addresses
+	 */
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev =
+		(struct vpe_dev *) platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
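+/*
+ * Illustrative example: a 1920x1080 frame in interlace bypass mode would
+ * be programmed as (1080 << VPE_DEI_HEIGHT_SHIFT) |
+ * (1920 << VPE_DEI_WIDTH_SHIFT) | VPE_DEI_INTERLACE_BYPASS, which is how
+ * the driver's DEI setup composes this register.
+ */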
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	8
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	16
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
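+/*
+ * A0..A2, B0..B2 and C0..C2 below form the rows of the 3x3 conversion
+ * matrix and D0..D2 the per-component offsets, i.e. roughly
+ * out_i = A_i * in0 + B_i * in1 + C_i * in2 + D_i (a sketch of the usual
+ * CSC layout; see the TRM for the exact fixed-point formats).
+ */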
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 083bb5a..1666aab 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -160,6 +160,10 @@ enum v4l2_colorfx {
  * of controls. Total of 16 controls is reserved for this driver */
 #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
 
+/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
+ * this driver */
+#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
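+/* individual driver controls would then be numbered from this base, e.g.
+ * (V4L2_CID_USER_TI_VPE_BASE + 0) for the first one (hypothetical
+ * example; the actual control definitions live in the driver) */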
+
 /* MPEG-class control IDs */
 /* The MPEG controls are applicable to all codec controls
  * and the 'MPEG' part of the define is historical */
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 4/4] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-09-06 10:12     ` Archit Taneja
@ 2013-09-06 10:12       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-06 10:12 UTC (permalink / raw)
  To: linux-media, hverkuil, laurent.pinchart
  Cc: linux-omap, tomi.valkeinen, Archit Taneja

Add support for the de-interlacer block in VPE.

For the de-interlacer to work, we need to enable 2 more sets of VPE input
ports, which fetch data from the 'last' and 'last to last' fields of the
interlaced video. Apart from that, we need to enable the motion vector
output and input ports, and also allocate DMA buffers for them.
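
A rough sketch of the motion vector buffer sizing and allocation done in
the patch below (names simplified; illustrative only):

	mv_buf_size = (width * height * mv_fmt->depth) >> 3;
	ctx->mv_buf[i] = dma_alloc_coherent(dev, mv_buf_size,
				&ctx->mv_buf_dma[i], GFP_KERNEL);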

We need to make sure that the two most recent fields in the source queue
are available and in the 'READY' state. Once a mem2mem context gets access
to the VPE HW (in device_run), it extracts the addresses of the 3 buffers
and provides them to the data descriptors for the 3 sets of input ports
((LUMA1, CHROMA1), (LUMA2, CHROMA2), and (LUMA3, CHROMA3)), one set for
each of the 3 consecutive fields. The motion vector and output port
descriptors are then configured and the list is submitted to VPDMA.
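
For reference, the field-to-port mapping set up in the patch looks like
this (a condensed view of the port_data[] table below; vb_index selects
which of the 3 queued fields feeds the port):

	f   (newest) -> LUMA1_IN/CHROMA1_IN  (vb_index 0)
	f-1          -> LUMA2_IN/CHROMA2_IN  (vb_index 1)
	f-2 (oldest) -> LUMA3_IN/CHROMA3_IN  (vb_index 2)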

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field (the 3rd one) is changed to the state 'DONE', and the buffers
corresponding to the 1st and 2nd fields become the 2nd and 3rd fields for
the next de-interlace operation. This way, each de-interlace operation
works on the 3 most recent fields. After each transaction, we also swap
the motion vector buffers: the new input motion vector buffer holds the
accumulated motion information of all the previous fields, and the new
output motion vector buffer will hold the updated motion vectors which
capture the motion changes in the next field. The motion vector buffers
are allocated using the DMA allocation API.
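
A minimal sketch of the per-transaction bookkeeping described above, using
the context fields added by this patch (illustrative only):

	/* oldest field is returned to userspace, newer ones shift down */
	v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
	ctx->src_vbs[2] = ctx->src_vbs[1];
	ctx->src_vbs[1] = ctx->src_vbs[0];

	/* the previous dst MV buffer becomes the next src MV buffer */
	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;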

The de-interlacer is taken out of bypass mode; it requires some extra
default configuration, which is now added. The chrominance upsampler
coefficients are added for interlaced frames. Some VPDMA parameters, like
the frame start event and line mode, are configured for the 2 extra sets
of input ports.

Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 392 ++++++++++++++++++++++++++++++++----
 1 file changed, 358 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 549681e..a892b69 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -111,6 +113,38 @@ static const struct vpe_us_coeffs us_coeffs[] = {
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * The following registers configure some of the parameters of the motion
+ * and edge detection blocks inside the DEI. These generally remain the
+ * same, but could be exposed via userspace later if someone needs to
+ * tweak them.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spatial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static const struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -118,6 +152,7 @@ static const struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-planar formats */
 };
 
@@ -126,6 +161,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -133,12 +174,40 @@ struct vpe_port_data {
 static const struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -210,6 +279,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -219,6 +289,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -270,6 +341,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -277,13 +349,19 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	dma_addr_t		mv_buf_dma[2];		/* dma addrs of motion vector in/out bufs */
+	void			*mv_buf[2];		/* virtual addrs of motion vector bufs */
+	size_t			mv_buf_size;		/* current motion vector buffer size */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -359,8 +437,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -386,6 +463,80 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct device *dev = ctx->dev->v4l2_dev.dev;
+
+	if (ctx->mv_buf_size == size)
+		return 0;
+
+	if (ctx->mv_buf[0])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+	if (ctx->mv_buf[1])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[1],
+			ctx->mv_buf_dma[1]);
+
+	if (size == 0)
+		return 0;
+
+	ctx->mv_buf[0] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[0],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[0]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf[1] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[1],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[1]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		dma_free_coherent(dev, size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf_size = size;
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function frees those two buffers when we have
+ * finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -426,6 +577,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -433,6 +585,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -473,14 +628,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -524,13 +693,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -539,7 +709,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -550,6 +726,22 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	ctx->load_mmrs = true;
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	const struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spatial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
@@ -578,10 +770,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		const struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -608,6 +825,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* need additional two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -735,17 +955,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -761,23 +989,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	const struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -795,7 +1031,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -818,8 +1055,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -831,28 +1075,67 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for input ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
 
+	if (ctx->deinterlacing) {
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_LUMA2_IN);
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_CHROMA2_IN);
+
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_LUMA3_IN);
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_CHROMA3_IN);
+
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_IN);
+	}
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -863,6 +1146,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -886,9 +1170,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -911,10 +1201,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	 /* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -924,16 +1217,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1012,6 +1324,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 	pix->colorspace = q_data->colorspace;
 	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
 
@@ -1036,7 +1349,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 		return -EINVAL;
 	}
 
-	pix->field = V4L2_FIELD_NONE;
+	if (pix->field != V4L2_FIELD_NONE && pix->field != V4L2_FIELD_ALTERNATE)
+		pix->field = V4L2_FIELD_NONE;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
 			      &pix->height, MIN_H, MAX_H, H_ALIGN,
@@ -1103,6 +1417,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1116,6 +1431,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1426,6 +1746,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1434,6 +1755,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1488,6 +1810,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_free_desc_buf(&ctx->mmr_adb);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread
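
For readers following the device_run()/vpe_irq() hunks in the patch above, the
source buffer rotation and motion vector ping-pong reduce to the following
standalone sketch ('struct buf' and 'rotate_fields' are hypothetical stand-ins;
the real driver works on struct vb2_buffer pointers and the src_mv_buf_selector
flag shown in the diff):

	struct buf;				/* stand-in type */

	struct deint_ctx {
		struct buf *src[3];		/* fields f, f-1, f-2 */
		int mv_sel;			/* which MV buffer is the input */
	};

	/* called once per completed transaction */
	static struct buf *rotate_fields(struct deint_ctx *ctx,
					 struct buf *newest)
	{
		struct buf *oldest = ctx->src[2];

		/* the previous dst MV buffer becomes the next src MV buffer */
		ctx->mv_sel = !ctx->mv_sel;

		/* fields slide down one slot; the oldest one is the buffer
		 * that gets marked DONE and returned to userspace */
		ctx->src[2] = ctx->src[1];
		ctx->src[1] = ctx->src[0];
		ctx->src[0] = newest;

		return oldest;
	}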

* Re: [PATCH v4 0/4] v4l: VPE mem to mem driver
  2013-09-06 10:12     ` Archit Taneja
@ 2013-09-16  6:59       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-09-16  6:59 UTC (permalink / raw)
  To: hverkuil, laurent.pinchart; +Cc: linux-media, linux-omap, tomi.valkeinen

Hi Hans, Laurent,

On Friday 06 September 2013 03:42 PM, Archit Taneja wrote:
> VPE (Video Processing Engine) is an IP found on DRA7xx; this series adds VPE
> as a mem to mem v4l2 driver, and VPDMA as a helper library.
>
> The first version of the patch series described VPE in detail; you can have a
> look at it here:
>
> http://www.spinics.net/lists/linux-media/msg66518.html
>
> Changes in v4:
> - Control ID for the driver reserved in v4l2-controls.h
> - Some fixes/clean ups suggested by Hans.
> - A small hack done in VPE's probe to use a fixed 64K resource size, this
>    is needed as the DT bindings will split the addresses across VPE
>    submodules, the driver currently works with register offsets from the top
>    level VPE base. The driver can be modified later to support multiple
>    ioremaps of the sub modules.
> - Addition of sync on channel descriptors for input DMA channels, this
>    ensures the VPDMA list is stalled in the rare case of an input channel's
>    DMA getting completed after all the output channel DMAs.
> - Removed the DT and hwmod patches from this series. DRA7xx support did not
>    make it into the 3.12 merge window. Will deal with those separately.

I incorporated your comments and suggestions from the previous series.
Do you think it now looks good enough to get merged?

Thanks!
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 0/4] v4l: VPE mem to mem driver
  2013-09-16  6:59       ` Archit Taneja
@ 2013-10-07  6:39         ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-07  6:39 UTC (permalink / raw)
  To: hverkuil, laurent.pinchart; +Cc: linux-media, linux-omap, tomi.valkeinen

Hi,

On Monday 16 September 2013 12:29 PM, Archit Taneja wrote:
> Hi Hans, Laurent,
>
> On Friday 06 September 2013 03:42 PM, Archit Taneja wrote:
>> VPE (Video Processing Engine) is an IP found on DRA7xx; this series adds
>> VPE as a mem to mem v4l2 driver, and VPDMA as a helper library.
>>
>> The first version of the patch series described VPE in detail; you can
>> have a look at it here:
>>
>> http://www.spinics.net/lists/linux-media/msg66518.html
>>
>> Changes in v4:
>> - Control ID for the driver reserved in v4l2-controls.h
>> - Some fixes/clean ups suggested by Hans.
>> - A small hack done in VPE's probe to use a fixed 64K resource size, this
>>    is needed as the DT bindings will split the addresses across VPE
>>    submodules, the driver currently works with register offsets from the
>>    top level VPE base. The driver can be modified later to support
>>    multiple ioremaps of the sub modules.
>> - Addition of sync on channel descriptors for input DMA channels, this
>>    ensures the VPDMA list is stalled in the rare case of an input
>>    channel's DMA getting completed after all the output channel DMAs.
>> - Removed the DT and hwmod patches from this series. DRA7xx support did
>>    not make it into the 3.12 merge window. Will deal with those separately.
>
> I incorporated your comments and suggestions from the previous series.
> Do you think it now looks good enough to get merged?

Ping. Any comments on this?

Thanks,
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 1/4] v4l: ti-vpe: Create a vpdma helper library
  2013-09-06 10:12       ` Archit Taneja
  (?)
@ 2013-10-07  7:46       ` Hans Verkuil
  -1 siblings, 0 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-10-07  7:46 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On 09/06/2013 12:12 PM, Archit Taneja wrote:
> The primary function of VPDMA is to move data between external memory and
> internal processing modules (in our case, VPE) that source or sink data. VPDMA
> is capable of buffering this data and then delivering it to the modules as
> demanded and as programmed. The modules that source or sink data are referred
> to as clients or ports. A channel is set up inside the VPDMA to connect a
> specific memory buffer to a specific client. The VPDMA centralizes the DMA
> control functions and the buffering required to allow all the clients to
> minimize the effect of long latency times.
> 
> Add the following to the VPDMA helper:
> 
> - A data struct which describes VPDMA channels. For now, these channels are
>   the ones used only by VPE; the list of channels will grow when VIP (Video
>   Input Port) also uses the VPDMA library. This channel information will be
>   used to populate fields required by data descriptors.
> 
> - Data structs which describe the different data types supported by VPDMA. This
>   data type information will be used to populate fields required by data
>   descriptors and used by the VPE driver to map a V4L2 format to the
>   corresponding VPDMA data type.
> 
> - Provide VPDMA register offset definitions, functions to read, write and modify
>   VPDMA registers.
> 
> - Functions to create and submit a VPDMA list. A list is a group of
>   descriptors that makes up a set of DMA transfers that need to be completed.
>   Each descriptor will either perform a DMA transaction to fetch input buffers
>   and write to output buffers (data descriptors), or configure the MMRs of sub
>   blocks of VPE (configuration descriptors), or provide control information to
>   VPDMA (control descriptors).
> 
> - Functions to allocate, map and unmap buffers needed for the descriptor list,
>   payloads containing MMR values and scaler coefficients. These use the DMA
>   mapping APIs to ensure exclusive access to VPDMA.
> 
> - Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
>   VPE interrupt line when a descriptor list is parsed completely and the DMA
>   transactions are completed. This requires masking the events in VPDMA
>   registers and configuring some top level VPE interrupt registers.
> 
> - Enable some VPDMA specific parameters: frame start event (when to start DMA
>   for a client) and line mode (whether each line fetched should be mirrored or
>   not).
> 
> - Function to load firmware required by VPDMA. VPDMA requires firmware for
>   its internal list manager. We add the required request_firmware APIs to
>   fetch this firmware from user space.
> 
> - Function to dump VPDMA registers.
> 
> - A function to initialize and create a VPDMA instance. This will be called by
>   the VPE driver with its platform device pointer; it takes care of loading
>   the VPDMA firmware and returning a vpdma_data instance back to the VPE
>   driver. The VIP driver will also call the same init function to initialize
>   its own VPDMA instance.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Regards,

	Hans

> ---
>  drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
>  drivers/media/platform/ti-vpe/vpdma.h      | 155 ++++++++
>  drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
>  3 files changed, 852 insertions(+)
>  create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
>  create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
>  create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
> 


^ permalink raw reply	[flat|nested] 138+ messages in thread
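
As a concrete illustration of the descriptor list flow summarized in the commit
message above, the call sequence a client such as VPE ends up using looks
roughly like this (a sketch, not self-contained code; the helper names are
taken from the VPE patches in this thread, with error handling and the
descriptor-append helpers' bodies omitted):

	int ret;

	/* one-time setup, e.g. in the driver's open(): */
	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
			VPDMA_LIST_TYPE_NORMAL);

	/* per transaction, e.g. in device_run(): append data and control
	 * descriptors, map the backing buffer and submit the list to the
	 * VPDMA list manager */
	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);

	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);

	/* in the list-complete interrupt: recycle the list for reuse */
	vpdma_reset_desc_list(&ctx->desc_list);

	/* teardown, e.g. in release(): */
	vpdma_free_desc_list(&ctx->desc_list);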

* Re: [PATCH v4 2/4] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-09-06 10:12       ` Archit Taneja
  (?)
@ 2013-10-07  7:46       ` Hans Verkuil
  -1 siblings, 0 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-10-07  7:46 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On 09/06/2013 12:12 PM, Archit Taneja wrote:
> Create functions which the VPE driver can use to create a VPDMA descriptor and
> add it to a VPDMA descriptor list. These functions take a pointer to an existing
> list, and append the configuration/data/control descriptor header to the list.
> 
> In the case of configuration descriptors, the creation of a payload block may
> be required (the payloads can hold VPE MMR values, or scaler coefficients).
> The allocation of the payload buffer and its contents are left to the VPE
> driver. However, the VPDMA library provides helper macros to create the
> payload in the correct format.
> 
> Add debug functions to dump the descriptors in a way such that it's easy to see
> the values of different fields in the descriptors.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Regards,

	Hans

> ---
>  drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
>  drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
>  drivers/media/platform/ti-vpe/vpdma_priv.h | 522 +++++++++++++++++++++++++++++
>  3 files changed, 838 insertions(+)
> 


^ permalink raw reply	[flat|nested] 138+ messages in thread
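
A rough sketch of how a client would use these helpers for a configuration
descriptor carrying MMR values (vpdma_alloc_desc_buf() and
vpdma_free_desc_buf() appear in the VPE patches in this thread; the buffer
type, payload-fill helper and append helper shown here are hypothetical
placeholders for the real ones added by this patch):

	struct vpdma_buf mmr_payload;		/* hypothetical type name */
	int ret;

	/* allocate a payload block to hold shadowed MMR values */
	ret = vpdma_alloc_desc_buf(&mmr_payload, payload_size);

	/* fill the payload with register values for a VPE sub-block,
	 * then point a configuration descriptor at it */
	fill_mmr_payload(&mmr_payload);			/* hypothetical */
	vpdma_add_cfd(&ctx->desc_list, &mmr_payload);	/* hypothetical */

	/* once the payload is no longer needed */
	vpdma_free_desc_buf(&mmr_payload);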

* Re: [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-09-06 10:12       ` Archit Taneja
  (?)
@ 2013-10-07  7:55       ` Hans Verkuil
  2013-10-07  9:16           ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Hans Verkuil @ 2013-10-07  7:55 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

Hi Archit,

I've got a few comments below...

On 09/06/2013 12:12 PM, Archit Taneja wrote:
> VPE is a block which consists of a single memory to memory path which can
> perform chrominance up/down sampling, de-interlacing, scaling, and color space
> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
> interleaved video formats.
> 
> We create a mem2mem driver based primarily on the mem2mem-testdev example.
> The de-interlacer, scaler and color space converter are all bypassed for now
> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
> conversion between different YUV formats is possible.
> 
> Each mem2mem context allocates a buffer for VPE MMR values which it will use
> when it gets access to the VPE HW via the mem2mem queue, it also allocates
> a VPDMA descriptor list to which configuration and data descriptors are added.
> 
> Based on the information received via v4l2 ioctls for the source and
> destination queues, the driver configures the values for the MMRs, and stores
> them in the buffer. There are also some VPDMA parameters like frame start and
> line mode which need to be configured; these are set by direct register
> writes via the VPDMA helper functions.
> 
> The driver's device_run() mem2mem op will add each descriptor based on how the
> source and destination queues are set up for the given ctx. Once the list is
> prepared, it's submitted to VPDMA; when VPDMA parses these descriptors, it
> uploads the MMR values and starts DMA of video buffers on the various input
> and output clients/ports.
> 
> When the list is parsed completely (and the DMAs on all the output ports done),
> an interrupt is generated which we use to notify that the source and destination
> buffers are done.
> 
> The rest of the driver is quite similar to other mem2mem drivers; we use the
> multiplane v4l2 ioctls as the HW supports coplanar formats.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>
> ---
>  drivers/media/platform/Kconfig           |   16 +
>  drivers/media/platform/Makefile          |    2 +
>  drivers/media/platform/ti-vpe/Makefile   |    5 +
>  drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
>  drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>  include/uapi/linux/v4l2-controls.h       |    4 +
>  6 files changed, 2273 insertions(+)
>  create mode 100644 drivers/media/platform/ti-vpe/Makefile
>  create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>  create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
> 

<snip>

> +
> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vb2_queue *vq;
> +	struct vpe_q_data *q_data;
> +	int i;
> +
> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> +	if (!vq)
> +		return -EINVAL;
> +
> +	q_data = get_q_data(ctx, f->type);
> +
> +	pix->width = q_data->width;
> +	pix->height = q_data->height;
> +	pix->pixelformat = q_data->fmt->fourcc;
> +	pix->colorspace = q_data->colorspace;
> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
> +	}
> +
> +	return 0;
> +}
> +
> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
> +		       struct vpe_fmt *fmt, int type)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct v4l2_plane_pix_format *plane_fmt;
> +	int i;
> +
> +	if (!fmt || !(fmt->types & type)) {
> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
> +			pix->pixelformat);
> +		return -EINVAL;
> +	}
> +
> +	pix->field = V4L2_FIELD_NONE;
> +
> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
> +			      S_ALIGN);
> +
> +	pix->num_planes = fmt->coplanar ? 2 : 1;
> +	pix->pixelformat = fmt->fourcc;
> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?

You do this only for capture. Output sets the colorspace, so try_fmt should
leave the colorspace field untouched for the output direction.

> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;

Zero pix->priv as well:

	pix->priv = 0;
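
Concretely, the two comments above amount to something like this in
__vpe_try_fmt (an untested sketch against the quoted code; 'type' is the
function's existing VPE_FMT_TYPE_* argument):

	/* pick a default colorspace only for the capture direction;
	 * for output, keep whatever userspace set */
	if (type == VPE_FMT_TYPE_CAPTURE)
		pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
				V4L2_COLORSPACE_SRGB :
				V4L2_COLORSPACE_SMPTE170M;

	pix->priv = 0;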

> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		int depth;
> +
> +		plane_fmt = &pix->plane_fmt[i];
> +		depth = fmt->vpdma_fmt[i]->depth;
> +
> +		if (i == VPE_LUMA)
> +			plane_fmt->bytesperline =
> +					round_up((pix->width * depth) >> 3,
> +						1 << L_ALIGN);
> +		else
> +			plane_fmt->bytesperline = pix->width;
> +
> +		plane_fmt->sizeimage =
> +				(pix->height * pix->width * depth) >> 3;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_fmt *fmt = find_format(f);
> +
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
> +	else
> +		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
> +}
> +
> +static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
> +{
> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
> +	struct v4l2_plane_pix_format *plane_fmt;
> +	struct vpe_q_data *q_data;
> +	struct vb2_queue *vq;
> +	int i;
> +
> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> +	if (!vq)
> +		return -EINVAL;
> +
> +	if (vb2_is_busy(vq)) {
> +		vpe_err(ctx->dev, "queue busy\n");
> +		return -EBUSY;
> +	}
> +
> +	q_data = get_q_data(ctx, f->type);
> +	if (!q_data)
> +		return -EINVAL;
> +
> +	q_data->fmt		= find_format(f);
> +	q_data->width		= pix->width;
> +	q_data->height		= pix->height;
> +	q_data->colorspace	= pix->colorspace;

Does this make sense for output devices? See comment in try_fmt.

> +
> +	for (i = 0; i < pix->num_planes; i++) {
> +		plane_fmt = &pix->plane_fmt[i];
> +
> +		q_data->bytesperline[i]	= plane_fmt->bytesperline;
> +		q_data->sizeimage[i]	= plane_fmt->sizeimage;
> +	}
> +
> +	q_data->c_rect.left	= 0;
> +	q_data->c_rect.top	= 0;
> +	q_data->c_rect.width	= q_data->width;
> +	q_data->c_rect.height	= q_data->height;
> +
> +	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
> +		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
> +		q_data->bytesperline[VPE_LUMA]);
> +	if (q_data->fmt->coplanar)
> +		vpe_dbg(ctx->dev, " bpl_uv %d\n",
> +			q_data->bytesperline[VPE_CHROMA]);
> +
> +	return 0;
> +}
> +
> +static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
> +{
> +	int ret;
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	ret = vpe_try_fmt(file, priv, f);
> +	if (ret)
> +		return ret;
> +
> +	ret = __vpe_s_fmt(ctx, f);
> +	if (ret)
> +		return ret;
> +
> +	if (V4L2_TYPE_IS_OUTPUT(f->type))
> +		set_src_registers(ctx);
> +	else
> +		set_dst_registers(ctx);
> +
> +	return set_srcdst_params(ctx);
> +}
> +
> +static int vpe_reqbufs(struct file *file, void *priv,
> +		       struct v4l2_requestbuffers *reqbufs)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
> +}
> +
> +static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
> +}
> +
> +static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
> +}
> +
> +static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	vpe_dump_regs(ctx->dev);
> +	vpdma_dump_regs(ctx->dev->vpdma);
> +
> +	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
> +}
> +
> +#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)

Please comment here what this control does. If applications are supposed to explicitly
set this control, then you need to put it in a public driver-specific header, otherwise
they can never get hold of the control.
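
With the ID exported in a public header, an application could then set the
control through the standard control ioctl. A minimal userspace sketch
(VIDIOC_S_CTRL and struct v4l2_control are standard V4L2 UAPI; 'fd' is an
open handle on the VPE video device, and the value 4 is an arbitrary
example):

	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	struct v4l2_control ctrl = {
		.id	= V4L2_CID_VPE_BUFS_PER_JOB,
		.value	= 4,	/* buffers per transaction */
	};

	if (ioctl(fd, VIDIOC_S_CTRL, &ctrl) < 0)
		perror("VIDIOC_S_CTRL");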

> +
> +static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
> +{
> +	struct vpe_ctx *ctx =
> +		container_of(ctrl->handler, struct vpe_ctx, hdl);
> +
> +	switch (ctrl->id) {
> +	case V4L2_CID_VPE_BUFS_PER_JOB:
> +		ctx->bufs_per_job = ctrl->val;
> +		break;
> +
> +	default:
> +		vpe_err(ctx->dev, "Invalid control\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
> +	.s_ctrl = vpe_s_ctrl,
> +};
> +
> +static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
> +	.vidioc_querycap	= vpe_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
> +	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
> +	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
> +	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
> +	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
> +	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
> +	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
> +
> +	.vidioc_reqbufs		= vpe_reqbufs,
> +	.vidioc_querybuf	= vpe_querybuf,
> +
> +	.vidioc_qbuf		= vpe_qbuf,
> +	.vidioc_dqbuf		= vpe_dqbuf,
> +
> +	.vidioc_streamon	= vpe_streamon,
> +	.vidioc_streamoff	= vpe_streamoff,
> +	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
> +	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int vpe_queue_setup(struct vb2_queue *vq,
> +			   const struct v4l2_format *fmt,
> +			   unsigned int *nbuffers, unsigned int *nplanes,
> +			   unsigned int sizes[], void *alloc_ctxs[])
> +{
> +	int i;
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct vpe_q_data *q_data;
> +
> +	q_data = get_q_data(ctx, vq->type);
> +
> +	*nplanes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < *nplanes; i++) {
> +		sizes[i] = q_data->sizeimage[i];
> +		alloc_ctxs[i] = ctx->dev->alloc_ctx;
> +	}
> +
> +	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
> +		sizes[VPE_LUMA]);
> +	if (q_data->fmt->coplanar)
> +		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
> +
> +	return 0;
> +}
> +
> +static int vpe_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct vpe_q_data *q_data;
> +	int i, num_planes;
> +
> +	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	q_data = get_q_data(ctx, vb->vb2_queue->type);
> +	num_planes = q_data->fmt->coplanar ? 2 : 1;
> +
> +	for (i = 0; i < num_planes; i++) {
> +		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
> +			vpe_err(ctx->dev,
> +				"data will not fit into plane (%lu < %lu)\n",
> +				vb2_plane_size(vb, i),
> +				(long) q_data->sizeimage[i]);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	for (i = 0; i < num_planes; i++)
> +		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
> +
> +	return 0;
> +}
> +
> +static void vpe_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
> +}
> +
> +static void vpe_wait_prepare(struct vb2_queue *q)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
> +	vpe_unlock(ctx);
> +}
> +
> +static void vpe_wait_finish(struct vb2_queue *q)
> +{
> +	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
> +	vpe_lock(ctx);
> +}
> +
> +static struct vb2_ops vpe_qops = {
> +	.queue_setup	 = vpe_queue_setup,
> +	.buf_prepare	 = vpe_buf_prepare,
> +	.buf_queue	 = vpe_buf_queue,
> +	.wait_prepare	 = vpe_wait_prepare,
> +	.wait_finish	 = vpe_wait_finish,
> +};
> +
> +static int queue_init(void *priv, struct vb2_queue *src_vq,
> +		      struct vb2_queue *dst_vq)
> +{
> +	struct vpe_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> +	src_vq->io_modes = VB2_MMAP;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &vpe_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
> +	dst_vq->io_modes = VB2_MMAP;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &vpe_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +static const struct v4l2_ctrl_config vpe_bufs_per_job = {
> +	.ops = &vpe_ctrl_ops,
> +	.id = V4L2_CID_VPE_BUFS_PER_JOB,
> +	.name = "Buffers Per Transaction",
> +	.type = V4L2_CTRL_TYPE_INTEGER,
> +	.def = VPE_DEF_BUFS_PER_JOB,
> +	.min = 1,
> +	.max = VIDEO_MAX_FRAME,
> +	.step = 1,
> +};
> +
> +/*
> + * File operations
> + */
> +static int vpe_open(struct file *file)
> +{
> +	struct vpe_dev *dev = video_drvdata(file);
> +	struct vpe_ctx *ctx = NULL;
> +	struct vpe_q_data *s_q_data;
> +	struct v4l2_ctrl_handler *hdl;
> +	int ret;
> +
> +	vpe_dbg(dev, "vpe_open\n");
> +
> +	if (!dev->vpdma->ready) {
> +		vpe_err(dev, "vpdma firmware not loaded\n");
> +		return -ENODEV;
> +	}
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	ctx->dev = dev;
> +
> +	if (mutex_lock_interruptible(&dev->dev_mutex)) {
> +		ret = -ERESTARTSYS;
> +		goto free_ctx;
> +	}
> +
> +	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
> +			VPDMA_LIST_TYPE_NORMAL);
> +	if (ret != 0)
> +		goto unlock;
> +
> +	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
> +	if (ret != 0)
> +		goto free_desc_list;
> +
> +	init_adb_hdrs(ctx);
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +
> +	hdl = &ctx->hdl;
> +	v4l2_ctrl_handler_init(hdl, 1);
> +	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
> +	if (hdl->error) {
> +		ret = hdl->error;
> +		goto exit_fh;
> +	}
> +	ctx->fh.ctrl_handler = hdl;
> +	v4l2_ctrl_handler_setup(hdl);
> +
> +	s_q_data = &ctx->q_data[Q_DATA_SRC];
> +	s_q_data->fmt = &vpe_formats[2];
> +	s_q_data->width = 1920;
> +	s_q_data->height = 1080;
> +	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
> +			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
> +	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
> +	s_q_data->c_rect.left = 0;
> +	s_q_data->c_rect.top = 0;
> +	s_q_data->c_rect.width = s_q_data->width;
> +	s_q_data->c_rect.height = s_q_data->height;
> +	s_q_data->flags = 0;
> +
> +	ctx->q_data[Q_DATA_DST] = *s_q_data;
> +
> +	set_src_registers(ctx);
> +	set_dst_registers(ctx);
> +	ret = set_srcdst_params(ctx);
> +	if (ret)
> +		goto exit_fh;
> +
> +	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
> +
> +	if (IS_ERR(ctx->m2m_ctx)) {
> +		ret = PTR_ERR(ctx->m2m_ctx);
> +		goto exit_fh;
> +	}
> +
> +	v4l2_fh_add(&ctx->fh);
> +
> +	/*
> +	 * for now, just report the creation of the first instance, we can later
> +	 * optimize the driver to enable or disable clocks when the first
> +	 * instance is created or the last instance released
> +	 */
> +	if (atomic_inc_return(&dev->num_instances) == 1)
> +		vpe_dbg(dev, "first instance created\n");
> +
> +	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
> +
> +	ctx->load_mmrs = true;
> +
> +	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
> +		ctx, ctx->m2m_ctx);
> +
> +	mutex_unlock(&dev->dev_mutex);
> +
> +	return 0;
> +exit_fh:
> +	v4l2_ctrl_handler_free(hdl);
> +	v4l2_fh_exit(&ctx->fh);
> +	vpdma_free_desc_buf(&ctx->mmr_adb);
> +free_desc_list:
> +	vpdma_free_desc_list(&ctx->desc_list);
> +unlock:
> +	mutex_unlock(&dev->dev_mutex);
> +free_ctx:
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int vpe_release(struct file *file)
> +{
> +	struct vpe_dev *dev = video_drvdata(file);
> +	struct vpe_ctx *ctx = file2ctx(file);
> +
> +	vpe_dbg(dev, "releasing instance %p\n", ctx);
> +
> +	mutex_lock(&dev->dev_mutex);
> +	vpdma_free_desc_list(&ctx->desc_list);
> +	vpdma_free_desc_buf(&ctx->mmr_adb);
> +
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	v4l2_ctrl_handler_free(&ctx->hdl);
> +	v4l2_m2m_ctx_release(ctx->m2m_ctx);
> +
> +	kfree(ctx);
> +
> +	/*
> +	 * for now, just report the release of the last instance, we can later
> +	 * optimize the driver to enable or disable clocks when the first
> +	 * instance is created or the last instance released
> +	 */
> +	if (atomic_dec_return(&dev->num_instances) == 0)
> +		vpe_dbg(dev, "last instance released\n");
> +
> +	mutex_unlock(&dev->dev_mutex);
> +
> +	return 0;
> +}
> +
> +static unsigned int vpe_poll(struct file *file,
> +			     struct poll_table_struct *wait)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_dev *dev = ctx->dev;
> +	int ret;
> +
> +	mutex_lock(&dev->dev_mutex);
> +	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
> +	mutex_unlock(&dev->dev_mutex);
> +	return ret;
> +}
> +
> +static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct vpe_ctx *ctx = file2ctx(file);
> +	struct vpe_dev *dev = ctx->dev;
> +	int ret;
> +
> +	if (mutex_lock_interruptible(&dev->dev_mutex))
> +		return -ERESTARTSYS;
> +	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
> +	mutex_unlock(&dev->dev_mutex);
> +	return ret;
> +}
> +
> +static const struct v4l2_file_operations vpe_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= vpe_open,
> +	.release	= vpe_release,
> +	.poll		= vpe_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= vpe_mmap,
> +};
> +
> +static struct video_device vpe_videodev = {
> +	.name		= VPE_MODULE_NAME,
> +	.fops		= &vpe_fops,
> +	.ioctl_ops	= &vpe_ioctl_ops,
> +	.minor		= -1,
> +	.release	= video_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= device_run,
> +	.job_ready	= job_ready,
> +	.job_abort	= job_abort,
> +	.lock		= vpe_lock,
> +	.unlock		= vpe_unlock,
> +};
> +
> +static int vpe_runtime_get(struct platform_device *pdev)
> +{
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
> +
> +	r = pm_runtime_get_sync(&pdev->dev);
> +	WARN_ON(r < 0);
> +	return r < 0 ? r : 0;
> +}
> +
> +static void vpe_runtime_put(struct platform_device *pdev)
> +{
> +	int r;
> +
> +	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
> +
> +	r = pm_runtime_put_sync(&pdev->dev);
> +	WARN_ON(r < 0 && r != -ENOSYS);
> +}
> +
> +static int vpe_probe(struct platform_device *pdev)
> +{
> +	struct vpe_dev *dev;
> +	struct video_device *vfd;
> +	struct resource *res;
> +	int ret, irq, func;
> +
> +	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
> +	if (!dev)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&dev->lock);
> +
> +	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> +	if (ret)
> +		return ret;
> +
> +	atomic_set(&dev->num_instances, 0);
> +	mutex_init(&dev->dev_mutex);
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
> +	/*
> +	 * HACK: we get resource info from device tree in the form of a list of
> +	 * VPE sub blocks, the driver currently uses only the base of vpe_top
> +	 * for register access, the driver should be changed later to access
> +	 * registers based on the sub block base addresses
> +	 */
> +	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
> +	if (IS_ERR(dev->base)) {
> +		ret = PTR_ERR(dev->base);
> +		goto v4l2_dev_unreg;
> +	}
> +
> +	irq = platform_get_irq(pdev, 0);
> +	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
> +			dev);
> +	if (ret)
> +		goto v4l2_dev_unreg;
> +
> +	platform_set_drvdata(pdev, dev);
> +
> +	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
> +	if (IS_ERR(dev->alloc_ctx)) {
> +		vpe_err(dev, "Failed to alloc vb2 context\n");
> +		ret = PTR_ERR(dev->alloc_ctx);
> +		goto v4l2_dev_unreg;
> +	}
> +
> +	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(dev->m2m_dev)) {
> +		vpe_err(dev, "Failed to init mem2mem device\n");
> +		ret = PTR_ERR(dev->m2m_dev);
> +		goto rel_ctx;
> +	}
> +
> +	pm_runtime_enable(&pdev->dev);
> +
> +	ret = vpe_runtime_get(pdev);
> +	if (ret)
> +		goto rel_m2m;
> +
> +	/* Perform clk enable followed by reset */
> +	vpe_set_clock_enable(dev, 1);
> +
> +	vpe_top_reset(dev);
> +
> +	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
> +		VPE_PID_FUNC_SHIFT);
> +	vpe_dbg(dev, "VPE PID function %x\n", func);
> +
> +	vpe_top_vpdma_reset(dev);
> +
> +	dev->vpdma = vpdma_create(pdev);
> +	if (IS_ERR(dev->vpdma)) {
> +		ret = PTR_ERR(dev->vpdma);
> +		goto runtime_put;
> +	}
> +
> +	vfd = &dev->vfd;
> +	*vfd = vpe_videodev;
> +	vfd->lock = &dev->dev_mutex;
> +	vfd->v4l2_dev = &dev->v4l2_dev;
> +
> +	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
> +	if (ret) {
> +		vpe_err(dev, "Failed to register video device\n");
> +		goto runtime_put;
> +	}
> +
> +	video_set_drvdata(vfd, dev);
> +	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
> +	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
> +		vfd->num);
> +
> +	return 0;
> +
> +runtime_put:
> +	vpe_runtime_put(pdev);
> +rel_m2m:
> +	pm_runtime_disable(&pdev->dev);
> +	v4l2_m2m_release(dev->m2m_dev);
> +rel_ctx:
> +	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
> +v4l2_dev_unreg:
> +	v4l2_device_unregister(&dev->v4l2_dev);
> +
> +	return ret;
> +}
> +
> +static int vpe_remove(struct platform_device *pdev)
> +{
> +	struct vpe_dev *dev =
> +		(struct vpe_dev *) platform_get_drvdata(pdev);
> +
> +	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME);
> +
> +	v4l2_m2m_release(dev->m2m_dev);
> +	video_unregister_device(&dev->vfd);
> +	v4l2_device_unregister(&dev->v4l2_dev);
> +	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
> +
> +	vpe_set_clock_enable(dev, 0);
> +	vpe_runtime_put(pdev);
> +	pm_runtime_disable(&pdev->dev);
> +
> +	return 0;
> +}
> +
> +#if defined(CONFIG_OF)
> +static const struct of_device_id vpe_of_match[] = {
> +	{
> +		.compatible = "ti,vpe",
> +	},
> +	{},
> +};
> +#else
> +#define vpe_of_match NULL
> +#endif
> +
> +static struct platform_driver vpe_pdrv = {
> +	.probe		= vpe_probe,
> +	.remove		= vpe_remove,
> +	.driver		= {
> +		.name	= VPE_MODULE_NAME,
> +		.owner	= THIS_MODULE,
> +		.of_match_table = vpe_of_match,
> +	},
> +};
> +
> +static void __exit vpe_exit(void)
> +{
> +	platform_driver_unregister(&vpe_pdrv);
> +}
> +
> +static int __init vpe_init(void)
> +{
> +	return platform_driver_register(&vpe_pdrv);
> +}
> +
> +module_init(vpe_init);
> +module_exit(vpe_exit);
> +
> +MODULE_DESCRIPTION("TI VPE driver");
> +MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
> new file mode 100644
> index 0000000..ed214e8
> --- /dev/null
> +++ b/drivers/media/platform/ti-vpe/vpe_regs.h
> @@ -0,0 +1,496 @@
> +/*
> + * Copyright (c) 2013 Texas Instruments Inc.
> + *
> + * David Griego, <dagriego@biglakesoftware.com>
> + * Dale Farnsworth, <dale@farnsworth.org>
> + * Archit Taneja, <archit@ti.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#ifndef __TI_VPE_REGS_H
> +#define __TI_VPE_REGS_H
> +
> +/* VPE register offsets and field selectors */
> +
> +/* VPE top level regs */
> +#define VPE_PID				0x0000
> +#define VPE_PID_MINOR_MASK		0x3f
> +#define VPE_PID_MINOR_SHIFT		0
> +#define VPE_PID_CUSTOM_MASK		0x03
> +#define VPE_PID_CUSTOM_SHIFT		6
> +#define VPE_PID_MAJOR_MASK		0x07
> +#define VPE_PID_MAJOR_SHIFT		8
> +#define VPE_PID_RTL_MASK		0x1f
> +#define VPE_PID_RTL_SHIFT		11
> +#define VPE_PID_FUNC_MASK		0xfff
> +#define VPE_PID_FUNC_SHIFT		16
> +#define VPE_PID_SCHEME_MASK		0x03
> +#define VPE_PID_SCHEME_SHIFT		30
> +
> +#define VPE_SYSCONFIG			0x0010
> +#define VPE_SYSCONFIG_IDLE_MASK		0x03
> +#define VPE_SYSCONFIG_IDLE_SHIFT	2
> +#define VPE_SYSCONFIG_STANDBY_MASK	0x03
> +#define VPE_SYSCONFIG_STANDBY_SHIFT	4
> +#define VPE_FORCE_IDLE_MODE		0
> +#define VPE_NO_IDLE_MODE		1
> +#define VPE_SMART_IDLE_MODE		2
> +#define VPE_SMART_IDLE_WAKEUP_MODE	3
> +#define VPE_FORCE_STANDBY_MODE		0
> +#define VPE_NO_STANDBY_MODE		1
> +#define VPE_SMART_STANDBY_MODE		2
> +#define VPE_SMART_STANDBY_WAKEUP_MODE	3
> +
> +#define VPE_INT0_STATUS0_RAW_SET	0x0020
> +#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
> +#define VPE_INT0_STATUS0_CLR		0x0028
> +#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
> +#define VPE_INT0_ENABLE0_SET		0x0030
> +#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
> +#define VPE_INT0_ENABLE0_CLR		0x0038
> +#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
> +#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
> +#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
> +#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
> +#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
> +#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
> +#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
> +#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
> +#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
> +#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
> +#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
> +#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
> +#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
> +#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
> +#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
> +#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
> +#define VPE_INT0_DESCRIPTOR		(1 << 16)
> +#define VPE_DEI_FMD_INT			(1 << 18)
> +
> +#define VPE_INT0_STATUS1_RAW_SET	0x0024
> +#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
> +#define VPE_INT0_STATUS1_CLR		0x002c
> +#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
> +#define VPE_INT0_ENABLE1_SET		0x0034
> +#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
> +#define VPE_INT0_ENABLE1_CLR		0x003c
> +#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
> +#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
> +#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
> +#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
> +#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
> +#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
> +#define VPE_INT0_CLIENT			(1 << 7)
> +#define VPE_DEI_ERROR_INT		(1 << 16)
> +#define VPE_DS1_UV_ERROR_INT		(1 << 22)
> +
> +#define VPE_INTC_EOI			0x00a0
> +
> +#define VPE_CLK_ENABLE			0x0100
> +#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
> +#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
> +
> +#define VPE_CLK_RESET			0x0104
> +#define VPE_VPDMA_CLK_RESET_MASK	0x1
> +#define VPE_VPDMA_CLK_RESET_SHIFT	0
> +#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
> +#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
> +#define VPE_MAIN_RESET_MASK		0x1
> +#define VPE_MAIN_RESET_SHIFT		31
> +
> +#define VPE_CLK_FORMAT_SELECT		0x010c
> +#define VPE_CSC_SRC_SELECT_MASK		0x03
> +#define VPE_CSC_SRC_SELECT_SHIFT	0
> +#define VPE_RGB_OUT_SELECT		(1 << 8)
> +#define VPE_DS_SRC_SELECT_MASK		0x07
> +#define VPE_DS_SRC_SELECT_SHIFT		9
> +#define VPE_DS_BYPASS			(1 << 16)
> +#define VPE_COLOR_SEPARATE_422		(1 << 18)
> +
> +#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
> +#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
> +
> +#define VPE_CLK_RANGE_MAP		0x011c
> +#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
> +#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
> +#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
> +#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
> +#define VPE_RANGE_MAP_ON		(1 << 6)
> +#define VPE_RANGE_REDUCTION_ON		(1 << 28)
> +
> +/* VPE chrominance upsampler regs */
> +#define VPE_US1_R0			0x0304
> +#define VPE_US2_R0			0x0404
> +#define VPE_US3_R0			0x0504
> +#define VPE_US_C1_MASK			0x3fff
> +#define VPE_US_C1_SHIFT			2
> +#define VPE_US_C0_MASK			0x3fff
> +#define VPE_US_C0_SHIFT			18
> +#define VPE_US_MODE_MASK		0x03
> +#define VPE_US_MODE_SHIFT		16
> +#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
> +#define VPE_ANCHOR_FID0_C1_SHIFT	2
> +#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
> +#define VPE_ANCHOR_FID0_C0_SHIFT	18
> +
> +#define VPE_US1_R1			0x0308
> +#define VPE_US2_R1			0x0408
> +#define VPE_US3_R1			0x0508
> +#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
> +#define VPE_ANCHOR_FID0_C3_SHIFT	2
> +#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
> +#define VPE_ANCHOR_FID0_C2_SHIFT	18
> +
> +#define VPE_US1_R2			0x030c
> +#define VPE_US2_R2			0x040c
> +#define VPE_US3_R2			0x050c
> +#define VPE_INTERP_FID0_C1_MASK		0x3fff
> +#define VPE_INTERP_FID0_C1_SHIFT	2
> +#define VPE_INTERP_FID0_C0_MASK		0x3fff
> +#define VPE_INTERP_FID0_C0_SHIFT	18
> +
> +#define VPE_US1_R3			0x0310
> +#define VPE_US2_R3			0x0410
> +#define VPE_US3_R3			0x0510
> +#define VPE_INTERP_FID0_C3_MASK		0x3fff
> +#define VPE_INTERP_FID0_C3_SHIFT	2
> +#define VPE_INTERP_FID0_C2_MASK		0x3fff
> +#define VPE_INTERP_FID0_C2_SHIFT	18
> +
> +#define VPE_US1_R4			0x0314
> +#define VPE_US2_R4			0x0414
> +#define VPE_US3_R4			0x0514
> +#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
> +#define VPE_ANCHOR_FID1_C1_SHIFT	2
> +#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
> +#define VPE_ANCHOR_FID1_C0_SHIFT	18
> +
> +#define VPE_US1_R5			0x0318
> +#define VPE_US2_R5			0x0418
> +#define VPE_US3_R5			0x0518
> +#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
> +#define VPE_ANCHOR_FID1_C3_SHIFT	2
> +#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
> +#define VPE_ANCHOR_FID1_C2_SHIFT	18
> +
> +#define VPE_US1_R6			0x031c
> +#define VPE_US2_R6			0x041c
> +#define VPE_US3_R6			0x051c
> +#define VPE_INTERP_FID1_C1_MASK		0x3fff
> +#define VPE_INTERP_FID1_C1_SHIFT	2
> +#define VPE_INTERP_FID1_C0_MASK		0x3fff
> +#define VPE_INTERP_FID1_C0_SHIFT	18
> +
> +#define VPE_US1_R7			0x0320
> +#define VPE_US2_R7			0x0420
> +#define VPE_US3_R7			0x0520
> +#define VPE_INTERP_FID0_C3_MASK		0x3fff
> +#define VPE_INTERP_FID0_C3_SHIFT	2
> +#define VPE_INTERP_FID0_C2_MASK		0x3fff
> +#define VPE_INTERP_FID0_C2_SHIFT	18
> +
> +/* VPE de-interlacer regs */
> +#define VPE_DEI_FRAME_SIZE		0x0600
> +#define VPE_DEI_WIDTH_MASK		0x07ff
> +#define VPE_DEI_WIDTH_SHIFT		0
> +#define VPE_DEI_HEIGHT_MASK		0x07ff
> +#define VPE_DEI_HEIGHT_SHIFT		16
> +#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
> +#define VPE_DEI_FIELD_FLUSH		(1 << 30)
> +#define VPE_DEI_PROGRESSIVE		(1 << 31)
> +
> +#define VPE_MDT_BYPASS			0x0604
> +#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
> +#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
> +
> +#define VPE_MDT_SF_THRESHOLD		0x0608
> +#define VPE_MDT_SF_SC_THR1_MASK		0xff
> +#define VPE_MDT_SF_SC_THR1_SHIFT	0
> +#define VPE_MDT_SF_SC_THR2_MASK		0xff
> +#define VPE_MDT_SF_SC_THR2_SHIFT	0
> +#define VPE_MDT_SF_SC_THR3_MASK		0xff
> +#define VPE_MDT_SF_SC_THR3_SHIFT	0
> +
> +#define VPE_EDI_CONFIG			0x060c
> +#define VPE_EDI_INP_MODE_MASK		0x03
> +#define VPE_EDI_INP_MODE_SHIFT		0
> +#define VPE_EDI_ENABLE_3D		(1 << 2)
> +#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
> +#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
> +#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
> +#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
> +#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
> +#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
> +#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
> +
> +#define VPE_DEI_EDI_LUT_R0		0x0610
> +#define VPE_EDI_LUT0_MASK		0x1f
> +#define VPE_EDI_LUT0_SHIFT		0
> +#define VPE_EDI_LUT1_MASK		0x1f
> +#define VPE_EDI_LUT1_SHIFT		8
> +#define VPE_EDI_LUT2_MASK		0x1f
> +#define VPE_EDI_LUT2_SHIFT		16
> +#define VPE_EDI_LUT3_MASK		0x1f
> +#define VPE_EDI_LUT3_SHIFT		24
> +
> +#define VPE_DEI_EDI_LUT_R1		0x0614
> +#define VPE_EDI_LUT0_MASK		0x1f
> +#define VPE_EDI_LUT0_SHIFT		0
> +#define VPE_EDI_LUT1_MASK		0x1f
> +#define VPE_EDI_LUT1_SHIFT		8
> +#define VPE_EDI_LUT2_MASK		0x1f
> +#define VPE_EDI_LUT2_SHIFT		16
> +#define VPE_EDI_LUT3_MASK		0x1f
> +#define VPE_EDI_LUT3_SHIFT		24
> +
> +#define VPE_DEI_EDI_LUT_R2		0x0618
> +#define VPE_EDI_LUT4_MASK		0x1f
> +#define VPE_EDI_LUT4_SHIFT		0
> +#define VPE_EDI_LUT5_MASK		0x1f
> +#define VPE_EDI_LUT5_SHIFT		8
> +#define VPE_EDI_LUT6_MASK		0x1f
> +#define VPE_EDI_LUT6_SHIFT		16
> +#define VPE_EDI_LUT7_MASK		0x1f
> +#define VPE_EDI_LUT7_SHIFT		24
> +
> +#define VPE_DEI_EDI_LUT_R3		0x061c
> +#define VPE_EDI_LUT8_MASK		0x1f
> +#define VPE_EDI_LUT8_SHIFT		0
> +#define VPE_EDI_LUT9_MASK		0x1f
> +#define VPE_EDI_LUT9_SHIFT		8
> +#define VPE_EDI_LUT10_MASK		0x1f
> +#define VPE_EDI_LUT10_SHIFT		16
> +#define VPE_EDI_LUT11_MASK		0x1f
> +#define VPE_EDI_LUT11_SHIFT		24
> +
> +#define VPE_DEI_FMD_WINDOW_R0		0x0620
> +#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
> +#define VPE_FMD_WINDOW_MINX_SHIFT	0
> +#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
> +#define VPE_FMD_WINDOW_MAXX_SHIFT	16
> +#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
> +
> +#define VPE_DEI_FMD_WINDOW_R1		0x0624
> +#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
> +#define VPE_FMD_WINDOW_MINY_SHIFT	0
> +#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
> +#define VPE_FMD_WINDOW_MAXY_SHIFT	16
> +
> +#define VPE_DEI_FMD_CONTROL_R0		0x0628
> +#define VPE_FMD_ENABLE			(1 << 0)
> +#define VPE_FMD_LOCK			(1 << 1)
> +#define VPE_FMD_JAM_DIR			(1 << 2)
> +#define VPE_FMD_BED_ENABLE		(1 << 3)
> +#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
> +#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
> +#define VPE_FMD_CAF_LINE_THR_MASK	0xff
> +#define VPE_FMD_CAF_LINE_THR_SHIFT	24
> +
> +#define VPE_DEI_FMD_CONTROL_R1		0x062c
> +#define VPE_FMD_CAF_THR_MASK		0x000fffff
> +#define VPE_FMD_CAF_THR_SHIFT		0
> +
> +#define VPE_DEI_FMD_STATUS_R0		0x0630
> +#define VPE_FMD_CAF_MASK		0x000fffff
> +#define VPE_FMD_CAF_SHIFT		0
> +#define VPE_FMD_RESET			(1 << 24)
> +
> +#define VPE_DEI_FMD_STATUS_R1		0x0634
> +#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
> +#define VPE_FMD_FIELD_DIFF_SHIFT	0
> +
> +#define VPE_DEI_FMD_STATUS_R2		0x0638
> +#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
> +#define VPE_FMD_FRAME_DIFF_SHIFT	0
> +
> +/* VPE scaler regs */
> +#define VPE_SC_MP_SC0			0x0700
> +#define VPE_INTERLACE_O			(1 << 0)
> +#define VPE_LINEAR			(1 << 1)
> +#define VPE_SC_BYPASS			(1 << 2)
> +#define VPE_INVT_FID			(1 << 3)
> +#define VPE_USE_RAV			(1 << 4)
> +#define VPE_ENABLE_EV			(1 << 5)
> +#define VPE_AUTO_HS			(1 << 6)
> +#define VPE_DCM_2X			(1 << 7)
> +#define VPE_DCM_4X			(1 << 8)
> +#define VPE_HP_BYPASS			(1 << 9)
> +#define VPE_INTERLACE_I			(1 << 10)
> +#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
> +#define VPE_Y_PK_EN			(1 << 14)
> +#define VPE_TRIM			(1 << 15)
> +#define VPE_SELFGEN_FID			(1 << 16)
> +
> +#define VPE_SC_MP_SC1			0x0704
> +#define VPE_ROW_ACC_INC_MASK		0x07ffffff
> +#define VPE_ROW_ACC_INC_SHIFT		0
> +
> +#define VPE_SC_MP_SC2			0x0708
> +#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
> +#define VPE_ROW_ACC_OFFSET_SHIFT	0
> +
> +#define VPE_SC_MP_SC3			0x070c
> +#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
> +#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
> +
> +#define VPE_SC_MP_SC4			0x0710
> +#define VPE_TAR_H_MASK			0x07ff
> +#define VPE_TAR_H_SHIFT			0
> +#define VPE_TAR_W_MASK			0x07ff
> +#define VPE_TAR_W_SHIFT			12
> +#define VPE_LIN_ACC_INC_U_MASK		0x07
> +#define VPE_LIN_ACC_INC_U_SHIFT		24
> +#define VPE_NLIN_ACC_INIT_U_MASK	0x07
> +#define VPE_NLIN_ACC_INIT_U_SHIFT	28
> +
> +#define VPE_SC_MP_SC5			0x0714
> +#define VPE_SRC_H_MASK			0x07ff
> +#define VPE_SRC_H_SHIFT			0
> +#define VPE_SRC_W_MASK			0x07ff
> +#define VPE_SRC_W_SHIFT			12
> +#define VPE_NLIN_ACC_INC_U_MASK		0x07
> +#define VPE_NLIN_ACC_INC_U_SHIFT	24
> +
> +#define VPE_SC_MP_SC6			0x0718
> +#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
> +#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
> +#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
> +#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
> +
> +#define VPE_SC_MP_SC8			0x0720
> +#define VPE_NLIN_LEFT_MASK		0x07ff
> +#define VPE_NLIN_LEFT_SHIFT		0
> +#define VPE_NLIN_RIGHT_MASK		0x07ff
> +#define VPE_NLIN_RIGHT_SHIFT		12
> +
> +#define VPE_SC_MP_SC9			0x0724
> +#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
> +
> +#define VPE_SC_MP_SC10			0x0728
> +#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
> +
> +#define VPE_SC_MP_SC11			0x072c
> +#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
> +
> +#define VPE_SC_MP_SC12			0x0730
> +#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
> +#define VPE_COL_ACC_OFFSET_SHIFT	0
> +
> +#define VPE_SC_MP_SC13			0x0734
> +#define VPE_SC_FACTOR_RAV_MASK		0x03ff
> +#define VPE_SC_FACTOR_RAV_SHIFT		0
> +#define VPE_CHROMA_INTP_THR_MASK	0x03ff
> +#define VPE_CHROMA_INTP_THR_SHIFT	12
> +#define VPE_DELTA_CHROMA_THR_MASK	0x0f
> +#define VPE_DELTA_CHROMA_THR_SHIFT	24
> +
> +#define VPE_SC_MP_SC17			0x0744
> +#define VPE_EV_THR_MASK			0x03ff
> +#define VPE_EV_THR_SHIFT		12
> +#define VPE_DELTA_LUMA_THR_MASK		0x0f
> +#define VPE_DELTA_LUMA_THR_SHIFT	24
> +#define VPE_DELTA_EV_THR_MASK		0x0f
> +#define VPE_DELTA_EV_THR_SHIFT		28
> +
> +#define VPE_SC_MP_SC18			0x0748
> +#define VPE_HS_FACTOR_MASK		0x03ff
> +#define VPE_HS_FACTOR_SHIFT		0
> +#define VPE_CONF_DEFAULT_MASK		0x01ff
> +#define VPE_CONF_DEFAULT_SHIFT		16
> +
> +#define VPE_SC_MP_SC19			0x074c
> +#define VPE_HPF_COEFF0_MASK		0xff
> +#define VPE_HPF_COEFF0_SHIFT		0
> +#define VPE_HPF_COEFF1_MASK		0xff
> +#define VPE_HPF_COEFF1_SHIFT		8
> +#define VPE_HPF_COEFF2_MASK		0xff
> +#define VPE_HPF_COEFF2_SHIFT		16
> +#define VPE_HPF_COEFF3_MASK		0xff
> +#define VPE_HPF_COEFF3_SHIFT		23
> +
> +#define VPE_SC_MP_SC20			0x0750
> +#define VPE_HPF_COEFF4_MASK		0xff
> +#define VPE_HPF_COEFF4_SHIFT		0
> +#define VPE_HPF_COEFF5_MASK		0xff
> +#define VPE_HPF_COEFF5_SHIFT		8
> +#define VPE_HPF_NORM_SHIFT_MASK		0x07
> +#define VPE_HPF_NORM_SHIFT_SHIFT	16
> +#define VPE_NL_LIMIT_MASK		0x1ff
> +#define VPE_NL_LIMIT_SHIFT		20
> +
> +#define VPE_SC_MP_SC21			0x0754
> +#define VPE_NL_LO_THR_MASK		0x01ff
> +#define VPE_NL_LO_THR_SHIFT		0
> +#define VPE_NL_LO_SLOPE_MASK		0xff
> +#define VPE_NL_LO_SLOPE_SHIFT		16
> +
> +#define VPE_SC_MP_SC22			0x0758
> +#define VPE_NL_HI_THR_MASK		0x01ff
> +#define VPE_NL_HI_THR_SHIFT		0
> +#define VPE_NL_HI_SLOPE_SH_MASK		0x07
> +#define VPE_NL_HI_SLOPE_SH_SHIFT	16
> +
> +#define VPE_SC_MP_SC23			0x075c
> +#define VPE_GRADIENT_THR_MASK		0x07ff
> +#define VPE_GRADIENT_THR_SHIFT		0
> +#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
> +#define VPE_GRADIENT_THR_RANGE_SHIFT	12
> +#define VPE_MIN_GY_THR_MASK		0xff
> +#define VPE_MIN_GY_THR_SHIFT		16
> +#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
> +#define VPE_MIN_GY_THR_RANGE_SHIFT	28
> +
> +#define VPE_SC_MP_SC24			0x0760
> +#define VPE_ORG_H_MASK			0x07ff
> +#define VPE_ORG_H_SHIFT			0
> +#define VPE_ORG_W_MASK			0x07ff
> +#define VPE_ORG_W_SHIFT			16
> +
> +#define VPE_SC_MP_SC25			0x0764
> +#define VPE_OFF_H_MASK			0x07ff
> +#define VPE_OFF_H_SHIFT			0
> +#define VPE_OFF_W_MASK			0x07ff
> +#define VPE_OFF_W_SHIFT			16
> +
> +/* VPE color space converter regs */
> +#define VPE_CSC_CSC00			0x5700
> +#define VPE_CSC_A0_MASK			0x1fff
> +#define VPE_CSC_A0_SHIFT		0
> +#define VPE_CSC_B0_MASK			0x1fff
> +#define VPE_CSC_B0_SHIFT		16
> +
> +#define VPE_CSC_CSC01			0x5704
> +#define VPE_CSC_C0_MASK			0x1fff
> +#define VPE_CSC_C0_SHIFT		0
> +#define VPE_CSC_A1_MASK			0x1fff
> +#define VPE_CSC_A1_SHIFT		16
> +
> +#define VPE_CSC_CSC02			0x5708
> +#define VPE_CSC_B1_MASK			0x1fff
> +#define VPE_CSC_B1_SHIFT		0
> +#define VPE_CSC_C1_MASK			0x1fff
> +#define VPE_CSC_C1_SHIFT		16
> +
> +#define VPE_CSC_CSC03			0x570c
> +#define VPE_CSC_A2_MASK			0x1fff
> +#define VPE_CSC_A2_SHIFT		0
> +#define VPE_CSC_B2_MASK			0x1fff
> +#define VPE_CSC_B2_SHIFT		16
> +
> +#define VPE_CSC_CSC04			0x5710
> +#define VPE_CSC_C2_MASK			0x1fff
> +#define VPE_CSC_C2_SHIFT		0
> +#define VPE_CSC_D0_MASK			0x0fff
> +#define VPE_CSC_D0_SHIFT		16
> +
> +#define VPE_CSC_CSC05			0x5714
> +#define VPE_CSC_D1_MASK			0x0fff
> +#define VPE_CSC_D1_SHIFT		0
> +#define VPE_CSC_D2_MASK			0x0fff
> +#define VPE_CSC_D2_SHIFT		16
> +#define VPE_CSC_BYPASS			(1 << 28)
> +
> +#endif
> diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
> index 083bb5a..1666aab 100644
> --- a/include/uapi/linux/v4l2-controls.h
> +++ b/include/uapi/linux/v4l2-controls.h
> @@ -160,6 +160,10 @@ enum v4l2_colorfx {
>   * of controls. Total of 16 controls is reserved for this driver */
>  #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
>  
> +/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
> + * this driver */
> +#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
> +
>  /* MPEG-class control IDs */
>  /* The MPEG controls are applicable to all codec controls
>   * and the 'MPEG' part of the define is historical */
> 

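For context, mask/shift pairs like the ones above are typically combined
when packing fields into a register word. A minimal sketch (field_prep and
field_get are hypothetical helpers, not part of this patch):

	static inline u32 field_prep(u32 val, u32 mask, int shift)
	{
		return (val & mask) << shift;
	}

	static inline u32 field_get(u32 reg, u32 mask, int shift)
	{
		return (reg >> shift) & mask;
	}

	/* e.g. packing two high-pass filter coefficients into VPE_SC_MP_SC19 */
	u32 sc19 = field_prep(c0, VPE_HPF_COEFF0_MASK, VPE_HPF_COEFF0_SHIFT) |
		   field_prep(c1, VPE_HPF_COEFF1_MASK, VPE_HPF_COEFF1_SHIFT);
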
Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 4/4] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-09-06 10:12       ` Archit Taneja
  (?)
@ 2013-10-07  7:57       ` Hans Verkuil
  -1 siblings, 0 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-10-07  7:57 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On 09/06/2013 12:12 PM, Archit Taneja wrote:
> Add support for the de-interlacer block in VPE.
> 
> For the de-interlacer to work, we need to enable 2 more sets of VPE input ports
> which fetch data from the 'last' and 'last to last' fields of the interlaced
> video. Apart from that, we need to enable the Motion vector output and input
> ports, and also allocate DMA buffers for them.
> 
> We need to make sure that the two most recent fields in the source queue are
> available and in the 'READY' state. Once a mem2mem context gets access to the
> VPE HW (in device_run), it extracts the addresses of the 3 buffers, and provides
> them to the data descriptors for the 3 sets of input ports ((LUMA1, CHROMA1),
> (LUMA2, CHROMA2), and (LUMA3, CHROMA3)) respectively for the 3 consecutive
> fields. The motion vector and output port descriptors are configured and the
> list is submitted to VPDMA.
> 
> Once the transaction is done, the v4l2 buffer corresponding to the oldest
> field (the 3rd one) is changed to the state 'DONE', and the buffers corresponding
> to the 1st and 2nd fields become the 2nd and 3rd fields for the next de-interlace
> operation. This way, for each de-interlace operation, we have the 3 most recent
> fields. After each transaction, we also swap the motion vector buffers: the new
> input motion vector buffer contains the resultant motion information of all the
> previous frames, and the new output motion vector buffer will be used to hold
> the updated motion vector capturing the motion changes in the next field. The
> motion vector buffers are allocated using the DMA allocation API.
> 
> The de-interlacer is taken out of bypass mode; this requires some extra default
> configurations, which are now added. The chrominance upsampler coefficients are
> added for interlaced frames. Some VPDMA parameters like frame start event and
> line mode are configured for the 2 extra sets of input ports.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>

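As a rough illustration of the field and motion vector buffer rotation
described above (a sketch with made-up names, not the actual driver code):

	/* after each de-interlace transaction: the oldest field is done,
	 * the two newer fields shift down, and the MV buffers swap roles */
	vb2_buffer_done(field[2], VB2_BUF_STATE_DONE);
	field[2] = field[1];
	field[1] = field[0];
	field[0] = next_queued_field;
	swap(mv_in, mv_out);	/* input MV now holds accumulated motion info */
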
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Regards,

	Hans

> ---
>  drivers/media/platform/ti-vpe/vpe.c | 392 ++++++++++++++++++++++++++++++++----
>  1 file changed, 358 insertions(+), 34 deletions(-)
> 


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-07  7:55       ` Hans Verkuil
@ 2013-10-07  9:16           ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-07  9:16 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On Monday 07 October 2013 01:25 PM, Hans Verkuil wrote:
> Hi Archit,
>
> I've got a few comments below...
>
> On 09/06/2013 12:12 PM, Archit Taneja wrote:
>> VPE is a block which consists of a single memory to memory path which can
>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>> interleaved video formats.
>>
>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>> The de-interlacer, scaler and color space converter are all bypassed for now
>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>> conversion between different YUV formats is possible.
>>
>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>> when it gets access to the VPE HW via the mem2mem queue; it also allocates
>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>
>> Based on the information received via v4l2 ioctls for the source and
>> destination queues, the driver configures the values for the MMRs, and stores
>> them in the buffer. There are also some VPDMA parameters like frame start and
>> line mode which need to be configured; these are configured by direct register
>> writes via the VPDMA helper functions.
>>
>> The driver's device_run() mem2mem op will add each descriptor based on how the
>> source and destination queues are set up for the given ctx. Once the list is
>> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
>> will upload MMR registers and start DMA of video buffers on the various input
>> and output clients/ports.
>>
>> When the list is parsed completely (and the DMAs on all the output ports are
>> done), an interrupt is generated which we use to notify that the source and
>> destination buffers are done.
>>
>> The rest of the driver is quite similar to other mem2mem drivers; we use the
>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   drivers/media/platform/Kconfig           |   16 +
>>   drivers/media/platform/Makefile          |    2 +
>>   drivers/media/platform/ti-vpe/Makefile   |    5 +
>>   drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
>>   drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>   include/uapi/linux/v4l2-controls.h       |    4 +
>>   6 files changed, 2273 insertions(+)
>>   create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>   create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>
>
> <snip>
>
>> +
>> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
>> +{
>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>> +	struct vpe_ctx *ctx = file2ctx(file);
>> +	struct vb2_queue *vq;
>> +	struct vpe_q_data *q_data;
>> +	int i;
>> +
>> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
>> +	if (!vq)
>> +		return -EINVAL;
>> +
>> +	q_data = get_q_data(ctx, f->type);
>> +
>> +	pix->width = q_data->width;
>> +	pix->height = q_data->height;
>> +	pix->pixelformat = q_data->fmt->fourcc;
>> +	pix->colorspace = q_data->colorspace;
>> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
>> +
>> +	for (i = 0; i < pix->num_planes; i++) {
>> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
>> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>> +		       struct vpe_fmt *fmt, int type)
>> +{
>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>> +	struct v4l2_plane_pix_format *plane_fmt;
>> +	int i;
>> +
>> +	if (!fmt || !(fmt->types & type)) {
>> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
>> +			pix->pixelformat);
>> +		return -EINVAL;
>> +	}
>> +
>> +	pix->field = V4L2_FIELD_NONE;
>> +
>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>> +			      S_ALIGN);
>> +
>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>> +	pix->pixelformat = fmt->fourcc;
>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>
> You do this only for capture. Output sets the colorspace, so try_fmt should
> leave the colorspace field untouched for the output direction.
>
>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;

The input to the VPE block can be various YUV formats, and the VPE can 
generate both RGB and YUV formats.

So, I guess the output (V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) side only has 
the choice of setting V4L2_COLORSPACE_SMPTE170M, and on the 
capture (V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) side we have both the 
SRGB and SMPTE170M options.

One thing I am not clear about is whether the userspace application has 
to set the colorspace in the v4l2 format for OUTPUT or CAPTURE or both?

From what I understood, the code should be as below.

For output:

if (!pix->colorspace)
	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;

And for capture:
	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
		V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;

Does this look correct?

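To make the proposal concrete, here is a sketch of how __vpe_try_fmt() could
branch on the queue direction (illustrative only; assumes the buffer type is
available in f->type as in the patch):

	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
		/* the application describes the source colorspace;
		 * fall back to a default if it left the field unset */
		if (!pix->colorspace)
			pix->colorspace = V4L2_COLORSPACE_SMPTE170M;
	} else {
		/* capture: derive the colorspace from the chosen format */
		pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
	}
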
>
> Zero pix->priv as well:
>
> 	pix->priv = 0;

pix here is of the type v4l2_pix_format_mplane, so there isn't a private 
field here.

Thanks,
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-07  9:16           ` Archit Taneja
  (?)
@ 2013-10-07  9:34           ` Hans Verkuil
  2013-10-07 10:22               ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Hans Verkuil @ 2013-10-07  9:34 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On 10/07/2013 11:16 AM, Archit Taneja wrote:
> On Monday 07 October 2013 01:25 PM, Hans Verkuil wrote:
>> Hi Archit,
>>
>> I've got a few comments below...
>>
>> On 09/06/2013 12:12 PM, Archit Taneja wrote:
>>> VPE is a block which consists of a single memory to memory path which can
>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>> interleaved video formats.
>>>
>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>> conversion between different YUV formats is possible.
>>>
>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>> when it gets access to the VPE HW via the mem2mem queue; it also allocates
>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>
>>> Based on the information received via v4l2 ioctls for the source and
>>> destination queues, the driver configures the values for the MMRs, and stores
>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>> line mode which need to be configured; these are configured by direct register
>>> writes via the VPDMA helper functions.
>>>
>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>> source and destination queues are set up for the given ctx. Once the list is
>>> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
>>> will upload MMR registers and start DMA of video buffers on the various input
>>> and output clients/ports.
>>>
>>> When the list is parsed completely (and the DMAs on all the output ports are
>>> done), an interrupt is generated which we use to notify that the source and
>>> destination buffers are done.
>>>
>>> The rest of the driver is quite similar to other mem2mem drivers; we use the
>>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>
>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>> ---
>>>   drivers/media/platform/Kconfig           |   16 +
>>>   drivers/media/platform/Makefile          |    2 +
>>>   drivers/media/platform/ti-vpe/Makefile   |    5 +
>>>   drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
>>>   drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>>   include/uapi/linux/v4l2-controls.h       |    4 +
>>>   6 files changed, 2273 insertions(+)
>>>   create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>>   create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>>   create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>>
>>
>> <snip>
>>
>>> +
>>> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
>>> +{
>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>> +	struct vpe_ctx *ctx = file2ctx(file);
>>> +	struct vb2_queue *vq;
>>> +	struct vpe_q_data *q_data;
>>> +	int i;
>>> +
>>> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
>>> +	if (!vq)
>>> +		return -EINVAL;
>>> +
>>> +	q_data = get_q_data(ctx, f->type);
>>> +
>>> +	pix->width = q_data->width;
>>> +	pix->height = q_data->height;
>>> +	pix->pixelformat = q_data->fmt->fourcc;
>>> +	pix->colorspace = q_data->colorspace;
>>> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
>>> +
>>> +	for (i = 0; i < pix->num_planes; i++) {
>>> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
>>> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>>> +		       struct vpe_fmt *fmt, int type)
>>> +{
>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>> +	struct v4l2_plane_pix_format *plane_fmt;
>>> +	int i;
>>> +
>>> +	if (!fmt || !(fmt->types & type)) {
>>> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
>>> +			pix->pixelformat);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	pix->field = V4L2_FIELD_NONE;
>>> +
>>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>>> +			      S_ALIGN);
>>> +
>>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>>> +	pix->pixelformat = fmt->fourcc;
>>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>
>> You do this only for capture. Output sets the colorspace, so try_fmt should
>> leave the colorspace field untouched for the output direction.
>>
>>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
> 
> The input to the VPE block can be various YUV formats, and the VPE can 
> generate both RGB and YUV formats.
> 
> So, I guess the output (V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) side only has
> the choice of setting V4L2_COLORSPACE_SMPTE170M, and on the
> capture (V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) side we have both the
> SRGB and SMPTE170M options.
> 
> One thing I am not clear about is whether the userspace application has 
> to set the colorspace in the v4l2 format for OUTPUT or CAPTURE or both?

The spec today says that the colorspace field is filled in by the driver.
It does not differentiate between output and capture. This is patently wrong:
for output it should be set by the application, since that's who is
telling the driver what colorspace the image has. The driver may change it
if it doesn't support that colorspace, but otherwise it should leave it as
is.

A mem-to-mem device that doesn't care about the colorspace should just copy
the colorspace field from the output value into the capture.

What is missing in today's API is a way to do colorspace conversion in a m2m
device since there is no way today to tell the driver the desired colorspace
that it should get back from the m2m device.

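Concretely, for a m2m device that does no conversion, the capture side of
try_fmt could just propagate the output queue's colorspace. A minimal sketch
reusing the patch's own q_data helpers:

	/* capture direction: inherit whatever was set on the output queue */
	struct vpe_q_data *s_q_data =
		get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);

	pix->colorspace = s_q_data->colorspace;
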
> 
> From what I understood, the code should be as below.
> 
> For output:
> 
> if (!pix->colorspace)
> 	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;

I would leave off the 'if' part. If this colorspace is all you support on the
output, then always set it.

However, since it can convert YUV to RGB, doesn't the hardware have to know
about the various YUV colorspaces? SDTV and HDTV have different colorspaces.

> 
> And for capture:
> 	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
> 		V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
> 
> Does this look correct?

Yes, unless the hardware can take SDTV/HDTV YUV colorspaces into account. In
that case I need to think how the API should be improved.

> 
>>
>> Zero pix->priv as well:
>>
>> 	pix->priv = 0;
> 
> pix here is of the type v4l2_pix_format_mplane, so there isn't a private 
> field here.

Oops, my fault. I hadn't noticed that.

> 
> Thanks,
> Archit
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-07  9:34           ` Hans Verkuil
@ 2013-10-07 10:22               ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-07 10:22 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On Monday 07 October 2013 03:04 PM, Hans Verkuil wrote:
> On 10/07/2013 11:16 AM, Archit Taneja wrote:
>> On Monday 07 October 2013 01:25 PM, Hans Verkuil wrote:
>>> Hi Archit,
>>>
>>> I've got a few comments below...
>>>
>>> On 09/06/2013 12:12 PM, Archit Taneja wrote:
>>>> VPE is a block which consists of a single memory to memory path which can
>>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>>> interleaved video formats.
>>>>
>>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>>> conversion between different YUV formats is possible.
>>>>
>>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>>> when it gets access to the VPE HW via the mem2mem queue; it also allocates
>>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>>
>>>> Based on the information received via v4l2 ioctls for the source and
>>>> destination queues, the driver configures the values for the MMRs, and stores
>>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>>> line mode which need to be configured; these are configured by direct register
>>>> writes via the VPDMA helper functions.
>>>>
>>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>>> source and destination queues are set up for the given ctx. Once the list is
>>>> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
>>>> will upload MMR registers and start DMA of video buffers on the various input
>>>> and output clients/ports.
>>>>
>>>> When the list is parsed completely (and the DMAs on all the output ports are
>>>> done), an interrupt is generated which we use to notify that the source and
>>>> destination buffers are done.
>>>>
>>>> The rest of the driver is quite similar to other mem2mem drivers; we use the
>>>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>>
>>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>>> ---
>>>>    drivers/media/platform/Kconfig           |   16 +
>>>>    drivers/media/platform/Makefile          |    2 +
>>>>    drivers/media/platform/ti-vpe/Makefile   |    5 +
>>>>    drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
>>>>    drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>>>    include/uapi/linux/v4l2-controls.h       |    4 +
>>>>    6 files changed, 2273 insertions(+)
>>>>    create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>>>    create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>>>    create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>>>
>>>
>>> <snip>
>>>
>>>> +
>>>> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
>>>> +{
>>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>>> +	struct vpe_ctx *ctx = file2ctx(file);
>>>> +	struct vb2_queue *vq;
>>>> +	struct vpe_q_data *q_data;
>>>> +	int i;
>>>> +
>>>> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
>>>> +	if (!vq)
>>>> +		return -EINVAL;
>>>> +
>>>> +	q_data = get_q_data(ctx, f->type);
>>>> +
>>>> +	pix->width = q_data->width;
>>>> +	pix->height = q_data->height;
>>>> +	pix->pixelformat = q_data->fmt->fourcc;
>>>> +	pix->colorspace = q_data->colorspace;
>>>> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
>>>> +
>>>> +	for (i = 0; i < pix->num_planes; i++) {
>>>> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
>>>> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>>>> +		       struct vpe_fmt *fmt, int type)
>>>> +{
>>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>>> +	struct v4l2_plane_pix_format *plane_fmt;
>>>> +	int i;
>>>> +
>>>> +	if (!fmt || !(fmt->types & type)) {
>>>> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
>>>> +			pix->pixelformat);
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	pix->field = V4L2_FIELD_NONE;
>>>> +
>>>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>>>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>>>> +			      S_ALIGN);
>>>> +
>>>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>>>> +	pix->pixelformat = fmt->fourcc;
>>>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>>
>>> You do this only for capture. Output sets the colorspace, so try_fmt should
>>> leave the colorspace field untouched for the output direction.
>>>
>>>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>
>> The input to the VPE block can be various YUV formats, and the VPE can
>> generate both RGB and YUV formats.
>>
>> So, I guess the output (V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) side only has
>> the choice of setting V4L2_COLORSPACE_SMPTE170M, and on the
>> capture (V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) side we have both the
>> SRGB and SMPTE170M options.
>>
>> One thing I am not clear about is whether the userspace application has
>> to set the colorspace in the v4l2 format for OUTPUT or CAPTURE or both?
>
> The spec today says that the colorspace field is filled in by the driver.
> It does not differentiate between output and capture. This is patently wrong:
> for output it should be set by the application, since that's who is
> telling the driver what colorspace the image has. The driver may change it
> if it doesn't support that colorspace, but otherwise it should leave it as
> is.
>
> A mem-to-mem device that doesn't care about the colorspace should just copy
> the colorspace field from the output value into the capture.
>
> What is missing in today's API is a way to do colorspace conversion in a m2m
> device since there is no way today to tell the driver the desired colorspace
> that it should get back from the m2m device.
>
>>
>> From what I understood, the code should be as below.
>>
>> For output:
>>
>> if (!pix->colorspace)
>> 	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;
>
> I would leave off the 'if' part. If this colorspace is all you support on the
> output, then always set it.
>
> However, since it can convert YUV to RGB, doesn't the hardware have to know
> about the various YUV colorspaces? SDTV and HDTV have different colorspaces.
>
>>
>> And for capture:
>> 	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>> 		V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>
>> Does this look correct?
>
> Yes, unless the hardware can take SDTV/HDTV YUV colorspaces into account. In
> that case I need to think how the API should be improved.

The hardware can't convert one YUV color space to another. But it has a 
programmable CSC block for YUV->RGB conversion in which we can program 
coefficients based on the input YUV color space.

The color space conversion block isn't implemented by the driver yet. So 
I didn't look too much into it.

I guess it will eventually be important to consider the output 
colorspace. It doesn't need to be only SMPTE170M; it could be REC709 or 
SMPTE240M based on what the user says.

When the color space conversion block is implemented and the capture 
colorspace is RGB, the driver should see the input colorspace and choose 
the coefficients accordingly.

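For illustration, coefficient selection could eventually look something like
the sketch below (the tables are placeholders, not real values; those would
come from the VPE documentation):

	/* hypothetical YUV->RGB coefficient tables, one per colorspace */
	static const u16 csc_yuv2rgb_smpte170m[12];	/* SD coefficients */
	static const u16 csc_yuv2rgb_rec709[12];	/* HD coefficients */

	const u16 *coeffs = src_colorspace == V4L2_COLORSPACE_REC709 ?
			csc_yuv2rgb_rec709 : csc_yuv2rgb_smpte170m;
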
With this new information about the hardware (:)), I guess it should be 
as below for now:

output:
/* inserted this check back since multiple YUV spaces supported */
if (!pix->colorspace)	
	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;

capture:
/* removed SRGB since we don't support CSC yet */
pix->colorspace = s_q_data->colorspace;

However, the above would imply that the s_fmt ioctl needs to be called for 
OUTPUT first, followed by s_fmt for CAPTURE. I don't think that's 
required according to the v4l2 spec.

Besides this, I noticed a series "V4L2 mem-to-mem ioctl helpers" which I 
should make use of. Do you suggest I base my patches on that?

Thanks
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-07 10:22               ` Archit Taneja
  (?)
@ 2013-10-07 14:02               ` Hans Verkuil
  2013-10-07 14:34                   ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Hans Verkuil @ 2013-10-07 14:02 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On 10/07/2013 12:22 PM, Archit Taneja wrote:
> On Monday 07 October 2013 03:04 PM, Hans Verkuil wrote:
>> On 10/07/2013 11:16 AM, Archit Taneja wrote:
>>> On Monday 07 October 2013 01:25 PM, Hans Verkuil wrote:
>>>> Hi Archit,
>>>>
>>>> I've got a few comments below...
>>>>
>>>> On 09/06/2013 12:12 PM, Archit Taneja wrote:
>>>>> VPE is a block which consists of a single memory to memory path which can
>>>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>>>> interleaved video formats.
>>>>>
>>>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>>>> conversion between different YUV formats is possible.
>>>>>
>>>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>>>> when it gets access to the VPE HW via the mem2mem queue; it also allocates
>>>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>>>
>>>>> Based on the information received via v4l2 ioctls for the source and
>>>>> destination queues, the driver configures the values for the MMRs, and stores
>>>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>>>> line mode which need to be configured; these are configured by direct register
>>>>> writes via the VPDMA helper functions.
>>>>>
>>>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>>>> source and destination queues are set up for the given ctx. Once the list is
>>>>> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
>>>>> will upload MMR registers and start DMA of video buffers on the various input
>>>>> and output clients/ports.
>>>>>
>>>>> When the list is parsed completely (and the DMAs on all the output ports are
>>>>> done), an interrupt is generated which we use to notify that the source and
>>>>> destination buffers are done.
>>>>>
>>>>> The rest of the driver is quite similar to other mem2mem drivers; we use the
>>>>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>>>
>>>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>>>> ---
>>>>>    drivers/media/platform/Kconfig           |   16 +
>>>>>    drivers/media/platform/Makefile          |    2 +
>>>>>    drivers/media/platform/ti-vpe/Makefile   |    5 +
>>>>>    drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
>>>>>    drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>>>>    include/uapi/linux/v4l2-controls.h       |    4 +
>>>>>    6 files changed, 2273 insertions(+)
>>>>>    create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>>>>    create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>>>>    create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>>>>
>>>>
>>>> <snip>
>>>>
>>>>> +
>>>>> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
>>>>> +{
>>>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>>>> +	struct vpe_ctx *ctx = file2ctx(file);
>>>>> +	struct vb2_queue *vq;
>>>>> +	struct vpe_q_data *q_data;
>>>>> +	int i;
>>>>> +
>>>>> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
>>>>> +	if (!vq)
>>>>> +		return -EINVAL;
>>>>> +
>>>>> +	q_data = get_q_data(ctx, f->type);
>>>>> +
>>>>> +	pix->width = q_data->width;
>>>>> +	pix->height = q_data->height;
>>>>> +	pix->pixelformat = q_data->fmt->fourcc;
>>>>> +	pix->colorspace = q_data->colorspace;
>>>>> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
>>>>> +
>>>>> +	for (i = 0; i < pix->num_planes; i++) {
>>>>> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
>>>>> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
>>>>> +	}
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>>>>> +		       struct vpe_fmt *fmt, int type)
>>>>> +{
>>>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>>>> +	struct v4l2_plane_pix_format *plane_fmt;
>>>>> +	int i;
>>>>> +
>>>>> +	if (!fmt || !(fmt->types & type)) {
>>>>> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
>>>>> +			pix->pixelformat);
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	pix->field = V4L2_FIELD_NONE;
>>>>> +
>>>>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>>>>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>>>>> +			      S_ALIGN);
>>>>> +
>>>>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>>>>> +	pix->pixelformat = fmt->fourcc;
>>>>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>>>
>>>> You do this only for capture. Output sets the colorspace, so try_fmt should
>>>> leave the colorspace field untouched for the output direction.
>>>>
>>>>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>>
>>> The input to the VPE block can be various YUV formats, and the VPE can
>>> generate both RGB and YUV formats.
>>>
>>> So, I guess the output (V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) side only has
>>> the choice of setting V4L2_COLORSPACE_SMPTE170M, and on the
>>> capture (V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) side we have both the
>>> SRGB and SMPTE170M options.
>>>
>>> One thing I am not clear about is whether the userspace application has
>>> to set the colorspace in the v4l2 format for OUTPUT or CAPTURE or both?
>>
>> The spec today says that the colorspace field is filled in by the driver.
>> It does not differentiate between output and capture. This is patently wrong:
>> for output it should be set by the application, since that's who is
>> telling the driver what colorspace the image has. The driver may change it
>> if it doesn't support that colorspace, but otherwise it should leave it as
>> is.
>>
>> A mem-to-mem device that doesn't care about the colorspace should just copy
>> the colorspace field from the output value into the capture.
>>
>> What is missing in today's API is a way to do colorspace conversion in a m2m
>> device since there is no way today to tell the driver the desired colorspace
>> that it should get back from the m2m device.
>>
>>>
>>> From what I understood, the code should be as below.
>>>
>>> For output:
>>>
>>> if (!pix->colorspace)
>>> 	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;
>>
>> I would leave off the 'if' part. If this colorspace is all you support on the
>> output, then always set it.
>>
>> However, since it can convert YUV to RGB, doesn't the hardware have to know
>> about the various YUV colorspaces? SDTV and HDTV have different colorspaces.
>>
>>>
>>> And for capture:
>>> 	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>> 		V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>>
>>> Does this look correct?
>>
>> Yes, unless the hardware can take SDTV/HDTV YUV colorspaces into account. In
>> that case I need to think how the API should be improved.
> 
> The hardware can't convert one YUV color space to another. But it has a 
> programmable CSC block for YUV->RGB conversion in which we can program 
> coefficients based on the input YUV color space.
> 
> The color space conversion block isn't implemented by the driver yet. So 
> I didn't look too much into it.
> 
> I guess it will eventually be important to consider the output 
> colorspace. It doesn't need to be only SMPTE170M; it could be REC709 or 
> SMPTE240M based on what the user says.
> 
> When the color space conversion block is implemented and the capture 
> colorspace is RGB, the driver should see the input colorspace and choose 
> the coefficients accordingly.
> 
> With this new information about the hardware (:)), I guess it should be 
> as below for now:
> 
> output:
> /* inserted this check back since multiple YUV spaces supported */
> if (!pix->colorspace)	
> 	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;
> 
> capture:
> /* removed SRGB since we don't support CSC yet */
> pix->colorspace = s_q_data->colorspace;
> 
> However, the above would imply that the s_fmt ioctl needs to be called for 
> OUTPUT first, followed by s_fmt for CAPTURE. I don't think that's 
> required according to the v4l2 spec.

Well, s_fmt(OUTPUT) influences the colorspace field returned by *_fmt(CAPTURE),
which I think is OK. You can use either order, but to see the actual colorspace
used by capture you will have to call g_fmt(CAPTURE) after calling s_fmt(OUTPUT).

It's the logical order anyway for a m2m device to start with the output first.

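Seen from the application side, that order looks roughly like the sketch
below (error handling omitted; the format fields are just example values):

	struct v4l2_format fmt = {0};

	/* describe the source buffers, including their colorspace */
	fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12;
	fmt.fmt.pix_mp.colorspace = V4L2_COLORSPACE_REC709;
	ioctl(fd, VIDIOC_S_FMT, &fmt);

	/* then query the capture side to learn the resulting colorspace */
	fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	ioctl(fd, VIDIOC_G_FMT, &fmt);
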
> Besides this, I noticed a series "V4L2 mem-to-mem ioctl helpers" which I 
> should make use of. Do you suggest I base my patches on that?

I don't know when that will be merged. It might be easier to just add a new
patch converting the driver to the helpers; then we can apply all patches except
that last one if the helpers aren't merged yet.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-07 14:02               ` Hans Verkuil
@ 2013-10-07 14:34                   ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-07 14:34 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, linux-omap, tomi.valkeinen

On Monday 07 October 2013 07:32 PM, Hans Verkuil wrote:
> On 10/07/2013 12:22 PM, Archit Taneja wrote:
>> On Monday 07 October 2013 03:04 PM, Hans Verkuil wrote:
>>> On 10/07/2013 11:16 AM, Archit Taneja wrote:
>>>> On Monday 07 October 2013 01:25 PM, Hans Verkuil wrote:
>>>>> Hi Archit,
>>>>>
>>>>> I've got a few comments below...
>>>>>
>>>>> On 09/06/2013 12:12 PM, Archit Taneja wrote:
>>>>>> VPE is a block which consists of a single memory to memory path which can
>>>>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>>>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>>>>> interleaved video formats.
>>>>>>
>>>>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>>>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>>>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>>>>> conversion between different YUV formats is possible.
>>>>>>
>>>>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>>>>> when it gets access to the VPE HW via the mem2mem queue; it also allocates
>>>>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>>>>
>>>>>> Based on the information received via v4l2 ioctls for the source and
>>>>>> destination queues, the driver configures the values for the MMRs, and stores
>>>>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>>>>> line mode which need to be configured; these are configured by direct register
>>>>>> writes via the VPDMA helper functions.
>>>>>>
>>>>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>>>>> source and destination queues are set up for the given ctx. Once the list is
>>>>>> prepared, it's submitted to VPDMA; these descriptors, when parsed by VPDMA,
>>>>>> will upload MMR registers and start DMA of video buffers on the various input
>>>>>> and output clients/ports.
>>>>>>
>>>>>> When the list is parsed completely (and the DMAs on all the output ports are
>>>>>> done), an interrupt is generated which we use to notify that the source and
>>>>>> destination buffers are done.
>>>>>>
>>>>>> The rest of the driver is quite similar to other mem2mem drivers; we use the
>>>>>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>>>>
>>>>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>>>>> ---
>>>>>>     drivers/media/platform/Kconfig           |   16 +
>>>>>>     drivers/media/platform/Makefile          |    2 +
>>>>>>     drivers/media/platform/ti-vpe/Makefile   |    5 +
>>>>>>     drivers/media/platform/ti-vpe/vpe.c      | 1750 ++++++++++++++++++++++++++++++
>>>>>>     drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
>>>>>>     include/uapi/linux/v4l2-controls.h       |    4 +
>>>>>>     6 files changed, 2273 insertions(+)
>>>>>>     create mode 100644 drivers/media/platform/ti-vpe/Makefile
>>>>>>     create mode 100644 drivers/media/platform/ti-vpe/vpe.c
>>>>>>     create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h
>>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>>>> +
>>>>>> +static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
>>>>>> +{
>>>>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>>>>> +	struct vpe_ctx *ctx = file2ctx(file);
>>>>>> +	struct vb2_queue *vq;
>>>>>> +	struct vpe_q_data *q_data;
>>>>>> +	int i;
>>>>>> +
>>>>>> +	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
>>>>>> +	if (!vq)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	q_data = get_q_data(ctx, f->type);
>>>>>> +
>>>>>> +	pix->width = q_data->width;
>>>>>> +	pix->height = q_data->height;
>>>>>> +	pix->pixelformat = q_data->fmt->fourcc;
>>>>>> +	pix->colorspace = q_data->colorspace;
>>>>>> +	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
>>>>>> +
>>>>>> +	for (i = 0; i < pix->num_planes; i++) {
>>>>>> +		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
>>>>>> +		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
>>>>>> +		       struct vpe_fmt *fmt, int type)
>>>>>> +{
>>>>>> +	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
>>>>>> +	struct v4l2_plane_pix_format *plane_fmt;
>>>>>> +	int i;
>>>>>> +
>>>>>> +	if (!fmt || !(fmt->types & type)) {
>>>>>> +		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
>>>>>> +			pix->pixelformat);
>>>>>> +		return -EINVAL;
>>>>>> +	}
>>>>>> +
>>>>>> +	pix->field = V4L2_FIELD_NONE;
>>>>>> +
>>>>>> +	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
>>>>>> +			      &pix->height, MIN_H, MAX_H, H_ALIGN,
>>>>>> +			      S_ALIGN);
>>>>>> +
>>>>>> +	pix->num_planes = fmt->coplanar ? 2 : 1;
>>>>>> +	pix->pixelformat = fmt->fourcc;
>>>>>> +	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>>>>
>>>>> You do this only for capture. Output sets the colorspace, so try_fmt should
>>>>> leave the colorspace field untouched for the output direction.
>>>>>
>>>>>> +			V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>>>
>>>> The input to the VPE block can be various YUV formats, and the VPE can
>>>> generate both RGB and YUV formats.
>>>>
>>>> So, I guess the output (V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) side only has
>>>> the choice of setting V4L2_COLORSPACE_SMPTE170M, and on the
>>>> capture (V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) side we have both the
>>>> SRGB and SMPTE170M options.
>>>>
>>>> One thing I am not clear about is whether the userspace application has
>>>> to set the colorspace in the v4l2 format for OUTPUT or CAPTURE or both?
>>>
>>> The spec today says that the colorspace field is filled in by the driver.
>>> It does not differentiate between output and capture. This is patently wrong:
>>> for output it should be set by the application, since that's who is
>>> telling the driver what colorspace the image has. The driver may change it
>>> if it doesn't support that colorspace, but otherwise it should leave it as
>>> is.
>>>
>>> A mem-to-mem device that doesn't care about the colorspace should just copy
>>> the colorspace field from the output value into the capture.
>>>
>>> What is missing in today's API is a way to do colorspace conversion in a m2m
>>> device since there is no way today to tell the driver the desired colorspace
>>> that it should get back from the m2m device.
>>>
>>>>
>>>> From what I understood, the code should be as below.
>>>>
>>>> For output:
>>>>
>>>> if (!pix->colorspace)
>>>> 	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;
>>>
>>> I would leave off the 'if' part. If this colorspace is all you support on the
>>> output, then always set it.
>>>
>>> However, since it can convert YUV to RGB, doesn't the hardware have to know
>>> about the various YUV colorspaces? SDTV and HDTV have different colorspaces.
>>>
>>>>
>>>> And for capture:
>>>> 	pix->colorspace = fmt->fourcc == V4L2_PIX_FMT_RGB24 ?
>>>> 		V4L2_COLORSPACE_SRGB : V4L2_COLORSPACE_SMPTE170M;
>>>>
>>>> Does this look correct?
>>>
>>> Yes, unless the hardware can take SDTV/HDTV YUV colorspaces into account. In
>>> that case I need to think how the API should be improved.
>>
>> The hardware can't convert one YUV color space to another. But it has a
>> programmable CSC block for YUV->RGB conversion in which we can program
>> coefficients based on the input YUV color space.
>>
>> The color space conversion block isn't implemented by the driver yet. So
>> I didn't look too much into it.
>>
>> I guess it will eventually be important to consider the output
>> colorspace. It doesn't need to be only SMPTE170M; it could be REC709 or
>> SMPTE240M based on what the user says.
>>
>> When the color space conversion block is implemented and the capture
>> colorspace is RGB, the driver should see the input colorspace and choose
>> the coefficients accordingly.
>>
>> With this new information about the hardware (:)), I guess it should be
>> as below for now:
>>
>> output:
>> /* inserted this check back since multiple YUV spaces supported */
>> if (!pix->colorspace)	
>> 	pix->colorspace = V4L2_COLORSPACE_SMPTE170M;
>>
>> capture:
>> /* removed SRGB since we don't support CSC yet */
>> pix->colorspace = s_q_data->colorspace;
>>
>> However, the above would imply that the s_fmt ioctl needs to be called for
>> OUTPUT first, followed by s_fmt for CAPTURE. I don't think that's
>> required according to the v4l2 spec.
>
> Well, s_fmt(OUTPUT) influences the colorspace field returned by *_fmt(CAPTURE),
> which I think is OK. You can use either order, but to see the actual colorspace
> used by capture you will have to call g_fmt(CAPTURE) after calling s_fmt(OUTPUT).
>
> It's the logical order anyway for a m2m device to start with the output first.

Okay, that makes sense.
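
For my own reference, the application-side sequence would then look
something like this (just a sketch with made-up values, not tested):

	struct v4l2_format out_fmt = { 0 };
	struct v4l2_format cap_fmt = { 0 };

	out_fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	/* the app tells the driver what colorspace the input frames are in */
	out_fmt.fmt.pix_mp.colorspace = V4L2_COLORSPACE_REC709;
	/* ... width/height/pixelformat filled in as usual ... */
	ioctl(fd, VIDIOC_S_FMT, &out_fmt);

	/* the capture colorspace is only final once the output side is set */
	cap_fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	ioctl(fd, VIDIOC_G_FMT, &cap_fmt);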

>
>> Besides this, I noticed a series "V4L2 mem-to-mem ioctl helpers" which I
>> should make use of. Do you suggest I base my patches on top of that?
>
> I don't know when that will be merged. It might be easier to just add a new
> patch converting the driver to the helpers, then we can apply all patches except
> that last one if the helpers aren't merged yet.

That's a good idea. Will add a new patch in the next version.

Thanks for the review.

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-09-06 10:12     ` Archit Taneja
@ 2013-10-09 14:29       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-09 14:29 UTC (permalink / raw)
  To: hverkuil; +Cc: linux-media, linux-omap, laurent.pinchart, Archit Taneja

VPE is a block which consists of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values which it will use
when it gets access to the VPE HW via the mem2mem queue; it also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs and stores
them in the buffer. There are also some VPDMA parameters like frame start and
line mode which need to be configured; these are set up by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op will add each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; as VPDMA parses these descriptors, it
uploads the MMR registers and starts DMA of video buffers on the various input
and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated which we use to notify that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.

Signed-off-by: Archit Taneja <archit@ti.com>
---
- changes in v5:
 - updated how pix->colorspace is set.
 - added comments on what our private control ID is used for.

- Removed the other patches from the series since they are the same.

 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1775 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 include/uapi/linux/v4l2-controls.h       |    4 +
 6 files changed, 2298 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index c7caf94..fc84d99 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -220,6 +220,22 @@ config VIDEO_RENESAS_VSP1
 	  To compile this driver as a module, choose M here: the module
 	  will be called vsp1.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 4e4da48..1348ba1 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..3bd9ca6
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1775 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* multiple of 128 bits, line stride, 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context can need up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 10 control descriptors
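+ * (i.e. 7 input + 3 output = 10 data transfer descriptors, and
+ * 3 config + 10 control = 13 config/control descriptors)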
+ */
+#define VPE_DESC_LIST_SIZE	(10 * VPDMA_DTD_DESC_SIZE +	\
+					13 * VPDMA_CFD_CTD_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
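+ * (each sub-block below is a vpdma_adb_hdr followed by its register payload,
+ * padded out to a multiple of 128 bits)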
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+};
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
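+	/* each u32 packs a pair of coefficients; US2 and US3 are programmed
+	 * with the same values as US1 */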
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* set via direct VPDMA register writes for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode. It has been recommended not to use progressive bypass
+	 * mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for input ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
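+	/* map the descriptor list and kick off VPDMA; the descriptor buffers
+	 * are unmapped again in the irq handler once the list completes */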
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
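+	/* keep the same context running until it has processed the
+	 * bufs_per_job buffers that make up one transaction */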
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+
+	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
+		pix->colorspace = q_data->colorspace;
+	} else {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	}
+
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+
+	if (type == VPE_FMT_TYPE_CAPTURE) {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	} else {
+		if (!pix->colorspace)
+			pix->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	}
+
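+	/* the luma (or packed) line stride must honour the 16-byte VPDMA
+	 * alignment; the chroma plane of coplanar formats uses the width */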
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+/*
+ * defines the number of buffers/frames a context can process with VPE before
+ * switching to a different context. The default value is 1 buffer per context.
+ */
+#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)
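+
+/*
+ * a userspace sketch (with a made-up value) for tuning this control:
+ *
+ *	struct v4l2_control ctrl = {
+ *		.id	= V4L2_CID_VPE_BUFS_PER_JOB,
+ *		.value	= 4,
+ *	};
+ *	ioctl(fd, VIDIOC_S_CTRL, &ctrl);
+ */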
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_VPE_BUFS_PER_JOB:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_VPE_BUFS_PER_JOB,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
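+	/* default both queues to 1080p packed YUYV until S_FMT is called */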
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (IS_ERR(dev))
+		return PTR_ERR(dev);
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
+	/*
+	 * HACK: we get resource info from the device tree in the form of a
+	 * list of VPE sub blocks. The driver currently uses only the base of
+	 * vpe_top for register access; it should be changed later to access
+	 * registers based on the sub block base addresses.
+	 */
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
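+/*
+ * each field below is described by a MASK/SHIFT pair relative to its
+ * register; a field is extracted as, for example:
+ *
+ *	func = (read_reg(dev, VPE_PID) >> VPE_PID_FUNC_SHIFT) &
+ *			VPE_PID_FUNC_MASK;
+ *
+ * which is what the read_field_reg() helper in vpe.c does
+ */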
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 083bb5a..1666aab 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -160,6 +160,10 @@ enum v4l2_colorfx {
  * of controls. Total of 16 controls is reserved for this driver */
 #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
 
+/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
+ * this driver */
+#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
+
 /* MPEG-class control IDs */
 /* The MPEG controls are applicable to all codec controls
  * and the 'MPEG' part of the define is historical */
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
@ 2013-10-09 14:29       ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-09 14:29 UTC (permalink / raw)
  To: hverkuil; +Cc: linux-media, linux-omap, laurent.pinchart, Archit Taneja

VPE is a block consisting of a single memory to memory path that can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values, which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA; when VPDMA parses these descriptors, it
uploads the MMR registers and starts DMA of video buffers on the various input
and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to signal that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
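
For reference, here's a minimal sketch of how a userspace client could set up
a YUYV to NV12 conversion on the device using these ioctls (error handling,
buffer setup and streaming are trimmed, and the device node is just an
example):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	static void set_fmt(int fd, unsigned int type, unsigned int fourcc,
			    unsigned int w, unsigned int h)
	{
		struct v4l2_format fmt;

		memset(&fmt, 0, sizeof(fmt));
		fmt.type = type;
		fmt.fmt.pix_mp.width = w;
		fmt.fmt.pix_mp.height = h;
		fmt.fmt.pix_mp.pixelformat = fourcc;
		/* the driver fills in num_planes, bytesperline, sizeimage */
		ioctl(fd, VIDIOC_S_FMT, &fmt);
	}

	int main(void)
	{
		/* /dev/video0 is just an example node number */
		int fd = open("/dev/video0", O_RDWR);

		/* source (OUTPUT) queue takes YUYV, destination (CAPTURE)
		 * queue produces NV12 */
		set_fmt(fd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
			V4L2_PIX_FMT_YUYV, 1280, 720);
		set_fmt(fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
			V4L2_PIX_FMT_NV12, 1280, 720);

		/*
		 * from here on: VIDIOC_REQBUFS and mmap() on both queues,
		 * VIDIOC_QBUF and VIDIOC_STREAMON, then one VIDIOC_DQBUF
		 * per processed frame
		 */
		return 0;
	}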

Signed-off-by: Archit Taneja <archit@ti.com>
---
- changes in v5:
 - updated how pix->colorspace is set.
 - adds comments on what our private control ID is used for.

- Removed the other patches from the series since they are the same.

 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1775 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 include/uapi/linux/v4l2-controls.h       |    4 +
 6 files changed, 2298 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index c7caf94..fc84d99 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -220,6 +220,22 @@ config VIDEO_RENESAS_VSP1
 	  To compile this driver as a module, choose M here: the module
 	  will be called vsp1.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 4e4da48..1348ba1 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..3bd9ca6
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1775 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* multiple of 128 bits, line stride, 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context can need up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 10 control descriptors: that makes 7 + 3 = 10
+ * data transfer descriptors and 3 + 10 = 13 config/control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(10 * VPDMA_DTD_DESC_SIZE +	\
+					13 * VPDMA_CFD_CTD_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
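+ * each hdr/regs pair below describes one ADB section; the pad arrays appear
+ * to round each section's payload up to a multiple of four 32-bit words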
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
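+/*
+ * initialize one ADB header in the context's MMR address/data block: when
+ * the block is loaded, VPDMA writes the contents of the 'regs' array to the
+ * VPE MMRs starting at register offset 'offset_a'
+ */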
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* the line mode is set via direct (non-shadow) register writes for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode, and it has been recommended not to use progressive
+	 * bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for input ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+
+	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
+		pix->colorspace = q_data->colorspace;
+	} else {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	}
+
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+
+	if (type == VPE_FMT_TYPE_CAPTURE) {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	} else {
+		if (!pix->colorspace)
+			pix->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	}
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+/*
+ * defines the number of buffers/frames a context can process with VPE before
+ * switching to a different context. The default value is 1 buffer per context.
+ */
+#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)
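+
+/*
+ * a userspace sketch of setting this control (the value 4 is an arbitrary
+ * example):
+ *
+ *	struct v4l2_control ctrl = {
+ *		.id	= V4L2_CID_VPE_BUFS_PER_JOB,
+ *		.value	= 4,
+ *	};
+ *	ioctl(fd, VIDIOC_S_CTRL, &ctrl);
+ */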
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_VPE_BUFS_PER_JOB:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(unsigned long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
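+/*
+ * vb2 calls wait_prepare() just before sleeping for a buffer and
+ * wait_finish() right after waking up; dropping the driver lock across the
+ * sleep keeps other file operations from blocking behind a waiting DQBUF.
+ */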
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
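+	/*
+	 * In mem2mem terms, the "source" queue is the V4L2 OUTPUT side
+	 * (userspace feeds buffers in) and the "destination" queue is the
+	 * CAPTURE side (the driver hands back processed buffers).
+	 */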
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_VPE_BUFS_PER_JOB,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
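+	/* vpdma_fmt depth is in bits per pixel, hence the shift by 3 for bytes */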
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * For now, just report the creation of the first instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance is released.
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * For now, just report the release of the last instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance is released.
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
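+/*
+ * With unlocked_ioctl set to video_ioctl2, the v4l2 core serializes ioctls
+ * on the vfd->lock mutex that vpe_probe() assigns below.
+ */
+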
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
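+/*
+ * m2m core callbacks: device_run() starts one transaction for the context
+ * that currently owns the hardware, job_ready() reports whether a context
+ * has enough buffers queued to run, and job_abort() requests that a running
+ * job stop early.
+ */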
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
+	if (!res) {
+		ret = -ENODEV;
+		goto v4l2_dev_unreg;
+	}
+
+	/*
+	 * HACK: we get resource info from the device tree in the form of a
+	 * list of VPE sub-blocks. The driver currently uses only the base of
+	 * vpe_top for register access; it should later be changed to access
+	 * registers based on the sub-block base addresses.
+	 */
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
+	if (!dev->base) {
+		/* devm_ioremap() returns NULL on failure, not an ERR_PTR */
+		ret = -ENOMEM;
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		/* without this, a vpdma_create() failure would return 0 */
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME "\n");
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
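+/*
+ * Every field below is described by a right-aligned MASK and a SHIFT, so a
+ * field read boils down to
+ *
+ *	func = (ioread32(base + VPE_PID) >> VPE_PID_FUNC_SHIFT) &
+ *			VPE_PID_FUNC_MASK;
+ *
+ * which is what the read_field_reg() helper in vpe.c wraps up.
+ */
+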
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
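+/*
+ * The aliases above follow the usual TI set/clear convention: writing 1s to
+ * a _SET address raises bits, writing 1s to the matching _CLR address clears
+ * them, and reads return the current state, so the plain STATUS/ENABLE names
+ * simply reuse one of the two addresses.
+ */
+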
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 083bb5a..1666aab 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -160,6 +160,10 @@ enum v4l2_colorfx {
  * of controls. Total of 16 controls is reserved for this driver */
 #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
 
+/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
+ * this driver */
+#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
+
 /* MPEG-class control IDs */
 /* The MPEG controls are applicable to all codec controls
  * and the 'MPEG' part of the define is historical */
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-09 14:29       ` Archit Taneja
@ 2013-10-11  7:46       ` Hans Verkuil
  2013-10-15 13:47           ` Archit Taneja
  -1 siblings, 1 reply; 138+ messages in thread
From: Hans Verkuil @ 2013-10-11  7:46 UTC (permalink / raw)
  To: Archit Taneja; +Cc: linux-media, linux-omap, laurent.pinchart

On 10/09/2013 04:29 PM, Archit Taneja wrote:
> VPE is a block which consists of a single memory to memory path which can
> perform chrominance up/down sampling, de-interlacing, scaling, and color space
> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
> interleaved video formats.
> 
> We create a mem2mem driver based primarily on the mem2mem-testdev example.
> The de-interlacer, scaler and color space converter are all bypassed for now
> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
> conversion between different YUV formats is possible.
> 
> Each mem2mem context allocates a buffer for VPE MMR values which it will use
> when it gets access to the VPE HW via the mem2mem queue, it also allocates
> a VPDMA descriptor list to which configuration and data descriptors are added.
> 
> Based on the information received via v4l2 ioctls for the source and
> destination queues, the driver configures the values for the MMRs, and stores
> them in the buffer. There are also some VPDMA parameters like frame start and
> line mode which needs to be configured, these are configured by direct register
> writes via the VPDMA helper functions.
> 
> The driver's device_run() mem2mem op will add each descriptor based on how the
> source and destination queues are set up for the given ctx, once the list is
> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
> upload MMR registers, start DMA of video buffers on the various input and output
> clients/ports.
> 
> When the list is parsed completely (and the DMAs on all the output ports done),
> an interrupt is generated which we use to notify that the source and destination
> buffers are done.
> 
> The rest of the driver is quite similar to other mem2mem drivers, we use the
> multiplane v4l2 ioctls as the HW supports coplanar formats.
> 
> Signed-off-by: Archit Taneja <archit@ti.com>

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Regards,

	Hans


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-11  7:46       ` Hans Verkuil
@ 2013-10-15 13:47           ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-15 13:47 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, linux-omap, laurent.pinchart

Hi Hans,

On Friday 11 October 2013 01:16 PM, Hans Verkuil wrote:
> On 10/09/2013 04:29 PM, Archit Taneja wrote:
>> VPE is a block which consists of a single memory to memory path which can
>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>> interleaved video formats.
>>
>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>> The de-interlacer, scaler and color space converter are all bypassed for now
>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>> conversion between different YUV formats is possible.
>>
>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>
>> Based on the information received via v4l2 ioctls for the source and
>> destination queues, the driver configures the values for the MMRs, and stores
>> them in the buffer. There are also some VPDMA parameters like frame start and
>> line mode which needs to be configured, these are configured by direct register
>> writes via the VPDMA helper functions.
>>
>> The driver's device_run() mem2mem op will add each descriptor based on how the
>> source and destination queues are set up for the given ctx, once the list is
>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
>> upload MMR registers, start DMA of video buffers on the various input and output
>> clients/ports.
>>
>> When the list is parsed completely (and the DMAs on all the output ports done),
>> an interrupt is generated which we use to notify that the source and destination
>> buffers are done.
>>
>> The rest of the driver is quite similar to other mem2mem drivers, we use the
>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>
> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
>

Thanks for the Acks. Is it possible to queue these for 3.13?

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-15 13:47           ` Archit Taneja
@ 2013-10-15 13:51           ` Hans Verkuil
  2013-10-15 14:13             ` Kamil Debski
  2013-10-15 15:54             ` Kamil Debski
  -1 siblings, 2 replies; 138+ messages in thread
From: Hans Verkuil @ 2013-10-15 13:51 UTC (permalink / raw)
  To: Kamil Debski; +Cc: Archit Taneja, linux-media, linux-omap, laurent.pinchart

Kamil,

Can you take this driver as m2m maintainer or should I take it?

Regards,

	Hans

On 10/15/2013 03:47 PM, Archit Taneja wrote:
> Hi Hans,
> 
> On Friday 11 October 2013 01:16 PM, Hans Verkuil wrote:
>> On 10/09/2013 04:29 PM, Archit Taneja wrote:
>>> VPE is a block which consists of a single memory to memory path which can
>>> perform chrominance up/down sampling, de-interlacing, scaling, and color space
>>> conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
>>> interleaved video formats.
>>>
>>> We create a mem2mem driver based primarily on the mem2mem-testdev example.
>>> The de-interlacer, scaler and color space converter are all bypassed for now
>>> to keep the driver simple. Chroma up/down sampler blocks are implemented, so
>>> conversion between different YUV formats is possible.
>>>
>>> Each mem2mem context allocates a buffer for VPE MMR values which it will use
>>> when it gets access to the VPE HW via the mem2mem queue, it also allocates
>>> a VPDMA descriptor list to which configuration and data descriptors are added.
>>>
>>> Based on the information received via v4l2 ioctls for the source and
>>> destination queues, the driver configures the values for the MMRs, and stores
>>> them in the buffer. There are also some VPDMA parameters like frame start and
>>> line mode which needs to be configured, these are configured by direct register
>>> writes via the VPDMA helper functions.
>>>
>>> The driver's device_run() mem2mem op will add each descriptor based on how the
>>> source and destination queues are set up for the given ctx, once the list is
>>> prepared, it's submitted to VPDMA, these descriptors when parsed by VPDMA will
>>> upload MMR registers, start DMA of video buffers on the various input and output
>>> clients/ports.
>>>
>>> When the list is parsed completely (and the DMAs on all the output ports done),
>>> an interrupt is generated which we use to notify that the source and destination
>>> buffers are done.
>>>
>>> The rest of the driver is quite similar to other mem2mem drivers, we use the
>>> multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>
>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>
>> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
>>
> 
> Thanks for the Acks. Is it possible to queue these for 3.13?
> 
> Archit
> 


^ permalink raw reply	[flat|nested] 138+ messages in thread

* RE: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-15 13:51           ` Hans Verkuil
@ 2013-10-15 14:13             ` Kamil Debski
  2013-10-15 15:54             ` Kamil Debski
  1 sibling, 0 replies; 138+ messages in thread
From: Kamil Debski @ 2013-10-15 14:13 UTC (permalink / raw)
  To: 'Hans Verkuil'
  Cc: 'Archit Taneja', linux-media, linux-omap, laurent.pinchart

Hi Hans,

Now, I am a bit busy with... USB. I have to admit I have a backlog of
patches to look through and prepare a tree for Mauro.
I wanted to do this on Thursday or Friday. Is it ok? BTW, if you see any m2m
patches in patchwork, feel free to delegate them to me.

Best wishes,
-- 
Kamil Debski
Linux Kernel Developer
Samsung R&D Institute Poland


> -----Original Message-----
> From: Hans Verkuil [mailto:hverkuil@xs4all.nl]
> Sent: Tuesday, October 15, 2013 3:52 PM
> To: Kamil Debski
> Cc: Archit Taneja; linux-media@vger.kernel.org; linux-
> omap@vger.kernel.org; laurent.pinchart@ideasonboard.com
> Subject: Re: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
> 
> Kamil,
> 
> Can you take this driver as m2m maintainer or should I take it?
> 
> Regards,
> 
> 	Hans
> 
> On 10/15/2013 03:47 PM, Archit Taneja wrote:
> > Hi Hans,
> >
> > On Friday 11 October 2013 01:16 PM, Hans Verkuil wrote:
> >> On 10/09/2013 04:29 PM, Archit Taneja wrote:
> >>> VPE is a block which consists of a single memory to memory path
> >>> which can perform chrominance up/down sampling, de-interlacing,
> >>> scaling, and color space conversion of raster or tiled YUV420
> >>> coplanar, YUV422 coplanar or YUV422 interleaved video formats.
> >>>
> >>> We create a mem2mem driver based primarily on the mem2mem-testdev
> example.
> >>> The de-interlacer, scaler and color space converter are all
> bypassed
> >>> for now to keep the driver simple. Chroma up/down sampler blocks
> are
> >>> implemented, so conversion between different YUV formats is possible.
> >>>
> >>> Each mem2mem context allocates a buffer for VPE MMR values which it
> >>> will use when it gets access to the VPE HW via the mem2mem queue,
> it
> >>> also allocates a VPDMA descriptor list to which configuration and
> data descriptors are added.
> >>>
> >>> Based on the information received via v4l2 ioctls for the source
> and
> >>> destination queues, the driver configures the values for the MMRs,
> >>> and stores them in the buffer. There are also some VPDMA parameters
> >>> like frame start and line mode which needs to be configured, these
> >>> are configured by direct register writes via the VPDMA helper
> functions.
> >>>
> >>> The driver's device_run() mem2mem op will add each descriptor based
> >>> on how the source and destination queues are set up for the given
> >>> ctx, once the list is prepared, it's submitted to VPDMA, these
> >>> descriptors when parsed by VPDMA will upload MMR registers, start
> >>> DMA of video buffers on the various input and output clients/ports.
> >>>
> >>> When the list is parsed completely (and the DMAs on all the output
> >>> ports done), an interrupt is generated which we use to notify that
> >>> the source and destination buffers are done.
> >>>
> >>> The rest of the driver is quite similar to other mem2mem drivers,
> we
> >>> use the multiplane v4l2 ioctls as the HW supports coplanar formats.
> >>>
> >>> Signed-off-by: Archit Taneja <archit@ti.com>
> >>
> >> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
> >>
> >
> > Thanks for the Acks. Is it possible to queue these for 3.13?
> >
> > Archit
> >


^ permalink raw reply	[flat|nested] 138+ messages in thread

* RE: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-15 13:51           ` Hans Verkuil
  2013-10-15 14:13             ` Kamil Debski
@ 2013-10-15 15:54             ` Kamil Debski
  2013-10-16  5:08                 ` Archit Taneja
  1 sibling, 1 reply; 138+ messages in thread
From: Kamil Debski @ 2013-10-15 15:54 UTC (permalink / raw)
  To: 'Hans Verkuil'
  Cc: 'Archit Taneja', linux-media, linux-omap, laurent.pinchart

Hi Archit,

Please find my comment below.

> From: Hans Verkuil [mailto:hverkuil@xs4all.nl]
> Sent: Tuesday, October 15, 2013 3:52 PM
> 
> Kamil,
> 
> Can you take this driver as m2m maintainer or should I take it?
> 
> Regards,
> 
> 	Hans
> 
> On 10/15/2013 03:47 PM, Archit Taneja wrote:
> > Hi Hans,
> >
> > On Friday 11 October 2013 01:16 PM, Hans Verkuil wrote:
> >> On 10/09/2013 04:29 PM, Archit Taneja wrote:
> >>> VPE is a block which consists of a single memory to memory path
> >>> which can perform chrominance up/down sampling, de-interlacing,
> >>> scaling, and color space conversion of raster or tiled YUV420
> >>> coplanar, YUV422 coplanar or YUV422 interleaved video formats.
> >>>
> >>> We create a mem2mem driver based primarily on the mem2mem-testdev
> example.
> >>> The de-interlacer, scaler and color space converter are all
> bypassed
> >>> for now to keep the driver simple. Chroma up/down sampler blocks
> are
> >>> implemented, so conversion between different YUV formats is possible.
> >>>
> >>> Each mem2mem context allocates a buffer for VPE MMR values which it
> >>> will use when it gets access to the VPE HW via the mem2mem queue,
> it
> >>> also allocates a VPDMA descriptor list to which configuration and
> data descriptors are added.
> >>>
> >>> Based on the information received via v4l2 ioctls for the source
> and
> >>> destination queues, the driver configures the values for the MMRs,
> >>> and stores them in the buffer. There are also some VPDMA parameters
> >>> like frame start and line mode which needs to be configured, these
> >>> are configured by direct register writes via the VPDMA helper
> functions.
> >>>
> >>> The driver's device_run() mem2mem op will add each descriptor based
> >>> on how the source and destination queues are set up for the given
> >>> ctx, once the list is prepared, it's submitted to VPDMA, these
> >>> descriptors when parsed by VPDMA will upload MMR registers, start
> >>> DMA of video buffers on the various input and output clients/ports.
> >>>
> >>> When the list is parsed completely (and the DMAs on all the output
> >>> ports done), an interrupt is generated which we use to notify that
> >>> the source and destination buffers are done.
> >>>
> >>> The rest of the driver is quite similar to other mem2mem drivers,
> we
> >>> use the multiplane v4l2 ioctls as the HW supports coplanar formats.
> >>>
> >>> Signed-off-by: Archit Taneja <archit@ti.com>
> >>
> >> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
> >>
> >
> > Thanks for the Acks. Is it possible to queue these for 3.13?

Yep, it is possible. But [v4,4/4] v4l: ti-vpe: Add de-interlacer support in VPE
does not apply after applying [v5,3/4] v4l: ti-vpe: Add VPE mem to mem driver.

Please send a v5 with all patches.

Best wishes,
Kamil


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-15 15:54             ` Kamil Debski
@ 2013-10-16  5:08                 ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:08 UTC (permalink / raw)
  To: Kamil Debski, 'Hans Verkuil'
  Cc: linux-media, linux-omap, laurent.pinchart

Hi,

On Tuesday 15 October 2013 09:24 PM, Kamil Debski wrote:
> Hi Archit,
>
> Please find my comment below.
>
>> From: Hans Verkuil [mailto:hverkuil@xs4all.nl]
>> Sent: Tuesday, October 15, 2013 3:52 PM
>>
>> Kamil,
>>
>> Can you take this driver as m2m maintainer or should I take it?
>>
>> Regards,
>>
>> 	Hans
>>
>> On 10/15/2013 03:47 PM, Archit Taneja wrote:
>>> Hi Hans,
>>>
>>> On Friday 11 October 2013 01:16 PM, Hans Verkuil wrote:
>>>> On 10/09/2013 04:29 PM, Archit Taneja wrote:
>>>>> VPE is a block which consists of a single memory to memory path
>>>>> which can perform chrominance up/down sampling, de-interlacing,
>>>>> scaling, and color space conversion of raster or tiled YUV420
>>>>> coplanar, YUV422 coplanar or YUV422 interleaved video formats.
>>>>>
>>>>> We create a mem2mem driver based primarily on the mem2mem-testdev
>> example.
>>>>> The de-interlacer, scaler and color space converter are all
>> bypassed
>>>>> for now to keep the driver simple. Chroma up/down sampler blocks
>> are
>>>>> implemented, so conversion between different YUV formats is possible.
>>>>>
>>>>> Each mem2mem context allocates a buffer for VPE MMR values which it
>>>>> will use when it gets access to the VPE HW via the mem2mem queue,
>> it
>>>>> also allocates a VPDMA descriptor list to which configuration and
>> data descriptors are added.
>>>>>
>>>>> Based on the information received via v4l2 ioctls for the source
>> and
>>>>> destination queues, the driver configures the values for the MMRs,
>>>>> and stores them in the buffer. There are also some VPDMA parameters
>>>>> like frame start and line mode which needs to be configured, these
>>>>> are configured by direct register writes via the VPDMA helper
>> functions.
>>>>>
>>>>> The driver's device_run() mem2mem op will add each descriptor based
>>>>> on how the source and destination queues are set up for the given
>>>>> ctx, once the list is prepared, it's submitted to VPDMA, these
>>>>> descriptors when parsed by VPDMA will upload MMR registers, start
>>>>> DMA of video buffers on the various input and output clients/ports.
>>>>>
>>>>> When the list is parsed completely (and the DMAs on all the output
>>>>> ports done), an interrupt is generated which we use to notify that
>>>>> the source and destination buffers are done.
>>>>>
>>>>> The rest of the driver is quite similar to other mem2mem drivers,
>> we
>>>>> use the multiplane v4l2 ioctls as the HW supports coplanar formats.
>>>>>
>>>>> Signed-off-by: Archit Taneja <archit@ti.com>
>>>>
>>>> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
>>>>
>>>
>>> Thanks for the Acks. Is it possible to queue these for 3.13?
>
> Yep, it is possible. But [v4,4/4] v4l: ti-vpe: Add de-interlacer support in
> VPE does
> not apply after applying [v5,3/4] v4l: ti-vpe: Add VPE mem to mem driver.
>
> Please send a v5 with all patches.

Ah, sorry about that. There was a minor conflict with the updated patch. 
Will post out v5.

Thanks,
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v5 0/4] v4l: VPE mem to mem driver
  2013-09-06 10:12     ` Archit Taneja
@ 2013-10-16  5:36       ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

VPE (Video Processing Engine) is an IP found on DRA7xx. This series adds VPE as
a mem to mem v4l2 driver, and VPDMA as a helper library.

The first version of the patch series described VPE in detail; you can have a
look at it here:

http://www.spinics.net/lists/linux-media/msg66518.html

Changes in v5:
 - updated how pix->colorspace is set.
 - added comments on what our private control ID is used for.

Archit Taneja (4):
  v4l: ti-vpe: Create a vpdma helper library
  v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  v4l: ti-vpe: Add VPE mem to mem driver
  v4l: ti-vpe: Add de-interlacer support in VPE

 drivers/media/platform/Kconfig             |   16 +
 drivers/media/platform/Makefile            |    2 +
 drivers/media/platform/ti-vpe/Makefile     |    5 +
 drivers/media/platform/ti-vpe/vpdma.c      |  846 +++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  203 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h |  641 +++++++++
 drivers/media/platform/ti-vpe/vpe.c        | 2099 ++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h   |  496 +++++++
 include/uapi/linux/v4l2-controls.h         |    4 +
 9 files changed, 4312 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

-- 
1.8.1.2


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v5 1/4] v4l: ti-vpe: Create a vpdma helper library
  2013-10-16  5:36       ` Archit Taneja
@ 2013-10-16  5:36         ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

The primary function of VPDMA is to move data between external memory and
internal processing modules(in our case, VPE) that source or sink data. VPDMA is
capable of buffering this data and then delivering the data as demanded to the
modules as programmed. The modules that source or sink data are referred to as
clients or ports. A channel is set up inside the VPDMA to connect a specific
memory buffer to a specific client. The VPDMA centralizes the DMA control
functions and the buffering required to shield all the clients from long
memory latency.

Add the following to the VPDMA helper:

- A data struct which describes VPDMA channels. For now, these channels are
  only the ones used by VPE; the list of channels will grow when VIP(Video
  Input Port) also uses the VPDMA library. This channel information will be
  used to populate fields required by data descriptors.

- Data structs which describe the different data types supported by VPDMA. This
  data type information will be used to populate fields required by data
  descriptors and used by the VPE driver to map a V4L2 format to the
  corresponding VPDMA data type.

- Provide VPDMA register offset definitions, functions to read, write and modify
  VPDMA registers.

- Functions to create and submit a VPDMA list. A list is a group of descriptors
  that makes up a set of DMA transfers that need to be completed. Each
  descriptor will either perform a DMA transaction to fetch input buffers and
  write to output buffers(data descriptors), or configure the MMRs of sub blocks
  of VPE(configuration descriptors), or provide control information to VPDMA
  (control descriptors).

- Functions to allocate, map and unmap the buffers needed for descriptor lists
  and for payloads containing MMR values and scaler coefficients. These use the
  DMA mapping APIs to hand the buffers over to VPDMA and back.

- Functions to enable VPDMA interrupts. VPDMA can trigger an interrupt on the
  VPE interrupt line when a descriptor list is parsed completely and the DMA
  transactions are completed. This requires masking the events in VPDMA
  registers and configuring some top level VPE interrupt registers.

- Enable some VPDMA specific parameters: frame start event(when to start DMA for
  a client) and line mode(whether each line fetched should be mirrored or not).

- Function to load firmware required by VPDMA. VPDMA requires a firmware for
  its internal list manager. We add the required request_firmware APIs to fetch
  this firmware from user space.

- Function to dump VPDMA registers.

- A function to initialize and create a VPDMA instance. This will be called by
  the VPE driver with its platform device pointer; the function takes care of
  loading the VPDMA firmware and returns a vpdma_data instance back to the VPE
  driver. The VIP driver will also call the same init function to initialize its
  own VPDMA instance.
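
To make the expected call flow concrete, here is a rough usage sketch (not part
of the patch), built only from the helper API above. vpe_add_descs() is a
hypothetical placeholder for appending the client's descriptors, the SZ_4K list
size is an arbitrary choice, and a real client would also wait for the
asynchronous firmware load (vpdma->ready) before submitting:

static int vpe_run_one_list(struct platform_device *pdev)
{
	struct vpdma_data *vpdma;
	struct vpdma_desc_list list;
	int ret;

	vpdma = vpdma_create(pdev);	/* also kicks off the firmware load */
	if (IS_ERR(vpdma))
		return PTR_ERR(vpdma);

	/* one page holds plenty of 32-byte data descriptors */
	ret = vpdma_create_desc_list(&list, SZ_4K, VPDMA_LIST_TYPE_NORMAL);
	if (ret)
		return ret;

	vpe_add_descs(&list);		/* hypothetical: append descriptors */

	ret = vpdma_map_desc_buf(vpdma, &list.buf);
	if (ret)
		goto free_list;

	vpdma_enable_list_complete_irq(vpdma, 0, true);

	ret = vpdma_submit_descs(vpdma, &list);
	if (ret) {
		vpdma_unmap_desc_buf(vpdma, &list.buf);
		goto free_list;
	}

	/*
	 * on the list complete interrupt, the client would call
	 * vpdma_clear_list_stat(), vpdma_unmap_desc_buf() and
	 * vpdma_reset_desc_list() before reusing the list
	 */
	return 0;

free_list:
	vpdma_free_desc_list(&list);
	return ret;
}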

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 578 +++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      | 155 ++++++++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 119 ++++++
 3 files changed, 852 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.c
 create mode 100644 drivers/media/platform/ti-vpe/vpdma.h
 create mode 100644 drivers/media/platform/ti-vpe/vpdma_priv.h

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
new file mode 100644
index 0000000..42db12c
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -0,0 +1,578 @@
+/*
+ * VPDMA helper library
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/firmware.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vpdma.h"
+#include "vpdma_priv.h"
+
+#define VPDMA_FIRMWARE	"vpdma-1b8.bin"
+
+const struct vpdma_data_format vpdma_yuv_fmts[] = {
+	[VPDMA_DATA_FMT_Y444] = {
+		.data_type	= DATA_TYPE_Y444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y422] = {
+		.data_type	= DATA_TYPE_Y422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_Y420] = {
+		.data_type	= DATA_TYPE_Y420,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C444] = {
+		.data_type	= DATA_TYPE_C444,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C422] = {
+		.data_type	= DATA_TYPE_C422,
+		.depth		= 8,
+	},
+	[VPDMA_DATA_FMT_C420] = {
+		.data_type	= DATA_TYPE_C420,
+		.depth		= 4,
+	},
+	[VPDMA_DATA_FMT_YC422] = {
+		.data_type	= DATA_TYPE_YC422,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_YC444] = {
+		.data_type	= DATA_TYPE_YC444,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_CY422] = {
+		.data_type	= DATA_TYPE_CY422,
+		.depth		= 16,
+	},
+};
+
+const struct vpdma_data_format vpdma_rgb_fmts[] = {
+	[VPDMA_DATA_FMT_RGB565] = {
+		.data_type	= DATA_TYPE_RGB16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16_1555] = {
+		.data_type	= DATA_TYPE_ARGB_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB16] = {
+		.data_type	= DATA_TYPE_ARGB_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16_5551] = {
+		.data_type	= DATA_TYPE_RGBA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_RGBA16] = {
+		.data_type	= DATA_TYPE_RGBA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ARGB24] = {
+		.data_type	= DATA_TYPE_ARGB24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGB24] = {
+		.data_type	= DATA_TYPE_RGB24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ARGB32] = {
+		.data_type	= DATA_TYPE_ARGB32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_RGBA24] = {
+		.data_type	= DATA_TYPE_RGBA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_RGBA32] = {
+		.data_type	= DATA_TYPE_RGBA32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGR565] = {
+		.data_type	= DATA_TYPE_BGR16_565,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16_1555] = {
+		.data_type	= DATA_TYPE_ABGR_1555,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR16] = {
+		.data_type	= DATA_TYPE_ABGR_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16_5551] = {
+		.data_type	= DATA_TYPE_BGRA_5551,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_BGRA16] = {
+		.data_type	= DATA_TYPE_BGRA_4444,
+		.depth		= 16,
+	},
+	[VPDMA_DATA_FMT_ABGR24] = {
+		.data_type	= DATA_TYPE_ABGR24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGR24] = {
+		.data_type	= DATA_TYPE_BGR24_888,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_ABGR32] = {
+		.data_type	= DATA_TYPE_ABGR32_8888,
+		.depth		= 32,
+	},
+	[VPDMA_DATA_FMT_BGRA24] = {
+		.data_type	= DATA_TYPE_BGRA24_6666,
+		.depth		= 24,
+	},
+	[VPDMA_DATA_FMT_BGRA32] = {
+		.data_type	= DATA_TYPE_BGRA32_8888,
+		.depth		= 32,
+	},
+};
+
+const struct vpdma_data_format vpdma_misc_fmts[] = {
+	[VPDMA_DATA_FMT_MV] = {
+		.data_type	= DATA_TYPE_MV,
+		.depth		= 4,
+	},
+};
+
+struct vpdma_channel_info {
+	int num;		/* VPDMA channel number */
+	int cstat_offset;	/* client CSTAT register offset */
+};
+
+static const struct vpdma_channel_info chan_info[] = {
+	[VPE_CHAN_LUMA1_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA1_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA1_CSTAT,
+	},
+	[VPE_CHAN_CHROMA1_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA1_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA1_CSTAT,
+	},
+	[VPE_CHAN_LUMA2_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA2_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA2_CSTAT,
+	},
+	[VPE_CHAN_CHROMA2_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA2_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA2_CSTAT,
+	},
+	[VPE_CHAN_LUMA3_IN] = {
+		.num		= VPE_CHAN_NUM_LUMA3_IN,
+		.cstat_offset	= VPDMA_DEI_LUMA3_CSTAT,
+	},
+	[VPE_CHAN_CHROMA3_IN] = {
+		.num		= VPE_CHAN_NUM_CHROMA3_IN,
+		.cstat_offset	= VPDMA_DEI_CHROMA3_CSTAT,
+	},
+	[VPE_CHAN_MV_IN] = {
+		.num		= VPE_CHAN_NUM_MV_IN,
+		.cstat_offset	= VPDMA_DEI_MV_IN_CSTAT,
+	},
+	[VPE_CHAN_MV_OUT] = {
+		.num		= VPE_CHAN_NUM_MV_OUT,
+		.cstat_offset	= VPDMA_DEI_MV_OUT_CSTAT,
+	},
+	[VPE_CHAN_LUMA_OUT] = {
+		.num		= VPE_CHAN_NUM_LUMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+	[VPE_CHAN_CHROMA_OUT] = {
+		.num		= VPE_CHAN_NUM_CHROMA_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_UV_CSTAT,
+	},
+	[VPE_CHAN_RGB_OUT] = {
+		.num		= VPE_CHAN_NUM_RGB_OUT,
+		.cstat_offset	= VPDMA_VIP_UP_Y_CSTAT,
+	},
+};
+
+static u32 read_reg(struct vpdma_data *vpdma, int offset)
+{
+	return ioread32(vpdma->base + offset);
+}
+
+static void write_reg(struct vpdma_data *vpdma, int offset, u32 value)
+{
+	iowrite32(value, vpdma->base + offset);
+}
+
+static int read_field_reg(struct vpdma_data *vpdma, int offset,
+		u32 mask, int shift)
+{
+	return (read_reg(vpdma, offset) & (mask << shift)) >> shift;
+}
+
+static void write_field_reg(struct vpdma_data *vpdma, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(vpdma, offset);
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+
+	write_reg(vpdma, offset, val);
+}
+
+void vpdma_dump_regs(struct vpdma_data *vpdma)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+#define DUMPREG(r) dev_dbg(dev, "%-35s %08x\n", #r, read_reg(vpdma, VPDMA_##r))
+
+	dev_dbg(dev, "VPDMA Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(LIST_ADDR);
+	DUMPREG(LIST_ATTR);
+	DUMPREG(LIST_STAT_SYNC);
+	DUMPREG(BG_RGB);
+	DUMPREG(BG_YUV);
+	DUMPREG(SETUP);
+	DUMPREG(MAX_SIZE1);
+	DUMPREG(MAX_SIZE2);
+	DUMPREG(MAX_SIZE3);
+
+	/*
+	 * dumping registers of only group0 and group3, because VPE channels
+	 * lie within group0 and group3 registers
+	 */
+	DUMPREG(INT_CHAN_STAT(0));
+	DUMPREG(INT_CHAN_MASK(0));
+	DUMPREG(INT_CHAN_STAT(3));
+	DUMPREG(INT_CHAN_MASK(3));
+	DUMPREG(INT_CLIENT0_STAT);
+	DUMPREG(INT_CLIENT0_MASK);
+	DUMPREG(INT_CLIENT1_STAT);
+	DUMPREG(INT_CLIENT1_MASK);
+	DUMPREG(INT_LIST0_STAT);
+	DUMPREG(INT_LIST0_MASK);
+
+	/*
+	 * these are registers specific to VPE clients, we can make this
+	 * function dump client registers specific to VPE or VIP based on
+	 * who is using it
+	 */
+	DUMPREG(DEI_CHROMA1_CSTAT);
+	DUMPREG(DEI_LUMA1_CSTAT);
+	DUMPREG(DEI_CHROMA2_CSTAT);
+	DUMPREG(DEI_LUMA2_CSTAT);
+	DUMPREG(DEI_CHROMA3_CSTAT);
+	DUMPREG(DEI_LUMA3_CSTAT);
+	DUMPREG(DEI_MV_IN_CSTAT);
+	DUMPREG(DEI_MV_OUT_CSTAT);
+	DUMPREG(VIP_UP_Y_CSTAT);
+	DUMPREG(VIP_UP_UV_CSTAT);
+	DUMPREG(VPI_CTL_CSTAT);
+}
+
+/*
+ * Allocate a DMA buffer
+ */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size)
+{
+	buf->size = size;
+	buf->mapped = false;
+	buf->addr = kzalloc(size, GFP_KERNEL);
+	if (!buf->addr)
+		return -ENOMEM;
+
+	WARN_ON((u32) buf->addr & VPDMA_DESC_ALIGN);
+
+	return 0;
+}
+
+void vpdma_free_desc_buf(struct vpdma_buf *buf)
+{
+	WARN_ON(buf->mapped);
+	kfree(buf->addr);
+	buf->addr = NULL;
+	buf->size = 0;
+}
+
+/*
+ * map descriptor/payload DMA buffer, enabling DMA access
+ */
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	WARN_ON(buf->mapped);
+	buf->dma_addr = dma_map_single(dev, buf->addr, buf->size,
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, buf->dma_addr)) {
+		dev_err(dev, "failed to map buffer\n");
+		return -EINVAL;
+	}
+
+	buf->mapped = true;
+
+	return 0;
+}
+
+/*
+ * unmap descriptor/payload DMA buffer, disabling DMA access and
+ * allowing the main processor to access the data
+ */
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf)
+{
+	struct device *dev = &vpdma->pdev->dev;
+
+	if (buf->mapped)
+		dma_unmap_single(dev, buf->dma_addr, buf->size, DMA_TO_DEVICE);
+
+	buf->mapped = false;
+}
+
+/*
+ * create a descriptor list, the user of this list will append configuration,
+ * control and data descriptors to this list, this list will be submitted to
+ * VPDMA. VPDMA's list parser will go through each descriptor and perform the
+ * required DMA operations
+ */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type)
+{
+	int r;
+
+	r = vpdma_alloc_desc_buf(&list->buf, size);
+	if (r)
+		return r;
+
+	list->next = list->buf.addr;
+
+	list->type = type;
+
+	return 0;
+}
+
+/*
+ * once a descriptor list is parsed by VPDMA, we reset the list by emptying it,
+ * to allow new descriptors to be added to the list.
+ */
+void vpdma_reset_desc_list(struct vpdma_desc_list *list)
+{
+	list->next = list->buf.addr;
+}
+
+/*
+ * free the buffer allocated for the VPDMA descriptor list, this should be
+ * called when the user doesn't want to use VPDMA any more.
+ */
+void vpdma_free_desc_list(struct vpdma_desc_list *list)
+{
+	vpdma_free_desc_buf(&list->buf);
+
+	list->next = NULL;
+}
+
+static bool vpdma_list_busy(struct vpdma_data *vpdma, int list_num)
+{
+	return read_reg(vpdma, VPDMA_LIST_STAT_SYNC) & BIT(list_num + 16);
+}
+
+/*
+ * submit a list of DMA descriptors to the VPE VPDMA, do not wait for completion
+ */
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
+{
+	/* we always use the first list */
+	int list_num = 0;
+	int list_size;
+
+	if (vpdma_list_busy(vpdma, list_num))
+		return -EBUSY;
+
+	/* 16-byte granularity */
+	list_size = (list->next - list->buf.addr) >> 4;
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) list->buf.dma_addr);
+
+	write_reg(vpdma, VPDMA_LIST_ATTR,
+			(list_num << VPDMA_LIST_NUM_SHFT) |
+			(list->type << VPDMA_LIST_TYPE_SHFT) |
+			list_size);
+
+	return 0;
+}
+
+/* set or clear the mask for list complete interrupt */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable)
+{
+	u32 val;
+
+	val = read_reg(vpdma, VPDMA_INT_LIST0_MASK);
+	if (enable)
+		val |= (1 << (list_num * 2));
+	else
+		val &= ~(1 << (list_num * 2));
+	write_reg(vpdma, VPDMA_INT_LIST0_MASK, val);
+}
+
+/* clear previously occurred list interrupts in the LIST_STAT register */
+void vpdma_clear_list_stat(struct vpdma_data *vpdma)
+{
+	write_reg(vpdma, VPDMA_INT_LIST0_STAT,
+		read_reg(vpdma, VPDMA_INT_LIST0_STAT));
+}
+
+/*
+ * configures the output mode of the line buffer for the given client, the
+ * line buffer content can either be mirrored(each line repeated twice) or
+ * passed to the client as is
+ */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, line_mode,
+		VPDMA_CSTAT_LINE_MODE_MASK, VPDMA_CSTAT_LINE_MODE_SHIFT);
+}
+
+/*
+ * configures the event which should trigger VPDMA transfer for the given
+ * client
+ */
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event,
+		enum vpdma_channel chan)
+{
+	int client_cstat = chan_info[chan].cstat_offset;
+
+	write_field_reg(vpdma, client_cstat, fs_event,
+		VPDMA_CSTAT_FRAME_START_MASK, VPDMA_CSTAT_FRAME_START_SHIFT);
+}
+
+static void vpdma_firmware_cb(const struct firmware *f, void *context)
+{
+	struct vpdma_data *vpdma = context;
+	struct vpdma_buf fw_dma_buf;
+	int i, r;
+
+	dev_dbg(&vpdma->pdev->dev, "firmware callback\n");
+
+	if (!f || !f->data) {
+		dev_err(&vpdma->pdev->dev, "couldn't get firmware\n");
+		return;
+	}
+
+	/* already initialized */
+	if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+			VPDMA_LIST_RDY_SHFT)) {
+		vpdma->ready = true;
+		return;
+	}
+
+	r = vpdma_alloc_desc_buf(&fw_dma_buf, f->size);
+	if (r) {
+		dev_err(&vpdma->pdev->dev,
+			"failed to allocate dma buffer for firmware\n");
+		goto rel_fw;
+	}
+
+	memcpy(fw_dma_buf.addr, f->data, f->size);
+
+	vpdma_map_desc_buf(vpdma, &fw_dma_buf);
+
+	write_reg(vpdma, VPDMA_LIST_ADDR, (u32) fw_dma_buf.dma_addr);
+
+	for (i = 0; i < 100; i++) {		/* max 1 second */
+		msleep_interruptible(10);
+
+		if (read_field_reg(vpdma, VPDMA_LIST_ATTR, VPDMA_LIST_RDY_MASK,
+				VPDMA_LIST_RDY_SHFT))
+			break;
+	}
+
+	if (i == 100) {
+		dev_err(&vpdma->pdev->dev, "firmware upload failed\n");
+		goto free_buf;
+	}
+
+	vpdma->ready = true;
+
+free_buf:
+	vpdma_unmap_desc_buf(vpdma, &fw_dma_buf);
+
+	vpdma_free_desc_buf(&fw_dma_buf);
+rel_fw:
+	release_firmware(f);
+}
+
+static int vpdma_load_firmware(struct vpdma_data *vpdma)
+{
+	int r;
+	struct device *dev = &vpdma->pdev->dev;
+
+	r = request_firmware_nowait(THIS_MODULE, 1,
+		(const char *) VPDMA_FIRMWARE, dev, GFP_KERNEL, vpdma,
+		vpdma_firmware_cb);
+	if (r) {
+		dev_err(dev, "firmware not available %s\n", VPDMA_FIRMWARE);
+		return r;
+	} else {
+		dev_info(dev, "loading firmware %s\n", VPDMA_FIRMWARE);
+	}
+
+	return 0;
+}
+
+struct vpdma_data *vpdma_create(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct vpdma_data *vpdma;
+	int r;
+
+	dev_dbg(&pdev->dev, "vpdma_create\n");
+
+	vpdma = devm_kzalloc(&pdev->dev, sizeof(*vpdma), GFP_KERNEL);
+	if (!vpdma) {
+		dev_err(&pdev->dev, "couldn't alloc vpdma_dev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vpdma->pdev = pdev;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpdma");
+	if (res == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	vpdma->base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!vpdma->base) {
+		dev_err(&pdev->dev, "failed to ioremap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = vpdma_load_firmware(vpdma);
+	if (r) {
+		pr_err("failed to load firmware %s\n", VPDMA_FIRMWARE);
+		return ERR_PTR(r);
+	}
+
+	return vpdma;
+}
+MODULE_FIRMWARE(VPDMA_FIRMWARE);
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
new file mode 100644
index 0000000..8056689
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -0,0 +1,155 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPDMA_H_
+#define __TI_VPDMA_H_
+
+/*
+ * A vpdma_buf tracks the size, DMA address and mapping status of each
+ * driver DMA area.
+ */
+struct vpdma_buf {
+	void			*addr;
+	dma_addr_t		dma_addr;
+	size_t			size;
+	bool			mapped;
+};
+
+struct vpdma_desc_list {
+	struct vpdma_buf buf;
+	void *next;
+	int type;
+};
+
+struct vpdma_data {
+	void __iomem		*base;
+
+	struct platform_device	*pdev;
+
+	/* tells whether vpdma firmware is loaded or not */
+	bool ready;
+};
+
+struct vpdma_data_format {
+	int data_type;
+	u8 depth;
+};
+
+#define VPDMA_DESC_ALIGN		16	/* 16-byte descriptor alignment */
+
+#define VPDMA_DTD_DESC_SIZE		32	/* 8 words */
+#define VPDMA_CFD_CTD_DESC_SIZE		16	/* 4 words */
+
+#define VPDMA_LIST_TYPE_NORMAL		0
+#define VPDMA_LIST_TYPE_SELF_MODIFYING	1
+#define VPDMA_LIST_TYPE_DOORBELL	2
+
+enum vpdma_yuv_formats {
+	VPDMA_DATA_FMT_Y444 = 0,
+	VPDMA_DATA_FMT_Y422,
+	VPDMA_DATA_FMT_Y420,
+	VPDMA_DATA_FMT_C444,
+	VPDMA_DATA_FMT_C422,
+	VPDMA_DATA_FMT_C420,
+	VPDMA_DATA_FMT_YC422,
+	VPDMA_DATA_FMT_YC444,
+	VPDMA_DATA_FMT_CY422,
+};
+
+enum vpdma_rgb_formats {
+	VPDMA_DATA_FMT_RGB565 = 0,
+	VPDMA_DATA_FMT_ARGB16_1555,
+	VPDMA_DATA_FMT_ARGB16,
+	VPDMA_DATA_FMT_RGBA16_5551,
+	VPDMA_DATA_FMT_RGBA16,
+	VPDMA_DATA_FMT_ARGB24,
+	VPDMA_DATA_FMT_RGB24,
+	VPDMA_DATA_FMT_ARGB32,
+	VPDMA_DATA_FMT_RGBA24,
+	VPDMA_DATA_FMT_RGBA32,
+	VPDMA_DATA_FMT_BGR565,
+	VPDMA_DATA_FMT_ABGR16_1555,
+	VPDMA_DATA_FMT_ABGR16,
+	VPDMA_DATA_FMT_BGRA16_5551,
+	VPDMA_DATA_FMT_BGRA16,
+	VPDMA_DATA_FMT_ABGR24,
+	VPDMA_DATA_FMT_BGR24,
+	VPDMA_DATA_FMT_ABGR32,
+	VPDMA_DATA_FMT_BGRA24,
+	VPDMA_DATA_FMT_BGRA32,
+};
+
+enum vpdma_misc_formats {
+	VPDMA_DATA_FMT_MV = 0,
+};
+
+extern const struct vpdma_data_format vpdma_yuv_fmts[];
+extern const struct vpdma_data_format vpdma_rgb_fmts[];
+extern const struct vpdma_data_format vpdma_misc_fmts[];
+
+enum vpdma_frame_start_event {
+	VPDMA_FSEVENT_HDMI_FID = 0,
+	VPDMA_FSEVENT_DVO2_FID,
+	VPDMA_FSEVENT_HDCOMP_FID,
+	VPDMA_FSEVENT_SD_FID,
+	VPDMA_FSEVENT_LM_FID0,
+	VPDMA_FSEVENT_LM_FID1,
+	VPDMA_FSEVENT_LM_FID2,
+	VPDMA_FSEVENT_CHANNEL_ACTIVE,
+};
+
+/*
+ * VPDMA channel numbers
+ */
+enum vpdma_channel {
+	VPE_CHAN_LUMA1_IN,
+	VPE_CHAN_CHROMA1_IN,
+	VPE_CHAN_LUMA2_IN,
+	VPE_CHAN_CHROMA2_IN,
+	VPE_CHAN_LUMA3_IN,
+	VPE_CHAN_CHROMA3_IN,
+	VPE_CHAN_MV_IN,
+	VPE_CHAN_MV_OUT,
+	VPE_CHAN_LUMA_OUT,
+	VPE_CHAN_CHROMA_OUT,
+	VPE_CHAN_RGB_OUT,
+};
+
+/* vpdma descriptor buffer allocation and management */
+int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
+void vpdma_free_desc_buf(struct vpdma_buf *buf);
+int vpdma_map_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+void vpdma_unmap_desc_buf(struct vpdma_data *vpdma, struct vpdma_buf *buf);
+
+/* vpdma descriptor list funcs */
+int vpdma_create_desc_list(struct vpdma_desc_list *list, size_t size, int type);
+void vpdma_reset_desc_list(struct vpdma_desc_list *list);
+void vpdma_free_desc_list(struct vpdma_desc_list *list);
+int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
+
+/* vpdma list interrupt management */
+void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
+		bool enable);
+void vpdma_clear_list_stat(struct vpdma_data *vpdma);
+
+/* vpdma client configuration */
+void vpdma_set_line_mode(struct vpdma_data *vpdma, int line_mode,
+		enum vpdma_channel chan);
+void vpdma_set_frame_start_event(struct vpdma_data *vpdma,
+		enum vpdma_frame_start_event fs_event, enum vpdma_channel chan);
+
+void vpdma_dump_regs(struct vpdma_data *vpdma);
+
+/* initialize vpdma, passed with VPE's platform device pointer */
+struct vpdma_data *vpdma_create(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
new file mode 100644
index 0000000..8ff51a3
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _TI_VPDMA_PRIV_H_
+#define _TI_VPDMA_PRIV_H_
+
+/*
+ * VPDMA Register offsets
+ */
+
+/* Top level */
+#define VPDMA_PID		0x00
+#define VPDMA_LIST_ADDR		0x04
+#define VPDMA_LIST_ATTR		0x08
+#define VPDMA_LIST_STAT_SYNC	0x0c
+#define VPDMA_BG_RGB		0x18
+#define VPDMA_BG_YUV		0x1c
+#define VPDMA_SETUP		0x30
+#define VPDMA_MAX_SIZE1		0x34
+#define VPDMA_MAX_SIZE2		0x38
+#define VPDMA_MAX_SIZE3		0x3c
+
+/* Interrupts */
+#define VPDMA_INT_CHAN_STAT(grp)	(0x40 + grp * 8)
+#define VPDMA_INT_CHAN_MASK(grp)	(VPDMA_INT_CHAN_STAT(grp) + 4)
+#define VPDMA_INT_CLIENT0_STAT		0x78
+#define VPDMA_INT_CLIENT0_MASK		0x7c
+#define VPDMA_INT_CLIENT1_STAT		0x80
+#define VPDMA_INT_CLIENT1_MASK		0x84
+#define VPDMA_INT_LIST0_STAT		0x88
+#define VPDMA_INT_LIST0_MASK		0x8c
+
+#define VPDMA_PERFMON(i)		(0x200 + i * 4)
+
+/* VPE specific client registers */
+#define VPDMA_DEI_CHROMA1_CSTAT		0x0300
+#define VPDMA_DEI_LUMA1_CSTAT		0x0304
+#define VPDMA_DEI_LUMA2_CSTAT		0x0308
+#define VPDMA_DEI_CHROMA2_CSTAT		0x030c
+#define VPDMA_DEI_LUMA3_CSTAT		0x0310
+#define VPDMA_DEI_CHROMA3_CSTAT		0x0314
+#define VPDMA_DEI_MV_IN_CSTAT		0x0330
+#define VPDMA_DEI_MV_OUT_CSTAT		0x033c
+#define VPDMA_VIP_UP_Y_CSTAT		0x0390
+#define VPDMA_VIP_UP_UV_CSTAT		0x0394
+#define VPDMA_VPI_CTL_CSTAT		0x03d0
+
+/* Reg field info for VPDMA_CLIENT_CSTAT registers */
+#define VPDMA_CSTAT_LINE_MODE_MASK	0x03
+#define VPDMA_CSTAT_LINE_MODE_SHIFT	8
+#define VPDMA_CSTAT_FRAME_START_MASK	0xf
+#define VPDMA_CSTAT_FRAME_START_SHIFT	10
+
+#define VPDMA_LIST_NUM_MASK		0x07
+#define VPDMA_LIST_NUM_SHFT		24
+#define VPDMA_LIST_STOP_SHFT		20
+#define VPDMA_LIST_RDY_MASK		0x01
+#define VPDMA_LIST_RDY_SHFT		19
+#define VPDMA_LIST_TYPE_MASK		0x03
+#define VPDMA_LIST_TYPE_SHFT		16
+#define VPDMA_LIST_SIZE_MASK		0xffff
+
+/* VPDMA data type values for data formats */
+#define DATA_TYPE_Y444				0x0
+#define DATA_TYPE_Y422				0x1
+#define DATA_TYPE_Y420				0x2
+#define DATA_TYPE_C444				0x4
+#define DATA_TYPE_C422				0x5
+#define DATA_TYPE_C420				0x6
+#define DATA_TYPE_YC422				0x7
+#define DATA_TYPE_YC444				0x8
+#define DATA_TYPE_CY422				0x23
+
+#define DATA_TYPE_RGB16_565			0x0
+#define DATA_TYPE_ARGB_1555			0x1
+#define DATA_TYPE_ARGB_4444			0x2
+#define DATA_TYPE_RGBA_5551			0x3
+#define DATA_TYPE_RGBA_4444			0x4
+#define DATA_TYPE_ARGB24_6666			0x5
+#define DATA_TYPE_RGB24_888			0x6
+#define DATA_TYPE_ARGB32_8888			0x7
+#define DATA_TYPE_RGBA24_6666			0x8
+#define DATA_TYPE_RGBA32_8888			0x9
+#define DATA_TYPE_BGR16_565			0x10
+#define DATA_TYPE_ABGR_1555			0x11
+#define DATA_TYPE_ABGR_4444			0x12
+#define DATA_TYPE_BGRA_5551			0x13
+#define DATA_TYPE_BGRA_4444			0x14
+#define DATA_TYPE_ABGR24_6666			0x15
+#define DATA_TYPE_BGR24_888			0x16
+#define DATA_TYPE_ABGR32_8888			0x17
+#define DATA_TYPE_BGRA24_6666			0x18
+#define DATA_TYPE_BGRA32_8888			0x19
+
+#define DATA_TYPE_MV				0x3
+
+/* VPDMA channel numbers(only VPE channels for now) */
+#define	VPE_CHAN_NUM_LUMA1_IN		0
+#define	VPE_CHAN_NUM_CHROMA1_IN		1
+#define	VPE_CHAN_NUM_LUMA2_IN		2
+#define	VPE_CHAN_NUM_CHROMA2_IN		3
+#define	VPE_CHAN_NUM_LUMA3_IN		4
+#define	VPE_CHAN_NUM_CHROMA3_IN		5
+#define	VPE_CHAN_NUM_MV_IN		12
+#define	VPE_CHAN_NUM_MV_OUT		15
+#define	VPE_CHAN_NUM_LUMA_OUT		102
+#define	VPE_CHAN_NUM_CHROMA_OUT		103
+#define	VPE_CHAN_NUM_RGB_OUT		106
+
+#endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v5 2/4] v4l: ti-vpe: Add helpers for creating VPDMA descriptors
  2013-10-16  5:36       ` Archit Taneja
@ 2013-10-16  5:36         ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

Create functions which the VPE driver can use to create a VPDMA descriptor and
add it to a VPDMA descriptor list. These functions take a pointer to an existing
list, and append the configuration/data/control descriptor header to the list.

In the case of configuration descriptors, the creation of a payload block may be
required(the payloads can hold VPE MMR values, or scaler coefficients). The
allocation of the payload buffer and its content is left to the VPE driver.
However, the VPDMA library provides helper macros to create the payload in the
correct format.

Add debug functions to dump the descriptors in a way such that it's easy to see
the values of different fields in the descriptors.
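
As a rough illustration (not part of the patch), the sketch below appends an
MMR payload, an outbound data descriptor and a sync-on-channel control
descriptor to an existing list using these helpers. CLIENT_MMR is a
hypothetical client id, and the ADB payload and DMA address are assumed to
have been set up already:

static void vpe_build_list(struct vpdma_desc_list *list,
			   struct vpdma_buf *mmr_adb,
			   struct v4l2_rect *c_rect, dma_addr_t luma_dma)
{
	/* program the VPE MMRs through an address data block payload */
	vpdma_add_cfd_adb(list, CLIENT_MMR, mmr_adb);

	/* DMA the processed luma data from the client back to memory */
	vpdma_add_out_dtd(list, c_rect, &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y422],
			  luma_dma, VPE_CHAN_LUMA_OUT, 0);

	/* stall the list parser until the outbound DMA has completed */
	vpdma_add_sync_on_channel_ctd(list, VPE_CHAN_LUMA_OUT);
}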

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpdma.c      | 268 +++++++++++++++
 drivers/media/platform/ti-vpe/vpdma.h      |  48 +++
 drivers/media/platform/ti-vpe/vpdma_priv.h | 522 +++++++++++++++++++++++++++++
 3 files changed, 838 insertions(+)

diff --git a/drivers/media/platform/ti-vpe/vpdma.c b/drivers/media/platform/ti-vpe/vpdma.c
index 42db12c..af0a5ff 100644
--- a/drivers/media/platform/ti-vpe/vpdma.c
+++ b/drivers/media/platform/ti-vpe/vpdma.c
@@ -21,6 +21,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/videodev2.h>
 
 #include "vpdma.h"
 #include "vpdma_priv.h"
@@ -416,6 +417,273 @@ int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list)
 	return 0;
 }
 
+static void dump_cfd(struct vpdma_cfd *cfd)
+{
+	int class;
+
+	class = cfd_get_class(cfd);
+
+	pr_debug("config descriptor of payload class: %s\n",
+		class == CFD_CLS_BLOCK ? "simple block" :
+		"address data block");
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word0: dst_addr_offset = 0x%08x\n",
+			cfd->dest_addr_offset);
+
+	if (class == CFD_CLS_BLOCK)
+		pr_debug("word1: num_data_wrds = %d\n", cfd->block_len);
+
+	pr_debug("word2: payload_addr = 0x%08x\n", cfd->payload_addr);
+
+	pr_debug("word3: pkt_type = %d, direct = %d, class = %d, dest = %d, "
+		"payload_len = %d\n", cfd_get_pkt_type(cfd),
+		cfd_get_direct(cfd), class, cfd_get_dest(cfd),
+		cfd_get_payload_len(cfd));
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the form of a simple data block specified in the descriptor
+ * header, this is used to upload scaler coefficients to the scaler module
+ */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset)
+{
+	struct vpdma_cfd *cfd;
+	int len = blk->size;
+
+	WARN_ON(blk->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	WARN_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->dest_addr_offset = dest_offset;
+	cfd->block_len = len;
+	cfd->payload_addr = (u32) blk->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_BLOCK,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+}
+
+/*
+ * append a configuration descriptor to the given descriptor list, where the
+ * payload is in the address data block format, this is used to configure a
+ * discontiguous set of MMRs
+ */
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb)
+{
+	struct vpdma_cfd *cfd;
+	unsigned int len = adb->size;
+
+	WARN_ON(len & VPDMA_ADB_SIZE_ALIGN);
+	WARN_ON(adb->dma_addr & VPDMA_DESC_ALIGN);
+
+	cfd = list->next;
+	BUG_ON((void *)(cfd + 1) > (list->buf.addr + list->buf.size));
+
+	cfd->w0 = 0;
+	cfd->w1 = 0;
+	cfd->payload_addr = (u32) adb->dma_addr;
+	cfd->ctl_payload_len = cfd_pkt_payload_len(CFD_INDIRECT, CFD_CLS_ADB,
+				client, len >> 4);
+
+	list->next = cfd + 1;
+
+	dump_cfd(cfd);
+};
+
+/*
+ * the control descriptor format changes based on the type of control
+ * descriptor; we only use 'sync on channel' control descriptors for now, so
+ * assume it's that
+ */
+static void dump_ctd(struct vpdma_ctd *ctd)
+{
+	pr_debug("control descriptor\n");
+
+	pr_debug("word3: pkt_type = %d, source = %d, ctl_type = %d\n",
+		ctd_get_pkt_type(ctd), ctd_get_source(ctd), ctd_get_ctl(ctd));
+}
+
+/*
+ * append a 'sync on channel' type control descriptor to the given descriptor
+ * list; this descriptor stalls the VPDMA list until DMA is completed on the
+ * specified channel
+ */
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan)
+{
+	struct vpdma_ctd *ctd;
+
+	ctd = list->next;
+	WARN_ON((void *)(ctd + 1) > (list->buf.addr + list->buf.size));
+
+	ctd->w0 = 0;
+	ctd->w1 = 0;
+	ctd->w2 = 0;
+	ctd->type_source_ctl = ctd_type_source_ctl(chan_info[chan].num,
+				CTD_TYPE_SYNC_ON_CHANNEL);
+
+	list->next = ctd + 1;
+
+	dump_ctd(ctd);
+}
+
+static void dump_dtd(struct vpdma_dtd *dtd)
+{
+	int dir, chan;
+
+	dir = dtd_get_dir(dtd);
+	chan = dtd_get_chan(dtd);
+
+	pr_debug("%s data transfer descriptor for channel %d\n",
+		dir == DTD_DIR_OUT ? "outbound" : "inbound", chan);
+
+	pr_debug("word0: data_type = %d, notify = %d, field = %d, 1D = %d, "
+		"even_ln_skp = %d, odd_ln_skp = %d, line_stride = %d\n",
+		dtd_get_data_type(dtd), dtd_get_notify(dtd), dtd_get_field(dtd),
+		dtd_get_1d(dtd), dtd_get_even_line_skip(dtd),
+		dtd_get_odd_line_skip(dtd), dtd_get_line_stride(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word1: line_length = %d, xfer_height = %d\n",
+			dtd_get_line_length(dtd), dtd_get_xfer_height(dtd));
+
+	pr_debug("word2: start_addr = 0x%08x\n", dtd->start_addr);
+
+	pr_debug("word3: pkt_type = %d, mode = %d, dir = %d, chan = %d, "
+		"pri = %d, next_chan = %d\n", dtd_get_pkt_type(dtd),
+		dtd_get_mode(dtd), dir, chan, dtd_get_priority(dtd),
+		dtd_get_next_chan(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word4: frame_width = %d, frame_height = %d\n",
+			dtd_get_frame_width(dtd), dtd_get_frame_height(dtd));
+	else
+		pr_debug("word4: desc_write_addr = 0x%08x, write_desc = %d, "
+			"drp_data = %d, use_desc_reg = %d\n",
+			dtd_get_desc_write_addr(dtd), dtd_get_write_desc(dtd),
+			dtd_get_drop_data(dtd), dtd_get_use_desc(dtd));
+
+	if (dir == DTD_DIR_IN)
+		pr_debug("word5: hor_start = %d, ver_start = %d\n",
+			dtd_get_h_start(dtd), dtd_get_v_start(dtd));
+	else
+		pr_debug("word5: max_width %d, max_height %d\n",
+			dtd_get_max_width(dtd), dtd_get_max_height(dtd));
+
+	pr_debug("word6: client specfic attr0 = 0x%08x\n", dtd->client_attr0);
+	pr_debug("word7: client specfic attr1 = 0x%08x\n", dtd->client_attr1);
+}
+
+/*
+ * append an outbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'client to memory' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags)
+{
+	int priority = 0;
+	int field = 0;
+	int notify = 1;
+	int channel, next_chan;
+	int depth = fmt->depth;
+	int stride;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
+	if (fmt->data_type == DATA_TYPE_C420)
+		depth = 8;
+
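+	/* depth is in bits per pixel, so stride and the left-edge offset
+	 * below are in bytes */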
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+	dtd->w1 = 0;
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_OUT, channel, priority, next_chan);
+	dtd->desc_write_addr = dtd_desc_write_addr(0, 0, 0, 0);
+	dtd->max_width_height = dtd_max_width_height(MAX_OUT_WIDTH_1920,
+					MAX_OUT_HEIGHT_1080);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
+/*
+ * append an inbound data transfer descriptor to the given descriptor list,
+ * this sets up a 'memory to client' VPDMA transfer for the given VPDMA channel
+ */
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags)
+{
+	int priority = 0;
+	int notify = 1;
+	int depth = fmt->depth;
+	int channel, next_chan;
+	int stride;
+	int height = c_rect->height;
+	struct vpdma_dtd *dtd;
+
+	channel = next_chan = chan_info[chan].num;
+
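+	/* for the C420 chroma plane, only half the lines of the frame are
+	 * fetched */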
+	if (fmt->data_type == DATA_TYPE_C420) {
+		height >>= 1;
+		frame_height >>= 1;
+		depth = 8;
+	}
+
+	stride = (depth * c_rect->width) >> 3;
+	dma_addr += (c_rect->left * depth) >> 3;
+
+	dtd = list->next;
+	WARN_ON((void *)(dtd + 1) > (list->buf.addr + list->buf.size));
+
+	dtd->type_ctl_stride = dtd_type_ctl_stride(fmt->data_type,
+					notify,
+					field,
+					!!(flags & VPDMA_DATA_FRAME_1D),
+					!!(flags & VPDMA_DATA_EVEN_LINE_SKIP),
+					!!(flags & VPDMA_DATA_ODD_LINE_SKIP),
+					stride);
+
+	dtd->xfer_length_height = dtd_xfer_length_height(c_rect->width, height);
+	dtd->start_addr = (u32) dma_addr;
+	dtd->pkt_ctl = dtd_pkt_ctl(!!(flags & VPDMA_DATA_MODE_TILED),
+				DTD_DIR_IN, channel, priority, next_chan);
+	dtd->frame_width_height = dtd_frame_width_height(frame_width,
+					frame_height);
+	dtd->start_h_v = dtd_start_h_v(c_rect->left, c_rect->top);
+	dtd->client_attr0 = 0;
+	dtd->client_attr1 = 0;
+
+	list->next = dtd + 1;
+
+	dump_dtd(dtd);
+}
+
 /* set or clear the mask for list complete interrupt */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable)
diff --git a/drivers/media/platform/ti-vpe/vpdma.h b/drivers/media/platform/ti-vpe/vpdma.h
index 8056689..eaa2a71 100644
--- a/drivers/media/platform/ti-vpe/vpdma.h
+++ b/drivers/media/platform/ti-vpe/vpdma.h
@@ -124,6 +124,39 @@ enum vpdma_channel {
 	VPE_CHAN_RGB_OUT,
 };
 
+/* flags for VPDMA data descriptors */
+#define VPDMA_DATA_ODD_LINE_SKIP	(1 << 0)
+#define VPDMA_DATA_EVEN_LINE_SKIP	(1 << 1)
+#define VPDMA_DATA_FRAME_1D		(1 << 2)
+#define VPDMA_DATA_MODE_TILED		(1 << 3)
+
+/*
+ * client identifiers used for configuration descriptors
+ */
+#define CFD_MMR_CLIENT		0
+#define CFD_SC_CLIENT		4
+
+/* Address data block header format */
+struct vpdma_adb_hdr {
+	u32			offset;
+	u32			nwords;
+	u32			reserved0;
+	u32			reserved1;
+};
+
+/* helpers for creating ADB headers for config descriptors, MMRs as client */
+#define ADB_ADDR(dma_buf, str, fld)	((dma_buf)->addr + offsetof(str, fld))
+#define MMR_ADB_ADDR(buf, str, fld)	ADB_ADDR(&(buf), struct str, fld)
+
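+/*
+ * note: the NULL 'adb' pointer in the macro below is used only for
+ * sizeof(adb->regs); it is never dereferenced
+ */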
+#define VPDMA_SET_MMR_ADB_HDR(buf, str, hdr, regs, offset_a)	\
+	do {							\
+		struct vpdma_adb_hdr *h;			\
+		struct str *adb = NULL;				\
+		h = MMR_ADB_ADDR(buf, str, hdr);		\
+		h->offset = (offset_a);				\
+		h->nwords = sizeof(adb->regs) >> 2;		\
+	} while (0)
+
 /* vpdma descriptor buffer allocation and management */
 int vpdma_alloc_desc_buf(struct vpdma_buf *buf, size_t size);
 void vpdma_free_desc_buf(struct vpdma_buf *buf);
@@ -136,6 +169,21 @@ void vpdma_reset_desc_list(struct vpdma_desc_list *list);
 void vpdma_free_desc_list(struct vpdma_desc_list *list);
 int vpdma_submit_descs(struct vpdma_data *vpdma, struct vpdma_desc_list *list);
 
+/* helpers for creating vpdma descriptors */
+void vpdma_add_cfd_block(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *blk, u32 dest_offset);
+void vpdma_add_cfd_adb(struct vpdma_desc_list *list, int client,
+		struct vpdma_buf *adb);
+void vpdma_add_sync_on_channel_ctd(struct vpdma_desc_list *list,
+		enum vpdma_channel chan);
+void vpdma_add_out_dtd(struct vpdma_desc_list *list, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, u32 flags);
+void vpdma_add_in_dtd(struct vpdma_desc_list *list, int frame_width,
+		int frame_height, struct v4l2_rect *c_rect,
+		const struct vpdma_data_format *fmt, dma_addr_t dma_addr,
+		enum vpdma_channel chan, int field, u32 flags);
+
 /* vpdma list interrupt management */
 void vpdma_enable_list_complete_irq(struct vpdma_data *vpdma, int list_num,
 		bool enable);
diff --git a/drivers/media/platform/ti-vpe/vpdma_priv.h b/drivers/media/platform/ti-vpe/vpdma_priv.h
index 8ff51a3..f0e9a80 100644
--- a/drivers/media/platform/ti-vpe/vpdma_priv.h
+++ b/drivers/media/platform/ti-vpe/vpdma_priv.h
@@ -116,4 +116,526 @@
 #define	VPE_CHAN_NUM_CHROMA_OUT		103
 #define	VPE_CHAN_NUM_RGB_OUT		106
 
+/*
+ * each sub block within a VPDMA address data block payload for a
+ * configuration descriptor must be a multiple of 16 bytes long. Therefore,
+ * the overall payload size is also a multiple of 16 bytes. The VPDMA user is
+ * responsible for ensuring that the sub block lengths are aligned.
+ */
+#define VPDMA_ADB_SIZE_ALIGN		0x0f
+
+/*
+ * data transfer descriptor
+ */
+struct vpdma_dtd {
+	u32			type_ctl_stride;
+	union {
+		u32		xfer_length_height;
+		u32		w1;
+	};
+	dma_addr_t		start_addr;
+	u32			pkt_ctl;
+	union {
+		u32		frame_width_height;	/* inbound */
+		dma_addr_t	desc_write_addr;	/* outbound */
+	};
+	union {
+		u32		start_h_v;		/* inbound */
+		u32		max_width_height;	/* outbound */
+	};
+	u32			client_attr0;
+	u32			client_attr1;
+};
+
+/* Data Transfer Descriptor specifics */
+#define DTD_NO_NOTIFY		0
+#define DTD_NOTIFY		1
+
+#define DTD_PKT_TYPE		0xa
+#define DTD_DIR_IN		0
+#define DTD_DIR_OUT		1
+
+/* type_ctl_stride */
+#define DTD_DATA_TYPE_MASK	0x3f
+#define DTD_DATA_TYPE_SHFT	26
+#define DTD_NOTIFY_MASK		0x01
+#define DTD_NOTIFY_SHFT		25
+#define DTD_FIELD_MASK		0x01
+#define DTD_FIELD_SHFT		24
+#define DTD_1D_MASK		0x01
+#define DTD_1D_SHFT		23
+#define DTD_EVEN_LINE_SKIP_MASK	0x01
+#define DTD_EVEN_LINE_SKIP_SHFT	20
+#define DTD_ODD_LINE_SKIP_MASK	0x01
+#define DTD_ODD_LINE_SKIP_SHFT	16
+#define DTD_LINE_STRIDE_MASK	0xffff
+#define DTD_LINE_STRIDE_SHFT	0
+
+/* xfer_length_height */
+#define DTD_LINE_LENGTH_MASK	0xffff
+#define DTD_LINE_LENGTH_SHFT	16
+#define DTD_XFER_HEIGHT_MASK	0xffff
+#define DTD_XFER_HEIGHT_SHFT	0
+
+/* pkt_ctl */
+#define DTD_PKT_TYPE_MASK	0x1f
+#define DTD_PKT_TYPE_SHFT	27
+#define DTD_MODE_MASK		0x01
+#define DTD_MODE_SHFT		26
+#define DTD_DIR_MASK		0x01
+#define DTD_DIR_SHFT		25
+#define DTD_CHAN_MASK		0x01ff
+#define DTD_CHAN_SHFT		16
+#define DTD_PRI_MASK		0x0f
+#define DTD_PRI_SHFT		9
+#define DTD_NEXT_CHAN_MASK	0x01ff
+#define DTD_NEXT_CHAN_SHFT	0
+
+/* frame_width_height */
+#define DTD_FRAME_WIDTH_MASK	0xffff
+#define DTD_FRAME_WIDTH_SHFT	16
+#define DTD_FRAME_HEIGHT_MASK	0xffff
+#define DTD_FRAME_HEIGHT_SHFT	0
+
+/* start_h_v */
+#define DTD_H_START_MASK	0xffff
+#define DTD_H_START_SHFT	16
+#define DTD_V_START_MASK	0xffff
+#define DTD_V_START_SHFT	0
+
+#define DTD_DESC_START_SHIFT	5
+#define DTD_WRITE_DESC_MASK	0x01
+#define DTD_WRITE_DESC_SHIFT	2
+#define DTD_DROP_DATA_MASK	0x01
+#define DTD_DROP_DATA_SHIFT	1
+#define DTD_USE_DESC_MASK	0x01
+#define DTD_USE_DESC_SHIFT	0
+
+/* max_width_height */
+#define DTD_MAX_WIDTH_MASK	0x07
+#define DTD_MAX_WIDTH_SHFT	4
+#define DTD_MAX_HEIGHT_MASK	0x07
+#define DTD_MAX_HEIGHT_SHFT	0
+
+/* max width configurations */
+/* unlimited width */
+#define	MAX_OUT_WIDTH_UNLIMITED		0
+/* as specified in max_size1 reg */
+#define MAX_OUT_WIDTH_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_WIDTH_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_WIDTH_REG3		3
+/* maximum of 352 pixels as width */
+#define MAX_OUT_WIDTH_352		4
+/* maximum of 768 pixels as width */
+#define	MAX_OUT_WIDTH_768		5
+/* maximum of 1280 pixels width */
+#define	MAX_OUT_WIDTH_1280		6
+/* maximum of 1920 pixels as width */
+#define	MAX_OUT_WIDTH_1920		7
+
+/* max height configurations */
+/* unlimited height */
+#define	MAX_OUT_HEIGHT_UNLIMITED	0
+/* as specified in max_size1 reg */
+#define MAX_OUT_HEIGHT_REG1		1
+/* as specified in max_size2 reg */
+#define MAX_OUT_HEIGHT_REG2		2
+/* as specified in max_size3 reg */
+#define	MAX_OUT_HEIGHT_REG3		3
+/* maximum of 288 lines as height */
+#define MAX_OUT_HEIGHT_288		4
+/* maximum of 576 lines as height */
+#define	MAX_OUT_HEIGHT_576		5
+/* maximum of 720 lines as height */
+#define	MAX_OUT_HEIGHT_720		6
+/* maximum of 1080 lines as height */
+#define	MAX_OUT_HEIGHT_1080		7
+
+static inline u32 dtd_type_ctl_stride(int type, bool notify, int field,
+			bool one_d, bool even_line_skip, bool odd_line_skip,
+			int line_stride)
+{
+	return (type << DTD_DATA_TYPE_SHFT) | (notify << DTD_NOTIFY_SHFT) |
+		(field << DTD_FIELD_SHFT) | (one_d << DTD_1D_SHFT) |
+		(even_line_skip << DTD_EVEN_LINE_SKIP_SHFT) |
+		(odd_line_skip << DTD_ODD_LINE_SKIP_SHFT) |
+		line_stride;
+}
+
+static inline u32 dtd_xfer_length_height(int line_length, int xfer_height)
+{
+	return (line_length << DTD_LINE_LENGTH_SHFT) | xfer_height;
+}
+
+static inline u32 dtd_pkt_ctl(bool mode, bool dir, int chan, int pri,
+			int next_chan)
+{
+	return (DTD_PKT_TYPE << DTD_PKT_TYPE_SHFT) | (mode << DTD_MODE_SHFT) |
+		(dir << DTD_DIR_SHFT) | (chan << DTD_CHAN_SHFT) |
+		(pri << DTD_PRI_SHFT) | next_chan;
+}
+
+static inline u32 dtd_frame_width_height(int width, int height)
+{
+	return (width << DTD_FRAME_WIDTH_SHFT) | height;
+}
+
+static inline u32 dtd_desc_write_addr(unsigned int addr, bool write_desc,
+			bool drop_data, bool use_desc)
+{
+	return (addr << DTD_DESC_START_SHIFT) |
+		(write_desc << DTD_WRITE_DESC_SHIFT) |
+		(drop_data << DTD_DROP_DATA_SHIFT) |
+		use_desc;
+}
+
+static inline u32 dtd_start_h_v(int h_start, int v_start)
+{
+	return (h_start << DTD_H_START_SHFT) | v_start;
+}
+
+static inline u32 dtd_max_width_height(int max_width, int max_height)
+{
+	return (max_width << DTD_MAX_WIDTH_SHFT) | max_height;
+}
+
+static inline int dtd_get_data_type(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride >> DTD_DATA_TYPE_SHFT;
+}
+
+static inline bool dtd_get_notify(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_NOTIFY_SHFT) & DTD_NOTIFY_MASK;
+}
+
+static inline int dtd_get_field(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_FIELD_SHFT) & DTD_FIELD_MASK;
+}
+
+static inline bool dtd_get_1d(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_1D_SHFT) & DTD_1D_MASK;
+}
+
+static inline bool dtd_get_even_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_EVEN_LINE_SKIP_SHFT)
+		& DTD_EVEN_LINE_SKIP_MASK;
+}
+
+static inline bool dtd_get_odd_line_skip(struct vpdma_dtd *dtd)
+{
+	return (dtd->type_ctl_stride >> DTD_ODD_LINE_SKIP_SHFT)
+		& DTD_ODD_LINE_SKIP_MASK;
+}
+
+static inline int dtd_get_line_stride(struct vpdma_dtd *dtd)
+{
+	return dtd->type_ctl_stride & DTD_LINE_STRIDE_MASK;
+}
+
+static inline int dtd_get_line_length(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height >> DTD_LINE_LENGTH_SHFT;
+}
+
+static inline int dtd_get_xfer_height(struct vpdma_dtd *dtd)
+{
+	return dtd->xfer_length_height & DTD_XFER_HEIGHT_MASK;
+}
+
+static inline int dtd_get_pkt_type(struct vpdma_dtd *dtd)
+{
+	return dtd->pkt_ctl >> DTD_PKT_TYPE_SHFT;
+}
+
+static inline bool dtd_get_mode(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_MODE_SHFT) & DTD_MODE_MASK;
+}
+
+static inline bool dtd_get_dir(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_DIR_SHFT) & DTD_DIR_MASK;
+}
+
+static inline int dtd_get_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_CHAN_SHFT) & DTD_CHAN_MASK;
+}
+
+static inline int dtd_get_priority(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_PRI_SHFT) & DTD_PRI_MASK;
+}
+
+static inline int dtd_get_next_chan(struct vpdma_dtd *dtd)
+{
+	return (dtd->pkt_ctl >> DTD_NEXT_CHAN_SHFT) & DTD_NEXT_CHAN_MASK;
+}
+
+static inline int dtd_get_frame_width(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height >> DTD_FRAME_WIDTH_SHFT;
+}
+
+static inline int dtd_get_frame_height(struct vpdma_dtd *dtd)
+{
+	return dtd->frame_width_height & DTD_FRAME_HEIGHT_MASK;
+}
+
+static inline int dtd_get_desc_write_addr(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr >> DTD_DESC_START_SHIFT;
+}
+
+static inline bool dtd_get_write_desc(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_WRITE_DESC_SHIFT) &
+							DTD_WRITE_DESC_MASK;
+}
+
+static inline bool dtd_get_drop_data(struct vpdma_dtd *dtd)
+{
+	return (dtd->desc_write_addr >> DTD_DROP_DATA_SHIFT) &
+							DTD_DROP_DATA_MASK;
+}
+
+static inline bool dtd_get_use_desc(struct vpdma_dtd *dtd)
+{
+	return dtd->desc_write_addr & DTD_USE_DESC_MASK;
+}
+
+static inline int dtd_get_h_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v >> DTD_H_START_SHFT;
+}
+
+static inline int dtd_get_v_start(struct vpdma_dtd *dtd)
+{
+	return dtd->start_h_v & DTD_V_START_MASK;
+}
+
+static inline int dtd_get_max_width(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_WIDTH_SHFT) &
+							DTD_MAX_WIDTH_MASK;
+}
+
+static inline int dtd_get_max_height(struct vpdma_dtd *dtd)
+{
+	return (dtd->max_width_height >> DTD_MAX_HEIGHT_SHFT) &
+							DTD_MAX_HEIGHT_MASK;
+}
+
+/*
+ * configuration descriptor
+ */
+struct vpdma_cfd {
+	union {
+		u32	dest_addr_offset;
+		u32	w0;
+	};
+	union {
+		u32	block_len;		/* in words */
+		u32	w1;
+	};
+	u32		payload_addr;
+	u32		ctl_payload_len;	/* in words */
+};
+
+/* Configuration descriptor specifics */
+
+#define CFD_PKT_TYPE		0xb
+
+#define CFD_DIRECT		1
+#define CFD_INDIRECT		0
+#define CFD_CLS_ADB		0
+#define CFD_CLS_BLOCK		1
+
+/* block_len */
+#define CFD_BLOCK_LEN_MASK	0xffff
+#define CFD_BLOCK_LEN_SHFT	0
+
+/* ctl_payload_len */
+#define CFD_PKT_TYPE_MASK	0x1f
+#define CFD_PKT_TYPE_SHFT	27
+#define CFD_DIRECT_MASK		0x01
+#define CFD_DIRECT_SHFT		26
+#define CFD_CLASS_MASK		0x03
+#define CFD_CLASS_SHFT		24
+#define CFD_DEST_MASK		0xff
+#define CFD_DEST_SHFT		16
+#define CFD_PAYLOAD_LEN_MASK	0xffff
+#define CFD_PAYLOAD_LEN_SHFT	0
+
+static inline u32 cfd_pkt_payload_len(bool direct, int cls, int dest,
+		int payload_len)
+{
+	return (CFD_PKT_TYPE << CFD_PKT_TYPE_SHFT) |
+		(direct << CFD_DIRECT_SHFT) |
+		(cls << CFD_CLASS_SHFT) |
+		(dest << CFD_DEST_SHFT) |
+		payload_len;
+}
+
+static inline int cfd_get_pkt_type(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len >> CFD_PKT_TYPE_SHFT;
+}
+
+static inline bool cfd_get_direct(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DIRECT_SHFT) & CFD_DIRECT_MASK;
+}
+
+static inline int cfd_get_class(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_CLASS_SHFT) & CFD_CLASS_MASK;
+}
+
+static inline int cfd_get_dest(struct vpdma_cfd *cfd)
+{
+	return (cfd->ctl_payload_len >> CFD_DEST_SHFT) & CFD_DEST_MASK;
+}
+
+static inline int cfd_get_payload_len(struct vpdma_cfd *cfd)
+{
+	return cfd->ctl_payload_len & CFD_PAYLOAD_LEN_MASK;
+}
+
+/*
+ * control descriptor
+ */
+struct vpdma_ctd {
+	union {
+		u32	timer_value;
+		u32	list_addr;
+		u32	w0;
+	};
+	union {
+		u32	pixel_line_count;
+		u32	list_size;
+		u32	w1;
+	};
+	union {
+		u32	event;
+		u32	fid_ctl;
+		u32	w2;
+	};
+	u32		type_source_ctl;
+};
+
+/* control descriptor types */
+#define CTD_TYPE_SYNC_ON_CLIENT		0
+#define CTD_TYPE_SYNC_ON_LIST		1
+#define CTD_TYPE_SYNC_ON_EXT		2
+#define CTD_TYPE_SYNC_ON_LM_TIMER	3
+#define CTD_TYPE_SYNC_ON_CHANNEL	4
+#define CTD_TYPE_CHNG_CLIENT_IRQ	5
+#define CTD_TYPE_SEND_IRQ		6
+#define CTD_TYPE_RELOAD_LIST		7
+#define CTD_TYPE_ABORT_CHANNEL		8
+
+#define CTD_PKT_TYPE		0xc
+
+/* timer_value */
+#define CTD_TIMER_VALUE_MASK	0xffff
+#define CTD_TIMER_VALUE_SHFT	0
+
+/* pixel_line_count */
+#define CTD_PIXEL_COUNT_MASK	0xffff
+#define CTD_PIXEL_COUNT_SHFT	16
+#define CTD_LINE_COUNT_MASK	0xffff
+#define CTD_LINE_COUNT_SHFT	0
+
+/* list_size */
+#define CTD_LIST_SIZE_MASK	0xffff
+#define CTD_LIST_SIZE_SHFT	0
+
+/* event */
+#define CTD_EVENT_MASK		0x0f
+#define CTD_EVENT_SHFT		0
+
+/* fid_ctl */
+#define CTD_FID2_MASK		0x03
+#define CTD_FID2_SHFT		4
+#define CTD_FID1_MASK		0x03
+#define CTD_FID1_SHFT		2
+#define CTD_FID0_MASK		0x03
+#define CTD_FID0_SHFT		0
+
+/* type_source_ctl */
+#define CTD_PKT_TYPE_MASK	0x1f
+#define CTD_PKT_TYPE_SHFT	27
+#define CTD_SOURCE_MASK		0xff
+#define CTD_SOURCE_SHFT		16
+#define CTD_CONTROL_MASK	0x0f
+#define CTD_CONTROL_SHFT	0
+
+static inline u32 ctd_pixel_line_count(int pixel_count, int line_count)
+{
+	return (pixel_count << CTD_PIXEL_COUNT_SHFT) | line_count;
+}
+
+static inline u32 ctd_set_fid_ctl(int fid0, int fid1, int fid2)
+{
+	return (fid2 << CTD_FID2_SHFT) | (fid1 << CTD_FID1_SHFT) | fid0;
+}
+
+static inline u32 ctd_type_source_ctl(int source, int control)
+{
+	return (CTD_PKT_TYPE << CTD_PKT_TYPE_SHFT) |
+		(source << CTD_SOURCE_SHFT) | control;
+}
+
+static inline u32 ctd_get_pixel_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count >> CTD_PIXEL_COUNT_SHFT;
+}
+
+static inline int ctd_get_line_count(struct vpdma_ctd *ctd)
+{
+	return ctd->pixel_line_count & CTD_LINE_COUNT_MASK;
+}
+
+static inline int ctd_get_event(struct vpdma_ctd *ctd)
+{
+	return ctd->event & CTD_EVENT_MASK;
+}
+
+static inline int ctd_get_fid2_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID2_SHFT) & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_fid1_ctl(struct vpdma_ctd *ctd)
+{
+	return (ctd->fid_ctl >> CTD_FID1_SHFT) & CTD_FID1_MASK;
+}
+
+static inline int ctd_get_fid0_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->fid_ctl & CTD_FID2_MASK;
+}
+
+static inline int ctd_get_pkt_type(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl >> CTD_PKT_TYPE_SHFT;
+}
+
+static inline int ctd_get_source(struct vpdma_ctd *ctd)
+{
+	return (ctd->type_source_ctl >> CTD_SOURCE_SHFT) & CTD_SOURCE_MASK;
+}
+
+static inline int ctd_get_ctl(struct vpdma_ctd *ctd)
+{
+	return ctd->type_source_ctl & CTD_CONTROL_MASK;
+}
+
 #endif
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
  2013-10-16  5:36       ` Archit Taneja
@ 2013-10-16  5:36         ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

VPE is a block which consists of a single memory to memory path that can
perform chrominance up/down sampling, de-interlacing, scaling, and color space
conversion of raster or tiled YUV420 coplanar, YUV422 coplanar or YUV422
interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set up by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it is submitted to VPDMA; when VPDMA parses these descriptors, it
uploads the MMR values and starts DMA of the video buffers on the various
input and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to signal that the source and
destination buffers are done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls as the HW supports coplanar formats.
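
For reference, the buffer completion path on the list-complete interrupt boils
down to something like this sketch (simplified, not the literal driver code;
abort handling and multi-buffer batching are left out):

	static void vpe_job_done(struct vpe_ctx *ctx)
	{
		struct vb2_buffer *s_vb, *d_vb;

		s_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
		d_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);

		vb2_buffer_done(s_vb, VB2_BUF_STATE_DONE);
		vb2_buffer_done(d_vb, VB2_BUF_STATE_DONE);

		/* release the VPE HW so the next queued ctx can run */
		v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->m2m_ctx);
	}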

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1775 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 include/uapi/linux/v4l2-controls.h       |    4 +
 6 files changed, 2298 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index c7caf94..fc84d99 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -220,6 +220,22 @@ config VIDEO_RENESAS_VSP1
 	  To compile this driver as a module, choose M here: the module
 	  will be called vsp1.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7XX SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 4e4da48..1348ba1 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..3bd9ca6
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1775 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits (16 bytes) */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context can need up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 10 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(10 * VPDMA_DTD_DESC_SIZE +	\
+					13 * VPDMA_CFD_CTD_DESC_SIZE)
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver; it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
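+
+/*
+ * for example, writing value 2 into a 2-bit field whose LSB is bit 16 (note
+ * that the mask is passed unshifted):
+ *
+ *	write_field(&val, 2, 0x3, 16);
+ */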
+
+/*
+ * DMA address/data block for the shadow registers; each register group is
+ * padded so that its payload remains a multiple of 16 bytes
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+}
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
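+	/*
+	 * each upsampler register packs two 14-bit coefficients: C0 in bits
+	 * 31:18 and C1 in bits 15:2 (see the VPE_US_C0/C1 fields)
+	 */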
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* the line mode is set via direct register writes for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/* select RGB path when color space conversion is supported in future */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode, and it has been recommended not to use progressive
+	 * bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for input ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+
+	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
+		pix->colorspace = q_data->colorspace;
+	} else {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	}
+
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+
+	if (type == VPE_FMT_TYPE_CAPTURE) {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	} else {
+		if (!pix->colorspace)
+			pix->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	}
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+/*
+ * number of buffers/frames a context can process with VPE before switching
+ * to a different context; the default value is 1 buffer per transaction
+ */
+#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)
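+
+/*
+ * illustrative user space usage of this control (not part of this patch):
+ *
+ *	struct v4l2_control ctrl = {
+ *		.id	= V4L2_CID_VPE_BUFS_PER_JOB,
+ *		.value	= 4,
+ *	};
+ *	ioctl(fd, VIDIOC_S_CTRL, &ctrl);
+ */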
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_VPE_BUFS_PER_JOB:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_VPE_BUFS_PER_JOB,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance; we can
+	 * later optimize the driver to enable or disable clocks when the
+	 * first instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance; we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
+	if (!res) {
+		ret = -ENODEV;
+		goto v4l2_dev_unreg;
+	}
+
+	/*
+	 * HACK: we get resource info from the device tree in the form of a
+	 * list of VPE sub blocks. The driver currently uses only the base of
+	 * vpe_top for register access; it should be changed later to access
+	 * registers based on the sub block base addresses.
+	 */
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev =
+		(struct vpe_dev *) platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME);
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	0
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	0
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
+#endif
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 083bb5a..1666aab 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -160,6 +160,10 @@ enum v4l2_colorfx {
  * of controls. Total of 16 controls is reserved for this driver */
 #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
 
+/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
+ * this driver */
+#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
+
 /* MPEG-class control IDs */
 /* The MPEG controls are applicable to all codec controls
  * and the 'MPEG' part of the define is historical */
-- 
1.8.1.2



* [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver
@ 2013-10-16  5:36         ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

VPE is a block consisting of a single memory to memory path which can
perform chrominance up/down sampling, de-interlacing, scaling, and color
space conversion on raster or tiled YUV420 co-planar, YUV422 co-planar or
YUV422 interleaved video formats.

We create a mem2mem driver based primarily on the mem2mem-testdev example.
The de-interlacer, scaler and color space converter are all bypassed for now
to keep the driver simple. Chroma up/down sampler blocks are implemented, so
conversion between different YUV formats is possible.

Each mem2mem context allocates a buffer for VPE MMR values which it will use
when it gets access to the VPE HW via the mem2mem queue. It also allocates
a VPDMA descriptor list to which configuration and data descriptors are added.

Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and stores
them in the buffer. There are also some VPDMA parameters, like frame start and
line mode, which need to be configured; these are set up by direct register
writes via the VPDMA helper functions.

The driver's device_run() mem2mem op adds each descriptor based on how the
source and destination queues are set up for the given ctx. Once the list is
prepared, it's submitted to VPDMA. When parsed by VPDMA, these descriptors
upload the MMR registers and start DMA of video buffers on the various input
and output clients/ports.

When the list is parsed completely (and the DMAs on all the output ports are
done), an interrupt is generated, which we use to mark the source and
destination buffers as done.

The rest of the driver is quite similar to other mem2mem drivers; we use the
multiplane v4l2 ioctls since the HW supports co-planar formats.

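For reference, here is a minimal userspace sketch of a single conversion
through the driver; the VPE node path is an assumption, and headers, buffer
setup details and error handling are omitted:

	/* YUYV 1080p in, NV12 1080p out; assumes /dev/video0 is the VPE node */
	int fd = open("/dev/video0", O_RDWR);
	struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE };

	fmt.fmt.pix_mp.width = 1920;
	fmt.fmt.pix_mp.height = 1080;
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_YUYV;
	ioctl(fd, VIDIOC_S_FMT, &fmt);		/* source queue */

	fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12;
	ioctl(fd, VIDIOC_S_FMT, &fmt);		/* destination queue */

	/*
	 * then: VIDIOC_REQBUFS, mmap() and VIDIOC_QBUF on both queues,
	 * VIDIOC_STREAMON on both, and VIDIOC_DQBUF on the capture queue
	 * to collect the converted frame
	 */
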
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/Kconfig           |   16 +
 drivers/media/platform/Makefile          |    2 +
 drivers/media/platform/ti-vpe/Makefile   |    5 +
 drivers/media/platform/ti-vpe/vpe.c      | 1775 ++++++++++++++++++++++++++++++
 drivers/media/platform/ti-vpe/vpe_regs.h |  496 +++++++++
 include/uapi/linux/v4l2-controls.h       |    4 +
 6 files changed, 2298 insertions(+)
 create mode 100644 drivers/media/platform/ti-vpe/Makefile
 create mode 100644 drivers/media/platform/ti-vpe/vpe.c
 create mode 100644 drivers/media/platform/ti-vpe/vpe_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index c7caf94..fc84d99 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -220,6 +220,22 @@ config VIDEO_RENESAS_VSP1
 	  To compile this driver as a module, choose M here: the module
 	  will be called vsp1.
 
+config VIDEO_TI_VPE
+	tristate "TI VPE (Video Processing Engine) driver"
+	depends on VIDEO_DEV && VIDEO_V4L2 && SOC_DRA7XX
+	select VIDEOBUF2_DMA_CONTIG
+	select V4L2_MEM2MEM_DEV
+	default n
+	---help---
+	  Support for the TI VPE (Video Processing Engine) block
+	  found on the DRA7xx SoC.
+
+config VIDEO_TI_VPE_DEBUG
+	bool "VPE debug messages"
+	depends on VIDEO_TI_VPE
+	---help---
+	  Enable debug messages in the VPE driver.
+
 endif # V4L_MEM2MEM_DRIVERS
 
 menuconfig V4L_TEST_DRIVERS
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 4e4da48..1348ba1 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -22,6 +22,8 @@ obj-$(CONFIG_VIDEO_VIVI) += vivi.o
 
 obj-$(CONFIG_VIDEO_MEM2MEM_TESTDEV) += mem2mem_testdev.o
 
+obj-$(CONFIG_VIDEO_TI_VPE)		+= ti-vpe/
+
 obj-$(CONFIG_VIDEO_MX2_EMMAPRP)		+= mx2_emmaprp.o
 obj-$(CONFIG_VIDEO_CODA) 		+= coda.o
 
diff --git a/drivers/media/platform/ti-vpe/Makefile b/drivers/media/platform/ti-vpe/Makefile
new file mode 100644
index 0000000..cbf0a80
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_VIDEO_TI_VPE) += ti-vpe.o
+
+ti-vpe-y := vpe.o vpdma.o
+
+ccflags-$(CONFIG_VIDEO_TI_VPE_DEBUG) += -DDEBUG
diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
new file mode 100644
index 0000000..3bd9ca6
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -0,0 +1,1775 @@
+/*
+ * TI VPE mem2mem driver, based on the virtual v4l2-mem2mem example driver
+ *
+ * Copyright (c) 2013 Texas Instruments Inc.
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pawel@osciak.com>
+ * Marek Szyprowski, <m.szyprowski@samsung.com>
+ *
+ * Based on the virtual v4l2-mem2mem example device
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/videodev2.h>
+
+#include <media/v4l2-common.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "vpdma.h"
+#include "vpe_regs.h"
+
+#define VPE_MODULE_NAME "vpe"
+
+/* minimum and maximum frame sizes */
+#define MIN_W		128
+#define MIN_H		128
+#define MAX_W		1920
+#define MAX_H		1080
+
+/* required alignments */
+#define S_ALIGN		0	/* multiple of 1 */
+#define H_ALIGN		1	/* multiple of 2 */
+#define W_ALIGN		1	/* multiple of 2 */
+
+/* line stride must be a multiple of 128 bits, i.e. 16 bytes */
+#define L_ALIGN		4
+
+/* flags that indicate a format can be used for capture/output */
+#define VPE_FMT_TYPE_CAPTURE	(1 << 0)
+#define VPE_FMT_TYPE_OUTPUT	(1 << 1)
+
+/* used as plane indices */
+#define VPE_MAX_PLANES	2
+#define VPE_LUMA	0
+#define VPE_CHROMA	1
+
+/* per m2m context info */
+#define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
+
+/*
+ * each VPE context can need up to 3 config descriptors, 7 input descriptors,
+ * 3 output descriptors, and 10 control descriptors
+ */
+#define VPE_DESC_LIST_SIZE	(10 * VPDMA_DTD_DESC_SIZE +	\
+					13 * VPDMA_CFD_CTD_DESC_SIZE)
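+/* i.e. 10 data transfer (DTD) and 13 config/control (CFD/CTD) descriptors */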
+
+#define vpe_dbg(vpedev, fmt, arg...)	\
+		dev_dbg((vpedev)->v4l2_dev.dev, fmt, ##arg)
+#define vpe_err(vpedev, fmt, arg...)	\
+		dev_err((vpedev)->v4l2_dev.dev, fmt, ##arg)
+
+struct vpe_us_coeffs {
+	unsigned short	anchor_fid0_c0;
+	unsigned short	anchor_fid0_c1;
+	unsigned short	anchor_fid0_c2;
+	unsigned short	anchor_fid0_c3;
+	unsigned short	interp_fid0_c0;
+	unsigned short	interp_fid0_c1;
+	unsigned short	interp_fid0_c2;
+	unsigned short	interp_fid0_c3;
+	unsigned short	anchor_fid1_c0;
+	unsigned short	anchor_fid1_c1;
+	unsigned short	anchor_fid1_c2;
+	unsigned short	anchor_fid1_c3;
+	unsigned short	interp_fid1_c0;
+	unsigned short	interp_fid1_c1;
+	unsigned short	interp_fid1_c2;
+	unsigned short	interp_fid1_c3;
+};
+
+/*
+ * Default upsampler coefficients
+ */
+static const struct vpe_us_coeffs us_coeffs[] = {
+	{
+		/* Coefficients for progressive input */
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
+	},
+};
+
+/*
+ * The port_data structure contains per-port data.
+ */
+struct vpe_port_data {
+	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_part;		/* plane index for co-planar formats */
+};
+
+/*
+ * Define indices into the port_data tables
+ */
+#define VPE_PORT_LUMA1_IN	0
+#define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA_OUT	8
+#define VPE_PORT_CHROMA_OUT	9
+#define VPE_PORT_RGB_OUT	10
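+
+/*
+ * indices 2 to 7 are left unused for now; they are reserved for the remaining
+ * de-interlacer ports (the f-1 and f-2 field inputs and the motion vector
+ * in/out ports), which aren't needed while the DEI runs in bypass mode
+ */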
+
+static const struct vpe_port_data port_data[11] = {
+	[VPE_PORT_LUMA1_IN] = {
+		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA1_IN] = {
+		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA_OUT] = {
+		.channel	= VPE_CHAN_LUMA_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA_OUT] = {
+		.channel	= VPE_CHAN_CHROMA_OUT,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_RGB_OUT] = {
+		.channel	= VPE_CHAN_RGB_OUT,
+		.vb_part	= VPE_LUMA,
+	},
+};
+
+/* driver info for each of the supported video formats */
+struct vpe_fmt {
+	char	*name;			/* human-readable name */
+	u32	fourcc;			/* standard format identifier */
+	u8	types;			/* CAPTURE and/or OUTPUT */
+	u8	coplanar;		/* set for unpacked Luma and Chroma */
+	/* vpdma format info for each plane */
+	struct vpdma_data_format const *vpdma_fmt[VPE_MAX_PLANES];
+};
+
+static struct vpe_fmt vpe_formats[] = {
+	{
+		.name		= "YUV 422 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV16,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y444],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C444],
+				  },
+	},
+	{
+		.name		= "YUV 420 co-planar",
+		.fourcc		= V4L2_PIX_FMT_NV12,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 1,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_Y420],
+				    &vpdma_yuv_fmts[VPDMA_DATA_FMT_C420],
+				  },
+	},
+	{
+		.name		= "YUYV 422 packed",
+		.fourcc		= V4L2_PIX_FMT_YUYV,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_YC422],
+				  },
+	},
+	{
+		.name		= "UYVY 422 packed",
+		.fourcc		= V4L2_PIX_FMT_UYVY,
+		.types		= VPE_FMT_TYPE_CAPTURE | VPE_FMT_TYPE_OUTPUT,
+		.coplanar	= 0,
+		.vpdma_fmt	= { &vpdma_yuv_fmts[VPDMA_DATA_FMT_CY422],
+				  },
+	},
+};
+
+/*
+ * per-queue, driver-specific private data.
+ * there is one source queue and one destination queue for each m2m context.
+ */
+struct vpe_q_data {
+	unsigned int		width;				/* frame width */
+	unsigned int		height;				/* frame height */
+	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
+	enum v4l2_colorspace	colorspace;
+	unsigned int		flags;
+	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
+	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
+	struct vpe_fmt		*fmt;				/* format info */
+};
+
+/* vpe_q_data flag bits */
+#define	Q_DATA_FRAME_1D		(1 << 0)
+#define	Q_DATA_MODE_TILED	(1 << 1)
+
+enum {
+	Q_DATA_SRC = 0,
+	Q_DATA_DST = 1,
+};
+
+/* find our format description corresponding to the passed v4l2_format */
+static struct vpe_fmt *find_format(struct v4l2_format *f)
+{
+	struct vpe_fmt *fmt;
+	unsigned int k;
+
+	for (k = 0; k < ARRAY_SIZE(vpe_formats); k++) {
+		fmt = &vpe_formats[k];
+		if (fmt->fourcc == f->fmt.pix.pixelformat)
+			return fmt;
+	}
+
+	return NULL;
+}
+
+/*
+ * there is one vpe_dev structure in the driver, it is shared by
+ * all instances.
+ */
+struct vpe_dev {
+	struct v4l2_device	v4l2_dev;
+	struct video_device	vfd;
+	struct v4l2_m2m_dev	*m2m_dev;
+
+	atomic_t		num_instances;	/* count of driver instances */
+	dma_addr_t		loaded_mmrs;	/* shadow mmrs in device */
+	struct mutex		dev_mutex;
+	spinlock_t		lock;
+
+	int			irq;
+	void __iomem		*base;
+
+	struct vb2_alloc_ctx	*alloc_ctx;
+	struct vpdma_data	*vpdma;		/* vpdma data handle */
+};
+
+/*
+ * There is one vpe_ctx structure for each m2m context.
+ */
+struct vpe_ctx {
+	struct v4l2_fh		fh;
+	struct vpe_dev		*dev;
+	struct v4l2_m2m_ctx	*m2m_ctx;
+	struct v4l2_ctrl_handler hdl;
+
+	unsigned int		sequence;		/* current frame/field seq */
+	unsigned int		aborting;		/* abort after next irq */
+
+	unsigned int		bufs_per_job;		/* input buffers per batch */
+	unsigned int		bufs_completed;		/* bufs done in this batch */
+
+	struct vpe_q_data	q_data[2];		/* src & dst queue data */
+	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*dst_vb;
+
+	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
+	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
+
+	bool			load_mmrs;		/* have new shadow reg values */
+};
+
+/*
+ * M2M devices get 2 queues.
+ * Return the queue given the type.
+ */
+static struct vpe_q_data *get_q_data(struct vpe_ctx *ctx,
+				     enum v4l2_buf_type type)
+{
+	switch (type) {
+	case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE:
+		return &ctx->q_data[Q_DATA_SRC];
+	case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
+		return &ctx->q_data[Q_DATA_DST];
+	default:
+		BUG();
+	}
+	return NULL;
+}
+
+static u32 read_reg(struct vpe_dev *dev, int offset)
+{
+	return ioread32(dev->base + offset);
+}
+
+static void write_reg(struct vpe_dev *dev, int offset, u32 value)
+{
+	iowrite32(value, dev->base + offset);
+}
+
+/* register field read/write helpers */
+static int get_field(u32 value, u32 mask, int shift)
+{
+	return (value & (mask << shift)) >> shift;
+}
+
+static int read_field_reg(struct vpe_dev *dev, int offset, u32 mask, int shift)
+{
+	return get_field(read_reg(dev, offset), mask, shift);
+}
+
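+/*
+ * e.g. write_field(&val, 2, 0x3, 4) clears bits 5:4 of val and then sets
+ * them to 2
+ */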
+static void write_field(u32 *valp, u32 field, u32 mask, int shift)
+{
+	u32 val = *valp;
+
+	val &= ~(mask << shift);
+	val |= (field & mask) << shift;
+	*valp = val;
+}
+
+static void write_field_reg(struct vpe_dev *dev, int offset, u32 field,
+		u32 mask, int shift)
+{
+	u32 val = read_reg(dev, offset);
+
+	write_field(&val, field, mask, shift);
+
+	write_reg(dev, offset, val);
+}
+
+/*
+ * DMA address/data block for the shadow registers
+ */
+struct vpe_mmr_adb {
+	struct vpdma_adb_hdr	out_fmt_hdr;
+	u32			out_fmt_reg[1];
+	u32			out_fmt_pad[3];
+	struct vpdma_adb_hdr	us1_hdr;
+	u32			us1_regs[8];
+	struct vpdma_adb_hdr	us2_hdr;
+	u32			us2_regs[8];
+	struct vpdma_adb_hdr	us3_hdr;
+	u32			us3_regs[8];
+	struct vpdma_adb_hdr	dei_hdr;
+	u32			dei_regs[1];
+	u32			dei_pad[3];
+	struct vpdma_adb_hdr	sc_hdr;
+	u32			sc_regs[1];
+	u32			sc_pad[3];
+	struct vpdma_adb_hdr	csc_hdr;
+	u32			csc_regs[6];
+	u32			csc_pad[2];
+};
+
+#define VPE_SET_MMR_ADB_HDR(ctx, hdr, regs, offset_a)	\
+	VPDMA_SET_MMR_ADB_HDR(ctx->mmr_adb, vpe_mmr_adb, hdr, regs, offset_a)
+/*
+ * Set the headers for all of the address/data block structures.
+ */
+static void init_adb_hdrs(struct vpe_ctx *ctx)
+{
+	VPE_SET_MMR_ADB_HDR(ctx, out_fmt_hdr, out_fmt_reg, VPE_CLK_FORMAT_SELECT);
+	VPE_SET_MMR_ADB_HDR(ctx, us1_hdr, us1_regs, VPE_US1_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us2_hdr, us2_regs, VPE_US2_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, us3_hdr, us3_regs, VPE_US3_R0);
+	VPE_SET_MMR_ADB_HDR(ctx, dei_hdr, dei_regs, VPE_DEI_FRAME_SIZE);
+	VPE_SET_MMR_ADB_HDR(ctx, sc_hdr, sc_regs, VPE_SC_MP_SC0);
+	VPE_SET_MMR_ADB_HDR(ctx, csc_hdr, csc_regs, VPE_CSC_CSC00);
+};
+
+/*
+ * Enable or disable the VPE clocks
+ */
+static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
+{
+	u32 val = 0;
+
+	if (on)
+		val = VPE_DATA_PATH_CLK_ENABLE | VPE_VPEDMA_CLK_ENABLE;
+	write_reg(dev, VPE_CLK_ENABLE, val);
+}
+
+static void vpe_top_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_DATA_PATH_CLK_RESET_MASK,
+		VPE_DATA_PATH_CLK_RESET_SHIFT);
+}
+
+static void vpe_top_vpdma_reset(struct vpe_dev *dev)
+{
+	write_field_reg(dev, VPE_CLK_RESET, 1, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+
+	usleep_range(100, 150);
+
+	write_field_reg(dev, VPE_CLK_RESET, 0, VPE_VPDMA_CLK_RESET_MASK,
+		VPE_VPDMA_CLK_RESET_SHIFT);
+}
+
+/*
+ * Load the correct set of upsampler coefficients into the shadow MMRs
+ */
+static void set_us_coefficients(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg = &mmr_adb->us1_regs[0];
+	u32 *us2_reg = &mmr_adb->us2_regs[0];
+	u32 *us3_reg = &mmr_adb->us3_regs[0];
+	const unsigned short *cp, *end_cp;
+
+	cp = &us_coeffs[0].anchor_fid0_c0;
+
+	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
+
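+	/*
+	 * each upsampler register packs two 14-bit coefficients: C0 in
+	 * bits 31:18 and C1 in bits 15:2
+	 */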
+	while (cp < end_cp) {
+		write_field(us1_reg, *cp++, VPE_US_C0_MASK, VPE_US_C0_SHIFT);
+		write_field(us1_reg, *cp++, VPE_US_C1_MASK, VPE_US_C1_SHIFT);
+		*us2_reg++ = *us1_reg;
+		*us3_reg++ = *us1_reg++;
+	}
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the upsampler config mode and the VPDMA line mode in the shadow MMRs.
+ */
+static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
+{
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_SRC].fmt;
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *us1_reg0 = &mmr_adb->us1_regs[0];
+	u32 *us2_reg0 = &mmr_adb->us2_regs[0];
+	u32 *us3_reg0 = &mmr_adb->us3_regs[0];
+	int line_mode = 1;
+	int cfg_mode = 1;
+
+	/*
+	 * Cfg Mode 0: YUV420 source, enable upsampler, DEI is de-interlacing.
+	 * Cfg Mode 1: YUV422 source, disable upsampler, DEI is de-interlacing.
+	 */
+
+	if (fmt->fourcc == V4L2_PIX_FMT_NV12) {
+		cfg_mode = 0;
+		line_mode = 0;		/* double lines to line buffer */
+	}
+
+	write_field(us1_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us2_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+	write_field(us3_reg0, cfg_mode, VPE_US_MODE_MASK, VPE_US_MODE_SHIFT);
+
+	/* the line mode is set up through a direct register write for now */
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+
+	/* frame start for input luma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA1_IN);
+
+	/* frame start for input chroma */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA1_IN);
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers that are modified when the source
+ * format changes.
+ */
+static void set_src_registers(struct vpe_ctx *ctx)
+{
+	set_us_coefficients(ctx);
+}
+
+/*
+ * Set the shadow registers that are modified when the destination
+ * format changes.
+ */
+static void set_dst_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_fmt *fmt = ctx->q_data[Q_DATA_DST].fmt;
+	u32 val = 0;
+
+	/*
+	 * select the RGB path once color space conversion is supported in
+	 * the future
+	 */
+	if (fmt->fourcc == V4L2_PIX_FMT_RGB24)
+		val |= VPE_RGB_OUT_SELECT | VPE_CSC_SRC_DEI_SCALER;
+	else if (fmt->fourcc == V4L2_PIX_FMT_NV16)
+		val |= VPE_COLOR_SEPARATE_422;
+
+	/* The source of CHR_DS is always the scaler, whether it's used or not */
+	val |= VPE_DS_SRC_DEI_SCALER;
+
+	if (fmt->fourcc != V4L2_PIX_FMT_NV12)
+		val |= VPE_DS_BYPASS;
+
+	mmr_adb->out_fmt_reg[0] = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the de-interlacer shadow register values
+ */
+static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
+	unsigned int src_h = s_q_data->c_rect.height;
+	unsigned int src_w = s_q_data->c_rect.width;
+	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	u32 val = 0;
+
+	/*
+	 * according to the TRM, we should set DEI in progressive bypass mode
+	 * when the input content is progressive. However, DEI is bypassed
+	 * correctly for both progressive and interlaced content in interlace
+	 * bypass mode, and it has been recommended not to use progressive
+	 * bypass mode.
+	 */
+	val = VPE_DEI_INTERLACE_BYPASS;
+
+	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
+		(src_w << VPE_DEI_WIDTH_SHIFT) |
+		VPE_DEI_FIELD_FLUSH;
+
+	*dei_mmr0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *shadow_csc_reg5 = &mmr_adb->csc_regs[5];
+
+	*shadow_csc_reg5 |= VPE_CSC_BYPASS;
+
+	ctx->load_mmrs = true;
+}
+
+static void set_sc_regs_bypass(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *sc_reg0 = &mmr_adb->sc_regs[0];
+	u32 val = 0;
+
+	val |= VPE_SC_BYPASS;
+	*sc_reg0 = val;
+
+	ctx->load_mmrs = true;
+}
+
+/*
+ * Set the shadow registers whose values are modified when either the
+ * source or destination format is changed.
+ */
+static int set_srcdst_params(struct vpe_ctx *ctx)
+{
+	ctx->sequence = 0;
+
+	set_cfg_and_line_modes(ctx);
+	set_dei_regs_bypass(ctx);
+	set_csc_coeff_bypass(ctx);
+	set_sc_regs_bypass(ctx);
+
+	return 0;
+}
+
+/*
+ * Return the vpe_ctx structure for a given struct file
+ */
+static struct vpe_ctx *file2ctx(struct file *file)
+{
+	return container_of(file->private_data, struct vpe_ctx, fh);
+}
+
+/*
+ * mem2mem callbacks
+ */
+
+/**
+ * job_ready() - check whether an instance is ready to be scheduled to run
+ */
+static int job_ready(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	int needed = ctx->bufs_per_job;
+
+	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
+		return 0;
+
+	return 1;
+}
+
+static void job_abort(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+
+	/* Will cancel the transaction in the next interrupt handler */
+	ctx->aborting = 1;
+}
+
+/*
+ * Lock access to the device
+ */
+static void vpe_lock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_lock(&dev->dev_mutex);
+}
+
+static void vpe_unlock(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_dev *dev = ctx->dev;
+	mutex_unlock(&dev->dev_mutex);
+}
+
+static void vpe_dump_regs(struct vpe_dev *dev)
+{
+#define DUMPREG(r) vpe_dbg(dev, "%-35s %08x\n", #r, read_reg(dev, VPE_##r))
+
+	vpe_dbg(dev, "VPE Registers:\n");
+
+	DUMPREG(PID);
+	DUMPREG(SYSCONFIG);
+	DUMPREG(INT0_STATUS0_RAW);
+	DUMPREG(INT0_STATUS0);
+	DUMPREG(INT0_ENABLE0);
+	DUMPREG(INT0_STATUS1_RAW);
+	DUMPREG(INT0_STATUS1);
+	DUMPREG(INT0_ENABLE1);
+	DUMPREG(CLK_ENABLE);
+	DUMPREG(CLK_RESET);
+	DUMPREG(CLK_FORMAT_SELECT);
+	DUMPREG(CLK_RANGE_MAP);
+	DUMPREG(US1_R0);
+	DUMPREG(US1_R1);
+	DUMPREG(US1_R2);
+	DUMPREG(US1_R3);
+	DUMPREG(US1_R4);
+	DUMPREG(US1_R5);
+	DUMPREG(US1_R6);
+	DUMPREG(US1_R7);
+	DUMPREG(US2_R0);
+	DUMPREG(US2_R1);
+	DUMPREG(US2_R2);
+	DUMPREG(US2_R3);
+	DUMPREG(US2_R4);
+	DUMPREG(US2_R5);
+	DUMPREG(US2_R6);
+	DUMPREG(US2_R7);
+	DUMPREG(US3_R0);
+	DUMPREG(US3_R1);
+	DUMPREG(US3_R2);
+	DUMPREG(US3_R3);
+	DUMPREG(US3_R4);
+	DUMPREG(US3_R5);
+	DUMPREG(US3_R6);
+	DUMPREG(US3_R7);
+	DUMPREG(DEI_FRAME_SIZE);
+	DUMPREG(MDT_BYPASS);
+	DUMPREG(MDT_SF_THRESHOLD);
+	DUMPREG(EDI_CONFIG);
+	DUMPREG(DEI_EDI_LUT_R0);
+	DUMPREG(DEI_EDI_LUT_R1);
+	DUMPREG(DEI_EDI_LUT_R2);
+	DUMPREG(DEI_EDI_LUT_R3);
+	DUMPREG(DEI_FMD_WINDOW_R0);
+	DUMPREG(DEI_FMD_WINDOW_R1);
+	DUMPREG(DEI_FMD_CONTROL_R0);
+	DUMPREG(DEI_FMD_CONTROL_R1);
+	DUMPREG(DEI_FMD_STATUS_R0);
+	DUMPREG(DEI_FMD_STATUS_R1);
+	DUMPREG(DEI_FMD_STATUS_R2);
+	DUMPREG(SC_MP_SC0);
+	DUMPREG(SC_MP_SC1);
+	DUMPREG(SC_MP_SC2);
+	DUMPREG(SC_MP_SC3);
+	DUMPREG(SC_MP_SC4);
+	DUMPREG(SC_MP_SC5);
+	DUMPREG(SC_MP_SC6);
+	DUMPREG(SC_MP_SC8);
+	DUMPREG(SC_MP_SC9);
+	DUMPREG(SC_MP_SC10);
+	DUMPREG(SC_MP_SC11);
+	DUMPREG(SC_MP_SC12);
+	DUMPREG(SC_MP_SC13);
+	DUMPREG(SC_MP_SC17);
+	DUMPREG(SC_MP_SC18);
+	DUMPREG(SC_MP_SC19);
+	DUMPREG(SC_MP_SC20);
+	DUMPREG(SC_MP_SC21);
+	DUMPREG(SC_MP_SC22);
+	DUMPREG(SC_MP_SC23);
+	DUMPREG(SC_MP_SC24);
+	DUMPREG(SC_MP_SC25);
+	DUMPREG(CSC_CSC00);
+	DUMPREG(CSC_CSC01);
+	DUMPREG(CSC_CSC02);
+	DUMPREG(CSC_CSC03);
+	DUMPREG(CSC_CSC04);
+	DUMPREG(CSC_CSC05);
+#undef DUMPREG
+}
+
+static void add_out_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_DST];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->dst_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring output buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_out_dtd(&ctx->desc_list, c_rect, vpdma_fmt, dma_addr,
+		p_data->channel, flags);
+}
+
+static void add_in_dtd(struct vpe_ctx *ctx, int port)
+{
+	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
+	const struct vpe_port_data *p_data = &port_data[port];
+	struct vb2_buffer *vb = ctx->src_vb;
+	struct v4l2_rect *c_rect = &q_data->c_rect;
+	struct vpe_fmt *fmt = q_data->fmt;
+	const struct vpdma_data_format *vpdma_fmt;
+	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int field = 0;
+	dma_addr_t dma_addr;
+	u32 flags = 0;
+
+	vpdma_fmt = fmt->vpdma_fmt[plane];
+
+	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+	if (!dma_addr) {
+		vpe_err(ctx->dev,
+			"acquiring input buffer(%d) dma_addr failed\n",
+			port);
+		return;
+	}
+
+	if (q_data->flags & Q_DATA_FRAME_1D)
+		flags |= VPDMA_DATA_FRAME_1D;
+	if (q_data->flags & Q_DATA_MODE_TILED)
+		flags |= VPDMA_DATA_MODE_TILED;
+
+	vpdma_add_in_dtd(&ctx->desc_list, q_data->width, q_data->height,
+		c_rect, vpdma_fmt, dma_addr, p_data->channel, field, flags);
+}
+
+/*
+ * Enable the expected IRQ sources
+ */
+static void enable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
+}
+
+static void disable_irqs(struct vpe_ctx *ctx)
+{
+	write_reg(ctx->dev, VPE_INT0_ENABLE0_CLR, 0xffffffff);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_CLR, 0xffffffff);
+
+	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, false);
+}
+
+/* device_run() - prepares and starts the device
+ *
+ * This function is only called when both the source and destination
+ * buffers are in place.
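+ *
+ * The descriptor list built here contains, in order: an optional MMR
+ * config descriptor (added only when the shadow register values have
+ * changed), the output and input data transfer descriptors, and
+ * sync-on-channel control descriptors, so that the list complete
+ * interrupt fires only after all ports have finished.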
+ */
+static void device_run(void *priv)
+{
+	struct vpe_ctx *ctx = priv;
+	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
+
+	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vb == NULL);
+	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->dst_vb == NULL);
+
+	/* config descriptors */
+	if (ctx->dev->loaded_mmrs != ctx->mmr_adb.dma_addr || ctx->load_mmrs) {
+		vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->mmr_adb);
+		vpdma_add_cfd_adb(&ctx->desc_list, CFD_MMR_CLIENT, &ctx->mmr_adb);
+		ctx->dev->loaded_mmrs = ctx->mmr_adb.dma_addr;
+		ctx->load_mmrs = false;
+	}
+
+	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
+
+	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
+	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
+
+	/* sync on channel control descriptors for input ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
+
+	/* sync on channel control descriptors for output ports */
+	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
+	if (d_q_data->fmt->coplanar)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
+
+	enable_irqs(ctx);
+
+	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
+	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
+}
+
+static void ds1_uv_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received downsampler error interrupt\n");
+}
+
+static irqreturn_t vpe_irq(int irq_vpe, void *data)
+{
+	struct vpe_dev *dev = (struct vpe_dev *)data;
+	struct vpe_ctx *ctx;
+	struct vb2_buffer *s_vb, *d_vb;
+	struct v4l2_buffer *s_buf, *d_buf;
+	unsigned long flags;
+	u32 irqst0, irqst1;
+
+	irqst0 = read_reg(dev, VPE_INT0_STATUS0);
+	if (irqst0) {
+		write_reg(dev, VPE_INT0_STATUS0_CLR, irqst0);
+		vpe_dbg(dev, "INT0_STATUS0 = 0x%08x\n", irqst0);
+	}
+
+	irqst1 = read_reg(dev, VPE_INT0_STATUS1);
+	if (irqst1) {
+		write_reg(dev, VPE_INT0_STATUS1_CLR, irqst1);
+		vpe_dbg(dev, "INT0_STATUS1 = 0x%08x\n", irqst1);
+	}
+
+	ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev);
+	if (!ctx) {
+		vpe_err(dev, "instance released before end of transaction\n");
+		goto handled;
+	}
+
+	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+		ds1_uv_error(ctx);
+	}
+
+	if (irqst0) {
+		if (irqst0 & VPE_INT0_LIST0_COMPLETE)
+			vpdma_clear_list_stat(ctx->dev->vpdma);
+
+		irqst0 &= ~(VPE_INT0_LIST0_COMPLETE);
+	}
+
+	if (irqst0 | irqst1) {
+		dev_warn(dev->v4l2_dev.dev, "Unexpected interrupt: "
+			"INT0_STATUS0 = 0x%08x, INT0_STATUS1 = 0x%08x\n",
+			irqst0, irqst1);
+	}
+
+	disable_irqs(ctx);
+
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->desc_list.buf);
+	vpdma_unmap_desc_buf(dev->vpdma, &ctx->mmr_adb);
+
+	vpdma_reset_desc_list(&ctx->desc_list);
+
+	if (ctx->aborting)
+		goto finished;
+
+	s_vb = ctx->src_vb;
+	d_vb = ctx->dst_vb;
+	s_buf = &s_vb->v4l2_buf;
+	d_buf = &d_vb->v4l2_buf;
+
+	d_buf->timestamp = s_buf->timestamp;
+	if (s_buf->flags & V4L2_BUF_FLAG_TIMECODE) {
+		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
+		d_buf->timecode = s_buf->timecode;
+	}
+
+	d_buf->sequence = ctx->sequence;
+
+	ctx->sequence++;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	ctx->bufs_completed++;
+	if (ctx->bufs_completed < ctx->bufs_per_job) {
+		device_run(ctx);
+		goto handled;
+	}
+
+finished:
+	vpe_dbg(ctx->dev, "finishing transaction\n");
+	ctx->bufs_completed = 0;
+	v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
+handled:
+	return IRQ_HANDLED;
+}
+
+/*
+ * video ioctls
+ */
+static int vpe_querycap(struct file *file, void *priv,
+			struct v4l2_capability *cap)
+{
+	strncpy(cap->driver, VPE_MODULE_NAME, sizeof(cap->driver) - 1);
+	strncpy(cap->card, VPE_MODULE_NAME, sizeof(cap->card) - 1);
+	strlcpy(cap->bus_info, VPE_MODULE_NAME, sizeof(cap->bus_info));
+	cap->device_caps  = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+	return 0;
+}
+
+static int __enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+	int i, index;
+	struct vpe_fmt *fmt = NULL;
+
+	index = 0;
+	for (i = 0; i < ARRAY_SIZE(vpe_formats); ++i) {
+		if (vpe_formats[i].types & type) {
+			if (index == f->index) {
+				fmt = &vpe_formats[i];
+				break;
+			}
+			index++;
+		}
+	}
+
+	if (!fmt)
+		return -EINVAL;
+
+	strncpy(f->description, fmt->name, sizeof(f->description) - 1);
+	f->pixelformat = fmt->fourcc;
+	return 0;
+}
+
+static int vpe_enum_fmt(struct file *file, void *priv,
+				struct v4l2_fmtdesc *f)
+{
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __enum_fmt(f, VPE_FMT_TYPE_OUTPUT);
+
+	return __enum_fmt(f, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vb2_queue *vq;
+	struct vpe_q_data *q_data;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	q_data = get_q_data(ctx, f->type);
+
+	pix->width = q_data->width;
+	pix->height = q_data->height;
+	pix->pixelformat = q_data->fmt->fourcc;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
+		pix->colorspace = q_data->colorspace;
+	} else {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	}
+
+	pix->num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		pix->plane_fmt[i].bytesperline = q_data->bytesperline[i];
+		pix->plane_fmt[i].sizeimage = q_data->sizeimage[i];
+	}
+
+	return 0;
+}
+
+static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
+		       struct vpe_fmt *fmt, int type)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	int i;
+
+	if (!fmt || !(fmt->types & type)) {
+		vpe_err(ctx->dev, "Fourcc format (0x%08x) invalid.\n",
+			pix->pixelformat);
+		return -EINVAL;
+	}
+
+	pix->field = V4L2_FIELD_NONE;
+
+	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
+			      &pix->height, MIN_H, MAX_H, H_ALIGN,
+			      S_ALIGN);
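+	/*
+	 * e.g. a requested 1921x1081 frame gets clamped and aligned to
+	 * 1920x1080 here
+	 */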
+
+	pix->num_planes = fmt->coplanar ? 2 : 1;
+	pix->pixelformat = fmt->fourcc;
+
+	if (type == VPE_FMT_TYPE_CAPTURE) {
+		struct vpe_q_data *s_q_data;
+
+		/* get colorspace from the source queue */
+		s_q_data = get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
+
+		pix->colorspace = s_q_data->colorspace;
+	} else {
+		if (!pix->colorspace)
+			pix->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	}
+
+	for (i = 0; i < pix->num_planes; i++) {
+		int depth;
+
+		plane_fmt = &pix->plane_fmt[i];
+		depth = fmt->vpdma_fmt[i]->depth;
+
+		if (i == VPE_LUMA)
+			plane_fmt->bytesperline =
+					round_up((pix->width * depth) >> 3,
+						1 << L_ALIGN);
+		else
+			plane_fmt->bytesperline = pix->width;
+
+		plane_fmt->sizeimage =
+				(pix->height * pix->width * depth) >> 3;
+	}
+
+	return 0;
+}
+
+static int vpe_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_fmt *fmt = find_format(f);
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_OUTPUT);
+	else
+		return __vpe_try_fmt(ctx, f, fmt, VPE_FMT_TYPE_CAPTURE);
+}
+
+static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
+{
+	struct v4l2_pix_format_mplane *pix = &f->fmt.pix_mp;
+	struct v4l2_plane_pix_format *plane_fmt;
+	struct vpe_q_data *q_data;
+	struct vb2_queue *vq;
+	int i;
+
+	vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
+	if (!vq)
+		return -EINVAL;
+
+	if (vb2_is_busy(vq)) {
+		vpe_err(ctx->dev, "queue busy\n");
+		return -EBUSY;
+	}
+
+	q_data = get_q_data(ctx, f->type);
+	if (!q_data)
+		return -EINVAL;
+
+	q_data->fmt		= find_format(f);
+	q_data->width		= pix->width;
+	q_data->height		= pix->height;
+	q_data->colorspace	= pix->colorspace;
+
+	for (i = 0; i < pix->num_planes; i++) {
+		plane_fmt = &pix->plane_fmt[i];
+
+		q_data->bytesperline[i]	= plane_fmt->bytesperline;
+		q_data->sizeimage[i]	= plane_fmt->sizeimage;
+	}
+
+	q_data->c_rect.left	= 0;
+	q_data->c_rect.top	= 0;
+	q_data->c_rect.width	= q_data->width;
+	q_data->c_rect.height	= q_data->height;
+
+	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
+		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
+		q_data->bytesperline[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " bpl_uv %d\n",
+			q_data->bytesperline[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
+{
+	int ret;
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	ret = vpe_try_fmt(file, priv, f);
+	if (ret)
+		return ret;
+
+	ret = __vpe_s_fmt(ctx, f);
+	if (ret)
+		return ret;
+
+	if (V4L2_TYPE_IS_OUTPUT(f->type))
+		set_src_registers(ctx);
+	else
+		set_dst_registers(ctx);
+
+	return set_srcdst_params(ctx);
+}
+
+static int vpe_reqbufs(struct file *file, void *priv,
+		       struct v4l2_requestbuffers *reqbufs)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_reqbufs(file, ctx->m2m_ctx, reqbufs);
+}
+
+static int vpe_querybuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_querybuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_qbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_dqbuf(file, ctx->m2m_ctx, buf);
+}
+
+static int vpe_streamon(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	return v4l2_m2m_streamon(file, ctx->m2m_ctx, type);
+}
+
+static int vpe_streamoff(struct file *file, void *priv, enum v4l2_buf_type type)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dump_regs(ctx->dev);
+	vpdma_dump_regs(ctx->dev->vpdma);
+
+	return v4l2_m2m_streamoff(file, ctx->m2m_ctx, type);
+}
+
+/*
+ * defines the number of buffers/frames a context can process with VPE before
+ * switching to a different context. The default is one buffer per job.
+ */
+#define V4L2_CID_VPE_BUFS_PER_JOB		(V4L2_CID_USER_TI_VPE_BASE + 0)
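+
+/*
+ * e.g. userspace could batch 4 buffers per transaction with:
+ *
+ *	struct v4l2_control ctrl = {
+ *		.id = V4L2_CID_VPE_BUFS_PER_JOB, .value = 4,
+ *	};
+ *	ioctl(fd, VIDIOC_S_CTRL, &ctrl);
+ */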
+
+static int vpe_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+	struct vpe_ctx *ctx =
+		container_of(ctrl->handler, struct vpe_ctx, hdl);
+
+	switch (ctrl->id) {
+	case V4L2_CID_VPE_BUFS_PER_JOB:
+		ctx->bufs_per_job = ctrl->val;
+		break;
+
+	default:
+		vpe_err(ctx->dev, "Invalid control\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct v4l2_ctrl_ops vpe_ctrl_ops = {
+	.s_ctrl = vpe_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops vpe_ioctl_ops = {
+	.vidioc_querycap	= vpe_querycap,
+
+	.vidioc_enum_fmt_vid_cap_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_cap_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_cap_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_cap_mplane	= vpe_s_fmt,
+
+	.vidioc_enum_fmt_vid_out_mplane = vpe_enum_fmt,
+	.vidioc_g_fmt_vid_out_mplane	= vpe_g_fmt,
+	.vidioc_try_fmt_vid_out_mplane	= vpe_try_fmt,
+	.vidioc_s_fmt_vid_out_mplane	= vpe_s_fmt,
+
+	.vidioc_reqbufs		= vpe_reqbufs,
+	.vidioc_querybuf	= vpe_querybuf,
+
+	.vidioc_qbuf		= vpe_qbuf,
+	.vidioc_dqbuf		= vpe_dqbuf,
+
+	.vidioc_streamon	= vpe_streamon,
+	.vidioc_streamoff	= vpe_streamoff,
+	.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int vpe_queue_setup(struct vb2_queue *vq,
+			   const struct v4l2_format *fmt,
+			   unsigned int *nbuffers, unsigned int *nplanes,
+			   unsigned int sizes[], void *alloc_ctxs[])
+{
+	int i;
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vq);
+	struct vpe_q_data *q_data;
+
+	q_data = get_q_data(ctx, vq->type);
+
+	*nplanes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < *nplanes; i++) {
+		sizes[i] = q_data->sizeimage[i];
+		alloc_ctxs[i] = ctx->dev->alloc_ctx;
+	}
+
+	vpe_dbg(ctx->dev, "get %d buffer(s) of size %d", *nbuffers,
+		sizes[VPE_LUMA]);
+	if (q_data->fmt->coplanar)
+		vpe_dbg(ctx->dev, " and %d\n", sizes[VPE_CHROMA]);
+
+	return 0;
+}
+
+static int vpe_buf_prepare(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vpe_q_data *q_data;
+	int i, num_planes;
+
+	vpe_dbg(ctx->dev, "type: %d\n", vb->vb2_queue->type);
+
+	q_data = get_q_data(ctx, vb->vb2_queue->type);
+	num_planes = q_data->fmt->coplanar ? 2 : 1;
+
+	for (i = 0; i < num_planes; i++) {
+		if (vb2_plane_size(vb, i) < q_data->sizeimage[i]) {
+			vpe_err(ctx->dev,
+				"data will not fit into plane (%lu < %lu)\n",
+				vb2_plane_size(vb, i),
+				(long) q_data->sizeimage[i]);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_planes; i++)
+		vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
+
+	return 0;
+}
+
+static void vpe_buf_queue(struct vb2_buffer *vb)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	v4l2_m2m_buf_queue(ctx->m2m_ctx, vb);
+}
+
+static void vpe_wait_prepare(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_unlock(ctx);
+}
+
+static void vpe_wait_finish(struct vb2_queue *q)
+{
+	struct vpe_ctx *ctx = vb2_get_drv_priv(q);
+	vpe_lock(ctx);
+}
+
+static struct vb2_ops vpe_qops = {
+	.queue_setup	 = vpe_queue_setup,
+	.buf_prepare	 = vpe_buf_prepare,
+	.buf_queue	 = vpe_buf_queue,
+	.wait_prepare	 = vpe_wait_prepare,
+	.wait_finish	 = vpe_wait_finish,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+		      struct vb2_queue *dst_vq)
+{
+	struct vpe_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+	src_vq->io_modes = VB2_MMAP;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &vpe_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	dst_vq->io_modes = VB2_MMAP;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &vpe_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_type = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+
+	return vb2_queue_init(dst_vq);
+}
+
+static const struct v4l2_ctrl_config vpe_bufs_per_job = {
+	.ops = &vpe_ctrl_ops,
+	.id = V4L2_CID_VPE_BUFS_PER_JOB,
+	.name = "Buffers Per Transaction",
+	.type = V4L2_CTRL_TYPE_INTEGER,
+	.def = VPE_DEF_BUFS_PER_JOB,
+	.min = 1,
+	.max = VIDEO_MAX_FRAME,
+	.step = 1,
+};
+
+/*
+ * File operations
+ */
+static int vpe_open(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = NULL;
+	struct vpe_q_data *s_q_data;
+	struct v4l2_ctrl_handler *hdl;
+	int ret;
+
+	vpe_dbg(dev, "vpe_open\n");
+
+	if (!dev->vpdma->ready) {
+		vpe_err(dev, "vpdma firmware not loaded\n");
+		return -ENODEV;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dev = dev;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex)) {
+		ret = -ERESTARTSYS;
+		goto free_ctx;
+	}
+
+	ret = vpdma_create_desc_list(&ctx->desc_list, VPE_DESC_LIST_SIZE,
+			VPDMA_LIST_TYPE_NORMAL);
+	if (ret != 0)
+		goto unlock;
+
+	ret = vpdma_alloc_desc_buf(&ctx->mmr_adb, sizeof(struct vpe_mmr_adb));
+	if (ret != 0)
+		goto free_desc_list;
+
+	init_adb_hdrs(ctx);
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+
+	hdl = &ctx->hdl;
+	v4l2_ctrl_handler_init(hdl, 1);
+	v4l2_ctrl_new_custom(hdl, &vpe_bufs_per_job, NULL);
+	if (hdl->error) {
+		ret = hdl->error;
+		goto exit_fh;
+	}
+	ctx->fh.ctrl_handler = hdl;
+	v4l2_ctrl_handler_setup(hdl);
+
+	s_q_data = &ctx->q_data[Q_DATA_SRC];
+	s_q_data->fmt = &vpe_formats[2];
+	s_q_data->width = 1920;
+	s_q_data->height = 1080;
+	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
+			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
+	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->c_rect.left = 0;
+	s_q_data->c_rect.top = 0;
+	s_q_data->c_rect.width = s_q_data->width;
+	s_q_data->c_rect.height = s_q_data->height;
+	s_q_data->flags = 0;
+
+	ctx->q_data[Q_DATA_DST] = *s_q_data;
+
+	set_src_registers(ctx);
+	set_dst_registers(ctx);
+	ret = set_srcdst_params(ctx);
+	if (ret)
+		goto exit_fh;
+
+	ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
+
+	if (IS_ERR(ctx->m2m_ctx)) {
+		ret = PTR_ERR(ctx->m2m_ctx);
+		goto exit_fh;
+	}
+
+	v4l2_fh_add(&ctx->fh);
+
+	/*
+	 * for now, just report the creation of the first instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_inc_return(&dev->num_instances) == 1)
+		vpe_dbg(dev, "first instance created\n");
+
+	ctx->bufs_per_job = VPE_DEF_BUFS_PER_JOB;
+
+	ctx->load_mmrs = true;
+
+	vpe_dbg(dev, "created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->m2m_ctx);
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+exit_fh:
+	v4l2_ctrl_handler_free(hdl);
+	v4l2_fh_exit(&ctx->fh);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+free_desc_list:
+	vpdma_free_desc_list(&ctx->desc_list);
+unlock:
+	mutex_unlock(&dev->dev_mutex);
+free_ctx:
+	kfree(ctx);
+	return ret;
+}
+
+static int vpe_release(struct file *file)
+{
+	struct vpe_dev *dev = video_drvdata(file);
+	struct vpe_ctx *ctx = file2ctx(file);
+
+	vpe_dbg(dev, "releasing instance %p\n", ctx);
+
+	mutex_lock(&dev->dev_mutex);
+	vpdma_free_desc_list(&ctx->desc_list);
+	vpdma_free_desc_buf(&ctx->mmr_adb);
+
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	v4l2_ctrl_handler_free(&ctx->hdl);
+	v4l2_m2m_ctx_release(ctx->m2m_ctx);
+
+	kfree(ctx);
+
+	/*
+	 * for now, just report the release of the last instance, we can later
+	 * optimize the driver to enable or disable clocks when the first
+	 * instance is created or the last instance released
+	 */
+	if (atomic_dec_return(&dev->num_instances) == 0)
+		vpe_dbg(dev, "last instance released\n");
+
+	mutex_unlock(&dev->dev_mutex);
+
+	return 0;
+}
+
+static unsigned int vpe_poll(struct file *file,
+			     struct poll_table_struct *wait)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	mutex_lock(&dev->dev_mutex);
+	ret = v4l2_m2m_poll(file, ctx->m2m_ctx, wait);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static int vpe_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct vpe_ctx *ctx = file2ctx(file);
+	struct vpe_dev *dev = ctx->dev;
+	int ret;
+
+	if (mutex_lock_interruptible(&dev->dev_mutex))
+		return -ERESTARTSYS;
+	ret = v4l2_m2m_mmap(file, ctx->m2m_ctx, vma);
+	mutex_unlock(&dev->dev_mutex);
+	return ret;
+}
+
+static const struct v4l2_file_operations vpe_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vpe_open,
+	.release	= vpe_release,
+	.poll		= vpe_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= vpe_mmap,
+};
+
+static struct video_device vpe_videodev = {
+	.name		= VPE_MODULE_NAME,
+	.fops		= &vpe_fops,
+	.ioctl_ops	= &vpe_ioctl_ops,
+	.minor		= -1,
+	.release	= video_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= device_run,
+	.job_ready	= job_ready,
+	.job_abort	= job_abort,
+	.lock		= vpe_lock,
+	.unlock		= vpe_unlock,
+};
+
+static int vpe_runtime_get(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_get\n");
+
+	r = pm_runtime_get_sync(&pdev->dev);
+	WARN_ON(r < 0);
+	return r < 0 ? r : 0;
+}
+
+static void vpe_runtime_put(struct platform_device *pdev)
+{
+	int r;
+
+	dev_dbg(&pdev->dev, "vpe_runtime_put\n");
+
+	r = pm_runtime_put_sync(&pdev->dev);
+	WARN_ON(r < 0 && r != -ENOSYS);
+}
+
+static int vpe_probe(struct platform_device *pdev)
+{
+	struct vpe_dev *dev;
+	struct video_device *vfd;
+	struct resource *res;
+	int ret, irq, func;
+
+	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+
+	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
+	if (ret)
+		return ret;
+
+	atomic_set(&dev->num_instances, 0);
+	mutex_init(&dev->dev_mutex);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "vpe_top");
+	/*
+	 * HACK: we get resource info from the device tree in the form of a
+	 * list of VPE sub blocks. The driver currently uses only the base of
+	 * vpe_top for register access; it should be changed later to access
+	 * registers based on the sub block base addresses
+	 */
+	dev->base = devm_ioremap(&pdev->dev, res->start, SZ_32K);
+	if (!dev->base) {
+		ret = -ENOMEM;
+		goto v4l2_dev_unreg;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		ret = irq;
+		goto v4l2_dev_unreg;
+	}
+
+	ret = devm_request_irq(&pdev->dev, irq, vpe_irq, 0, VPE_MODULE_NAME,
+			dev);
+	if (ret)
+		goto v4l2_dev_unreg;
+
+	platform_set_drvdata(pdev, dev);
+
+	dev->alloc_ctx = vb2_dma_contig_init_ctx(&pdev->dev);
+	if (IS_ERR(dev->alloc_ctx)) {
+		vpe_err(dev, "Failed to alloc vb2 context\n");
+		ret = PTR_ERR(dev->alloc_ctx);
+		goto v4l2_dev_unreg;
+	}
+
+	dev->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(dev->m2m_dev)) {
+		vpe_err(dev, "Failed to init mem2mem device\n");
+		ret = PTR_ERR(dev->m2m_dev);
+		goto rel_ctx;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+
+	ret = vpe_runtime_get(pdev);
+	if (ret)
+		goto rel_m2m;
+
+	/* Perform clk enable followed by reset */
+	vpe_set_clock_enable(dev, 1);
+
+	vpe_top_reset(dev);
+
+	func = read_field_reg(dev, VPE_PID, VPE_PID_FUNC_MASK,
+		VPE_PID_FUNC_SHIFT);
+	vpe_dbg(dev, "VPE PID function %x\n", func);
+
+	vpe_top_vpdma_reset(dev);
+
+	dev->vpdma = vpdma_create(pdev);
+	if (IS_ERR(dev->vpdma)) {
+		ret = PTR_ERR(dev->vpdma);
+		goto runtime_put;
+	}
+
+	vfd = &dev->vfd;
+	*vfd = vpe_videodev;
+	vfd->lock = &dev->dev_mutex;
+	vfd->v4l2_dev = &dev->v4l2_dev;
+
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		vpe_err(dev, "Failed to register video device\n");
+		goto runtime_put;
+	}
+
+	video_set_drvdata(vfd, dev);
+	snprintf(vfd->name, sizeof(vfd->name), "%s", vpe_videodev.name);
+	dev_info(dev->v4l2_dev.dev, "Device registered as /dev/video%d\n",
+		vfd->num);
+
+	return 0;
+
+runtime_put:
+	vpe_runtime_put(pdev);
+rel_m2m:
+	pm_runtime_disable(&pdev->dev);
+	v4l2_m2m_release(dev->m2m_dev);
+rel_ctx:
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+v4l2_dev_unreg:
+	v4l2_device_unregister(&dev->v4l2_dev);
+
+	return ret;
+}
+
+static int vpe_remove(struct platform_device *pdev)
+{
+	struct vpe_dev *dev = platform_get_drvdata(pdev);
+
+	v4l2_info(&dev->v4l2_dev, "Removing " VPE_MODULE_NAME);
+
+	v4l2_m2m_release(dev->m2m_dev);
+	video_unregister_device(&dev->vfd);
+	v4l2_device_unregister(&dev->v4l2_dev);
+	vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
+
+	vpe_set_clock_enable(dev, 0);
+	vpe_runtime_put(pdev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#if defined(CONFIG_OF)
+static const struct of_device_id vpe_of_match[] = {
+	{
+		.compatible = "ti,vpe",
+	},
+	{},
+};
+#else
+#define vpe_of_match NULL
+#endif
+
+static struct platform_driver vpe_pdrv = {
+	.probe		= vpe_probe,
+	.remove		= vpe_remove,
+	.driver		= {
+		.name	= VPE_MODULE_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = vpe_of_match,
+	},
+};
+
+static void __exit vpe_exit(void)
+{
+	platform_driver_unregister(&vpe_pdrv);
+}
+
+static int __init vpe_init(void)
+{
+	return platform_driver_register(&vpe_pdrv);
+}
+
+module_init(vpe_init);
+module_exit(vpe_exit);
+
+MODULE_DESCRIPTION("TI VPE driver");
+MODULE_AUTHOR("Dale Farnsworth, <dale@farnsworth.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/platform/ti-vpe/vpe_regs.h b/drivers/media/platform/ti-vpe/vpe_regs.h
new file mode 100644
index 0000000..ed214e8
--- /dev/null
+++ b/drivers/media/platform/ti-vpe/vpe_regs.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (c) 2013 Texas Instruments Inc.
+ *
+ * David Griego, <dagriego@biglakesoftware.com>
+ * Dale Farnsworth, <dale@farnsworth.org>
+ * Archit Taneja, <archit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef __TI_VPE_REGS_H
+#define __TI_VPE_REGS_H
+
+/* VPE register offsets and field selectors */
+
+/* VPE top level regs */
+#define VPE_PID				0x0000
+#define VPE_PID_MINOR_MASK		0x3f
+#define VPE_PID_MINOR_SHIFT		0
+#define VPE_PID_CUSTOM_MASK		0x03
+#define VPE_PID_CUSTOM_SHIFT		6
+#define VPE_PID_MAJOR_MASK		0x07
+#define VPE_PID_MAJOR_SHIFT		8
+#define VPE_PID_RTL_MASK		0x1f
+#define VPE_PID_RTL_SHIFT		11
+#define VPE_PID_FUNC_MASK		0xfff
+#define VPE_PID_FUNC_SHIFT		16
+#define VPE_PID_SCHEME_MASK		0x03
+#define VPE_PID_SCHEME_SHIFT		30
+
+#define VPE_SYSCONFIG			0x0010
+#define VPE_SYSCONFIG_IDLE_MASK		0x03
+#define VPE_SYSCONFIG_IDLE_SHIFT	2
+#define VPE_SYSCONFIG_STANDBY_MASK	0x03
+#define VPE_SYSCONFIG_STANDBY_SHIFT	4
+#define VPE_FORCE_IDLE_MODE		0
+#define VPE_NO_IDLE_MODE		1
+#define VPE_SMART_IDLE_MODE		2
+#define VPE_SMART_IDLE_WAKEUP_MODE	3
+#define VPE_FORCE_STANDBY_MODE		0
+#define VPE_NO_STANDBY_MODE		1
+#define VPE_SMART_STANDBY_MODE		2
+#define VPE_SMART_STANDBY_WAKEUP_MODE	3
+
+#define VPE_INT0_STATUS0_RAW_SET	0x0020
+#define VPE_INT0_STATUS0_RAW		VPE_INT0_STATUS0_RAW_SET
+#define VPE_INT0_STATUS0_CLR		0x0028
+#define VPE_INT0_STATUS0		VPE_INT0_STATUS0_CLR
+#define VPE_INT0_ENABLE0_SET		0x0030
+#define VPE_INT0_ENABLE0		VPE_INT0_ENABLE0_SET
+#define VPE_INT0_ENABLE0_CLR		0x0038
+#define VPE_INT0_LIST0_COMPLETE		(1 << 0)
+#define VPE_INT0_LIST0_NOTIFY		(1 << 1)
+#define VPE_INT0_LIST1_COMPLETE		(1 << 2)
+#define VPE_INT0_LIST1_NOTIFY		(1 << 3)
+#define VPE_INT0_LIST2_COMPLETE		(1 << 4)
+#define VPE_INT0_LIST2_NOTIFY		(1 << 5)
+#define VPE_INT0_LIST3_COMPLETE		(1 << 6)
+#define VPE_INT0_LIST3_NOTIFY		(1 << 7)
+#define VPE_INT0_LIST4_COMPLETE		(1 << 8)
+#define VPE_INT0_LIST4_NOTIFY		(1 << 9)
+#define VPE_INT0_LIST5_COMPLETE		(1 << 10)
+#define VPE_INT0_LIST5_NOTIFY		(1 << 11)
+#define VPE_INT0_LIST6_COMPLETE		(1 << 12)
+#define VPE_INT0_LIST6_NOTIFY		(1 << 13)
+#define VPE_INT0_LIST7_COMPLETE		(1 << 14)
+#define VPE_INT0_LIST7_NOTIFY		(1 << 15)
+#define VPE_INT0_DESCRIPTOR		(1 << 16)
+#define VPE_DEI_FMD_INT			(1 << 18)
+
+#define VPE_INT0_STATUS1_RAW_SET	0x0024
+#define VPE_INT0_STATUS1_RAW		VPE_INT0_STATUS1_RAW_SET
+#define VPE_INT0_STATUS1_CLR		0x002c
+#define VPE_INT0_STATUS1		VPE_INT0_STATUS1_CLR
+#define VPE_INT0_ENABLE1_SET		0x0034
+#define VPE_INT0_ENABLE1		VPE_INT0_ENABLE1_SET
+#define VPE_INT0_ENABLE1_CLR		0x003c
+#define VPE_INT0_CHANNEL_GROUP0		(1 << 0)
+#define VPE_INT0_CHANNEL_GROUP1		(1 << 1)
+#define VPE_INT0_CHANNEL_GROUP2		(1 << 2)
+#define VPE_INT0_CHANNEL_GROUP3		(1 << 3)
+#define VPE_INT0_CHANNEL_GROUP4		(1 << 4)
+#define VPE_INT0_CHANNEL_GROUP5		(1 << 5)
+#define VPE_INT0_CLIENT			(1 << 7)
+#define VPE_DEI_ERROR_INT		(1 << 16)
+#define VPE_DS1_UV_ERROR_INT		(1 << 22)
+
+#define VPE_INTC_EOI			0x00a0
+
+#define VPE_CLK_ENABLE			0x0100
+#define VPE_VPEDMA_CLK_ENABLE		(1 << 0)
+#define VPE_DATA_PATH_CLK_ENABLE	(1 << 1)
+
+#define VPE_CLK_RESET			0x0104
+#define VPE_VPDMA_CLK_RESET_MASK	0x1
+#define VPE_VPDMA_CLK_RESET_SHIFT	0
+#define VPE_DATA_PATH_CLK_RESET_MASK	0x1
+#define VPE_DATA_PATH_CLK_RESET_SHIFT	1
+#define VPE_MAIN_RESET_MASK		0x1
+#define VPE_MAIN_RESET_SHIFT		31
+
+#define VPE_CLK_FORMAT_SELECT		0x010c
+#define VPE_CSC_SRC_SELECT_MASK		0x03
+#define VPE_CSC_SRC_SELECT_SHIFT	0
+#define VPE_RGB_OUT_SELECT		(1 << 8)
+#define VPE_DS_SRC_SELECT_MASK		0x07
+#define VPE_DS_SRC_SELECT_SHIFT		9
+#define VPE_DS_BYPASS			(1 << 16)
+#define VPE_COLOR_SEPARATE_422		(1 << 18)
+
+#define VPE_DS_SRC_DEI_SCALER		(5 << VPE_DS_SRC_SELECT_SHIFT)
+#define VPE_CSC_SRC_DEI_SCALER		(3 << VPE_CSC_SRC_SELECT_SHIFT)
+
+#define VPE_CLK_RANGE_MAP		0x011c
+#define VPE_RANGE_RANGE_MAP_Y_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_Y_SHIFT	0
+#define VPE_RANGE_RANGE_MAP_UV_MASK	0x07
+#define VPE_RANGE_RANGE_MAP_UV_SHIFT	3
+#define VPE_RANGE_MAP_ON		(1 << 6)
+#define VPE_RANGE_REDUCTION_ON		(1 << 28)
+
+/* VPE chrominance upsampler regs */
+#define VPE_US1_R0			0x0304
+#define VPE_US2_R0			0x0404
+#define VPE_US3_R0			0x0504
+#define VPE_US_C1_MASK			0x3fff
+#define VPE_US_C1_SHIFT			2
+#define VPE_US_C0_MASK			0x3fff
+#define VPE_US_C0_SHIFT			18
+#define VPE_US_MODE_MASK		0x03
+#define VPE_US_MODE_SHIFT		16
+#define VPE_ANCHOR_FID0_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C1_SHIFT	2
+#define VPE_ANCHOR_FID0_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C0_SHIFT	18
+
+#define VPE_US1_R1			0x0308
+#define VPE_US2_R1			0x0408
+#define VPE_US3_R1			0x0508
+#define VPE_ANCHOR_FID0_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C3_SHIFT	2
+#define VPE_ANCHOR_FID0_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID0_C2_SHIFT	18
+
+#define VPE_US1_R2			0x030c
+#define VPE_US2_R2			0x040c
+#define VPE_US3_R2			0x050c
+#define VPE_INTERP_FID0_C1_MASK		0x3fff
+#define VPE_INTERP_FID0_C1_SHIFT	2
+#define VPE_INTERP_FID0_C0_MASK		0x3fff
+#define VPE_INTERP_FID0_C0_SHIFT	18
+
+#define VPE_US1_R3			0x0310
+#define VPE_US2_R3			0x0410
+#define VPE_US3_R3			0x0510
+#define VPE_INTERP_FID0_C3_MASK		0x3fff
+#define VPE_INTERP_FID0_C3_SHIFT	2
+#define VPE_INTERP_FID0_C2_MASK		0x3fff
+#define VPE_INTERP_FID0_C2_SHIFT	18
+
+#define VPE_US1_R4			0x0314
+#define VPE_US2_R4			0x0414
+#define VPE_US3_R4			0x0514
+#define VPE_ANCHOR_FID1_C1_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C1_SHIFT	2
+#define VPE_ANCHOR_FID1_C0_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C0_SHIFT	18
+
+#define VPE_US1_R5			0x0318
+#define VPE_US2_R5			0x0418
+#define VPE_US3_R5			0x0518
+#define VPE_ANCHOR_FID1_C3_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C3_SHIFT	2
+#define VPE_ANCHOR_FID1_C2_MASK		0x3fff
+#define VPE_ANCHOR_FID1_C2_SHIFT	18
+
+#define VPE_US1_R6			0x031c
+#define VPE_US2_R6			0x041c
+#define VPE_US3_R6			0x051c
+#define VPE_INTERP_FID1_C1_MASK		0x3fff
+#define VPE_INTERP_FID1_C1_SHIFT	2
+#define VPE_INTERP_FID1_C0_MASK		0x3fff
+#define VPE_INTERP_FID1_C0_SHIFT	18
+
+#define VPE_US1_R7			0x0320
+#define VPE_US2_R7			0x0420
+#define VPE_US3_R7			0x0520
+#define VPE_INTERP_FID1_C3_MASK		0x3fff
+#define VPE_INTERP_FID1_C3_SHIFT	2
+#define VPE_INTERP_FID1_C2_MASK		0x3fff
+#define VPE_INTERP_FID1_C2_SHIFT	18
+
+/* VPE de-interlacer regs */
+#define VPE_DEI_FRAME_SIZE		0x0600
+#define VPE_DEI_WIDTH_MASK		0x07ff
+#define VPE_DEI_WIDTH_SHIFT		0
+#define VPE_DEI_HEIGHT_MASK		0x07ff
+#define VPE_DEI_HEIGHT_SHIFT		16
+#define VPE_DEI_INTERLACE_BYPASS	(1 << 29)
+#define VPE_DEI_FIELD_FLUSH		(1 << 30)
+#define VPE_DEI_PROGRESSIVE		(1 << 31)
+
+#define VPE_MDT_BYPASS			0x0604
+#define VPE_MDT_TEMPMAX_BYPASS		(1 << 0)
+#define VPE_MDT_SPATMAX_BYPASS		(1 << 1)
+
+#define VPE_MDT_SF_THRESHOLD		0x0608
+#define VPE_MDT_SF_SC_THR1_MASK		0xff
+#define VPE_MDT_SF_SC_THR1_SHIFT	0
+#define VPE_MDT_SF_SC_THR2_MASK		0xff
+#define VPE_MDT_SF_SC_THR2_SHIFT	8
+#define VPE_MDT_SF_SC_THR3_MASK		0xff
+#define VPE_MDT_SF_SC_THR3_SHIFT	16
+
+#define VPE_EDI_CONFIG			0x060c
+#define VPE_EDI_INP_MODE_MASK		0x03
+#define VPE_EDI_INP_MODE_SHIFT		0
+#define VPE_EDI_ENABLE_3D		(1 << 2)
+#define VPE_EDI_ENABLE_CHROMA_3D	(1 << 3)
+#define VPE_EDI_CHROMA3D_COR_THR_MASK	0xff
+#define VPE_EDI_CHROMA3D_COR_THR_SHIFT	8
+#define VPE_EDI_DIR_COR_LOWER_THR_MASK	0xff
+#define VPE_EDI_DIR_COR_LOWER_THR_SHIFT	16
+#define VPE_EDI_COR_SCALE_FACTOR_MASK	0xff
+#define VPE_EDI_COR_SCALE_FACTOR_SHIFT	23
+
+#define VPE_DEI_EDI_LUT_R0		0x0610
+#define VPE_EDI_LUT0_MASK		0x1f
+#define VPE_EDI_LUT0_SHIFT		0
+#define VPE_EDI_LUT1_MASK		0x1f
+#define VPE_EDI_LUT1_SHIFT		8
+#define VPE_EDI_LUT2_MASK		0x1f
+#define VPE_EDI_LUT2_SHIFT		16
+#define VPE_EDI_LUT3_MASK		0x1f
+#define VPE_EDI_LUT3_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R1		0x0614
+#define VPE_EDI_LUT4_MASK		0x1f
+#define VPE_EDI_LUT4_SHIFT		0
+#define VPE_EDI_LUT5_MASK		0x1f
+#define VPE_EDI_LUT5_SHIFT		8
+#define VPE_EDI_LUT6_MASK		0x1f
+#define VPE_EDI_LUT6_SHIFT		16
+#define VPE_EDI_LUT7_MASK		0x1f
+#define VPE_EDI_LUT7_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R2		0x0618
+#define VPE_EDI_LUT8_MASK		0x1f
+#define VPE_EDI_LUT8_SHIFT		0
+#define VPE_EDI_LUT9_MASK		0x1f
+#define VPE_EDI_LUT9_SHIFT		8
+#define VPE_EDI_LUT10_MASK		0x1f
+#define VPE_EDI_LUT10_SHIFT		16
+#define VPE_EDI_LUT11_MASK		0x1f
+#define VPE_EDI_LUT11_SHIFT		24
+
+#define VPE_DEI_EDI_LUT_R3		0x061c
+#define VPE_EDI_LUT12_MASK		0x1f
+#define VPE_EDI_LUT12_SHIFT		0
+#define VPE_EDI_LUT13_MASK		0x1f
+#define VPE_EDI_LUT13_SHIFT		8
+#define VPE_EDI_LUT14_MASK		0x1f
+#define VPE_EDI_LUT14_SHIFT		16
+#define VPE_EDI_LUT15_MASK		0x1f
+#define VPE_EDI_LUT15_SHIFT		24
+
+#define VPE_DEI_FMD_WINDOW_R0		0x0620
+#define VPE_FMD_WINDOW_MINX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINX_SHIFT	0
+#define VPE_FMD_WINDOW_MAXX_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXX_SHIFT	16
+#define VPE_FMD_WINDOW_ENABLE		(1 << 31)
+
+#define VPE_DEI_FMD_WINDOW_R1		0x0624
+#define VPE_FMD_WINDOW_MINY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MINY_SHIFT	0
+#define VPE_FMD_WINDOW_MAXY_MASK	0x07ff
+#define VPE_FMD_WINDOW_MAXY_SHIFT	16
+
+#define VPE_DEI_FMD_CONTROL_R0		0x0628
+#define VPE_FMD_ENABLE			(1 << 0)
+#define VPE_FMD_LOCK			(1 << 1)
+#define VPE_FMD_JAM_DIR			(1 << 2)
+#define VPE_FMD_BED_ENABLE		(1 << 3)
+#define VPE_FMD_CAF_FIELD_THR_MASK	0xff
+#define VPE_FMD_CAF_FIELD_THR_SHIFT	16
+#define VPE_FMD_CAF_LINE_THR_MASK	0xff
+#define VPE_FMD_CAF_LINE_THR_SHIFT	24
+
+#define VPE_DEI_FMD_CONTROL_R1		0x062c
+#define VPE_FMD_CAF_THR_MASK		0x000fffff
+#define VPE_FMD_CAF_THR_SHIFT		0
+
+#define VPE_DEI_FMD_STATUS_R0		0x0630
+#define VPE_FMD_CAF_MASK		0x000fffff
+#define VPE_FMD_CAF_SHIFT		0
+#define VPE_FMD_RESET			(1 << 24)
+
+#define VPE_DEI_FMD_STATUS_R1		0x0634
+#define VPE_FMD_FIELD_DIFF_MASK		0x0fffffff
+#define VPE_FMD_FIELD_DIFF_SHIFT	0
+
+#define VPE_DEI_FMD_STATUS_R2		0x0638
+#define VPE_FMD_FRAME_DIFF_MASK		0x000fffff
+#define VPE_FMD_FRAME_DIFF_SHIFT	0
+
+/* VPE scaler regs */
+#define VPE_SC_MP_SC0			0x0700
+#define VPE_INTERLACE_O			(1 << 0)
+#define VPE_LINEAR			(1 << 1)
+#define VPE_SC_BYPASS			(1 << 2)
+#define VPE_INVT_FID			(1 << 3)
+#define VPE_USE_RAV			(1 << 4)
+#define VPE_ENABLE_EV			(1 << 5)
+#define VPE_AUTO_HS			(1 << 6)
+#define VPE_DCM_2X			(1 << 7)
+#define VPE_DCM_4X			(1 << 8)
+#define VPE_HP_BYPASS			(1 << 9)
+#define VPE_INTERLACE_I			(1 << 10)
+#define VPE_ENABLE_SIN2_VER_INTP	(1 << 11)
+#define VPE_Y_PK_EN			(1 << 14)
+#define VPE_TRIM			(1 << 15)
+#define VPE_SELFGEN_FID			(1 << 16)
+
+#define VPE_SC_MP_SC1			0x0704
+#define VPE_ROW_ACC_INC_MASK		0x07ffffff
+#define VPE_ROW_ACC_INC_SHIFT		0
+
+#define VPE_SC_MP_SC2			0x0708
+#define VPE_ROW_ACC_OFFSET_MASK		0x0fffffff
+#define VPE_ROW_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC3			0x070c
+#define VPE_ROW_ACC_OFFSET_B_MASK	0x0fffffff
+#define VPE_ROW_ACC_OFFSET_B_SHIFT	0
+
+#define VPE_SC_MP_SC4			0x0710
+#define VPE_TAR_H_MASK			0x07ff
+#define VPE_TAR_H_SHIFT			0
+#define VPE_TAR_W_MASK			0x07ff
+#define VPE_TAR_W_SHIFT			12
+#define VPE_LIN_ACC_INC_U_MASK		0x07
+#define VPE_LIN_ACC_INC_U_SHIFT		24
+#define VPE_NLIN_ACC_INIT_U_MASK	0x07
+#define VPE_NLIN_ACC_INIT_U_SHIFT	28
+
+#define VPE_SC_MP_SC5			0x0714
+#define VPE_SRC_H_MASK			0x07ff
+#define VPE_SRC_H_SHIFT			0
+#define VPE_SRC_W_MASK			0x07ff
+#define VPE_SRC_W_SHIFT			12
+#define VPE_NLIN_ACC_INC_U_MASK		0x07
+#define VPE_NLIN_ACC_INC_U_SHIFT	24
+
+#define VPE_SC_MP_SC6			0x0718
+#define VPE_ROW_ACC_INIT_RAV_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_SHIFT	0
+#define VPE_ROW_ACC_INIT_RAV_B_MASK	0x03ff
+#define VPE_ROW_ACC_INIT_RAV_B_SHIFT	10
+
+#define VPE_SC_MP_SC8			0x0720
+#define VPE_NLIN_LEFT_MASK		0x07ff
+#define VPE_NLIN_LEFT_SHIFT		0
+#define VPE_NLIN_RIGHT_MASK		0x07ff
+#define VPE_NLIN_RIGHT_SHIFT		12
+
+#define VPE_SC_MP_SC9			0x0724
+#define VPE_LIN_ACC_INC			VPE_SC_MP_SC9
+
+#define VPE_SC_MP_SC10			0x0728
+#define VPE_NLIN_ACC_INIT		VPE_SC_MP_SC10
+
+#define VPE_SC_MP_SC11			0x072c
+#define VPE_NLIN_ACC_INC		VPE_SC_MP_SC11
+
+#define VPE_SC_MP_SC12			0x0730
+#define VPE_COL_ACC_OFFSET_MASK		0x01ffffff
+#define VPE_COL_ACC_OFFSET_SHIFT	0
+
+#define VPE_SC_MP_SC13			0x0734
+#define VPE_SC_FACTOR_RAV_MASK		0x03ff
+#define VPE_SC_FACTOR_RAV_SHIFT		0
+#define VPE_CHROMA_INTP_THR_MASK	0x03ff
+#define VPE_CHROMA_INTP_THR_SHIFT	12
+#define VPE_DELTA_CHROMA_THR_MASK	0x0f
+#define VPE_DELTA_CHROMA_THR_SHIFT	24
+
+#define VPE_SC_MP_SC17			0x0744
+#define VPE_EV_THR_MASK			0x03ff
+#define VPE_EV_THR_SHIFT		12
+#define VPE_DELTA_LUMA_THR_MASK		0x0f
+#define VPE_DELTA_LUMA_THR_SHIFT	24
+#define VPE_DELTA_EV_THR_MASK		0x0f
+#define VPE_DELTA_EV_THR_SHIFT		28
+
+#define VPE_SC_MP_SC18			0x0748
+#define VPE_HS_FACTOR_MASK		0x03ff
+#define VPE_HS_FACTOR_SHIFT		0
+#define VPE_CONF_DEFAULT_MASK		0x01ff
+#define VPE_CONF_DEFAULT_SHIFT		16
+
+#define VPE_SC_MP_SC19			0x074c
+#define VPE_HPF_COEFF0_MASK		0xff
+#define VPE_HPF_COEFF0_SHIFT		0
+#define VPE_HPF_COEFF1_MASK		0xff
+#define VPE_HPF_COEFF1_SHIFT		8
+#define VPE_HPF_COEFF2_MASK		0xff
+#define VPE_HPF_COEFF2_SHIFT		16
+#define VPE_HPF_COEFF3_MASK		0xff
+#define VPE_HPF_COEFF3_SHIFT		23
+
+#define VPE_SC_MP_SC20			0x0750
+#define VPE_HPF_COEFF4_MASK		0xff
+#define VPE_HPF_COEFF4_SHIFT		0
+#define VPE_HPF_COEFF5_MASK		0xff
+#define VPE_HPF_COEFF5_SHIFT		8
+#define VPE_HPF_NORM_SHIFT_MASK		0x07
+#define VPE_HPF_NORM_SHIFT_SHIFT	16
+#define VPE_NL_LIMIT_MASK		0x1ff
+#define VPE_NL_LIMIT_SHIFT		20
+
+#define VPE_SC_MP_SC21			0x0754
+#define VPE_NL_LO_THR_MASK		0x01ff
+#define VPE_NL_LO_THR_SHIFT		0
+#define VPE_NL_LO_SLOPE_MASK		0xff
+#define VPE_NL_LO_SLOPE_SHIFT		16
+
+#define VPE_SC_MP_SC22			0x0758
+#define VPE_NL_HI_THR_MASK		0x01ff
+#define VPE_NL_HI_THR_SHIFT		0
+#define VPE_NL_HI_SLOPE_SH_MASK		0x07
+#define VPE_NL_HI_SLOPE_SH_SHIFT	16
+
+#define VPE_SC_MP_SC23			0x075c
+#define VPE_GRADIENT_THR_MASK		0x07ff
+#define VPE_GRADIENT_THR_SHIFT		0
+#define VPE_GRADIENT_THR_RANGE_MASK	0x0f
+#define VPE_GRADIENT_THR_RANGE_SHIFT	12
+#define VPE_MIN_GY_THR_MASK		0xff
+#define VPE_MIN_GY_THR_SHIFT		16
+#define VPE_MIN_GY_THR_RANGE_MASK	0x0f
+#define VPE_MIN_GY_THR_RANGE_SHIFT	28
+
+#define VPE_SC_MP_SC24			0x0760
+#define VPE_ORG_H_MASK			0x07ff
+#define VPE_ORG_H_SHIFT			0
+#define VPE_ORG_W_MASK			0x07ff
+#define VPE_ORG_W_SHIFT			16
+
+#define VPE_SC_MP_SC25			0x0764
+#define VPE_OFF_H_MASK			0x07ff
+#define VPE_OFF_H_SHIFT			0
+#define VPE_OFF_W_MASK			0x07ff
+#define VPE_OFF_W_SHIFT			16
+
+/* VPE color space converter regs */
+#define VPE_CSC_CSC00			0x5700
+#define VPE_CSC_A0_MASK			0x1fff
+#define VPE_CSC_A0_SHIFT		0
+#define VPE_CSC_B0_MASK			0x1fff
+#define VPE_CSC_B0_SHIFT		16
+
+#define VPE_CSC_CSC01			0x5704
+#define VPE_CSC_C0_MASK			0x1fff
+#define VPE_CSC_C0_SHIFT		0
+#define VPE_CSC_A1_MASK			0x1fff
+#define VPE_CSC_A1_SHIFT		16
+
+#define VPE_CSC_CSC02			0x5708
+#define VPE_CSC_B1_MASK			0x1fff
+#define VPE_CSC_B1_SHIFT		0
+#define VPE_CSC_C1_MASK			0x1fff
+#define VPE_CSC_C1_SHIFT		16
+
+#define VPE_CSC_CSC03			0x570c
+#define VPE_CSC_A2_MASK			0x1fff
+#define VPE_CSC_A2_SHIFT		0
+#define VPE_CSC_B2_MASK			0x1fff
+#define VPE_CSC_B2_SHIFT		16
+
+#define VPE_CSC_CSC04			0x5710
+#define VPE_CSC_C2_MASK			0x1fff
+#define VPE_CSC_C2_SHIFT		0
+#define VPE_CSC_D0_MASK			0x0fff
+#define VPE_CSC_D0_SHIFT		16
+
+#define VPE_CSC_CSC05			0x5714
+#define VPE_CSC_D1_MASK			0x0fff
+#define VPE_CSC_D1_SHIFT		0
+#define VPE_CSC_D2_MASK			0x0fff
+#define VPE_CSC_D2_SHIFT		16
+#define VPE_CSC_BYPASS			(1 << 28)
+
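+/*
+ * Illustrative usage sketch (not part of the register map): the MASK/SHIFT
+ * pairs above describe bitfields, so a register value is composed by masking
+ * each field and shifting it into place, e.g. for the DEI frame size:
+ *
+ *	val = ((width & VPE_DEI_WIDTH_MASK) << VPE_DEI_WIDTH_SHIFT) |
+ *	      ((height & VPE_DEI_HEIGHT_MASK) << VPE_DEI_HEIGHT_SHIFT);
+ *	write_reg(dev, VPE_DEI_FRAME_SIZE, val);
+ *
+ * 'width', 'height' and 'val' are hypothetical locals; write_reg() is the
+ * driver's MMIO write helper.
+ */
+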
+#endif
diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 083bb5a..1666aab 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -160,6 +160,10 @@ enum v4l2_colorfx {
  * of controls. Total of 16 controls is reserved for this driver */
 #define V4L2_CID_USER_SI476X_BASE		(V4L2_CID_USER_BASE + 0x1040)
 
+/* The base for the TI VPE driver controls. Total of 16 controls is reserved for
+ * this driver */
+#define V4L2_CID_USER_TI_VPE_BASE		(V4L2_CID_USER_BASE + 0x1050)
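+/* A driver-private control is then defined relative to this base, e.g. a
+ * 'buffers per transaction' control (name shown for illustration only):
+ *
+ *	#define V4L2_CID_VPE_BUFS_PER_JOB	(V4L2_CID_USER_TI_VPE_BASE + 0)
+ */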
+
 /* MPEG-class control IDs */
 /* The MPEG controls are applicable to all codec controls
  * and the 'MPEG' part of the define is historical */
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v5 4/4] v4l: ti-vpe: Add de-interlacer support in VPE
  2013-10-16  5:36       ` Archit Taneja
@ 2013-10-16  5:36         ` Archit Taneja
  -1 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

Add support for the de-interlacer block in VPE.

For the de-interlacer to work, we need to enable 2 more sets of VPE input
ports which fetch data from the 'last' and 'last to last' fields of the
interlaced video. Apart from that, we need to enable the motion vector
output and input ports, and also allocate DMA buffers for them.

We need to make sure that the two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW (in device_run), it extracts the addresses of the 3 buffers and provides
them to the data descriptors for the 3 sets of input ports ((LUMA1, CHROMA1),
(LUMA2, CHROMA2), and (LUMA3, CHROMA3)), one set for each of the 3 consecutive
fields. The motion vector and output port descriptors are then configured and
the list is submitted to VPDMA.
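
Concretely, the three fields map to the three sets of input ports (the
vb_index entries in the port data table below) as:

	f   (current)      -> LUMA1_IN / CHROMA1_IN  (vb_index 0)
	f-1 (last)         -> LUMA2_IN / CHROMA2_IN  (vb_index 1)
	f-2 (last to last) -> LUMA3_IN / CHROMA3_IN  (vb_index 2)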

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field (the 3rd one) is changed to the state 'DONE', and the buffers
corresponding to the 1st and 2nd fields become the 2nd and 3rd fields for the
next de-interlace operation. This way, each de-interlace operation always sees
the 3 most recent fields. After each transaction, we also swap the motion
vector buffers: the new input motion vector buffer holds the accumulated
motion information of all the previous fields, and the new output motion
vector buffer will be used to capture the motion changes in the next field.
The motion vector buffers are allocated using the DMA allocation API.
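
In code, the recycling at the end of a transaction boils down to the
following condensed sketch of what the interrupt handler in this patch
does (ctx being the per-context driver state):

	/* hand the oldest field back to userspace */
	v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);

	/* the two newer fields each age by one slot */
	ctx->src_vbs[2] = ctx->src_vbs[1];
	ctx->src_vbs[1] = ctx->src_vbs[0];

	/* the previous output MV buffer becomes the next input MV buffer */
	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;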

The de-interlacer is taken out of bypass mode; this requires some extra
default configurations, which are now added. The chrominance upsampler
coefficients are added for interlaced frames. Some VPDMA parameters, like the
frame start event and line mode, are configured for the 2 extra sets of input
ports.

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 392 ++++++++++++++++++++++++++++++++----
 1 file changed, 358 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 3bd9ca6..4e58069 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -111,6 +113,38 @@ static const struct vpe_us_coeffs us_coeffs[] = {
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * The following registers configure some of the parameters of the motion and
+ * edge detection blocks inside the DEI. They generally remain the same, but
+ * could later be exposed to userspace if someone needs to tweak them.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static const struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -118,6 +152,7 @@ static const struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-panar formats */
 };
 
@@ -126,6 +161,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -133,12 +174,40 @@ struct vpe_port_data {
 static const struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -210,6 +279,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -219,6 +289,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -270,6 +341,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -277,13 +349,19 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	dma_addr_t		mv_buf_dma[2];		/* dma addrs of motion vector in/out bufs */
+	void			*mv_buf[2];		/* virtual addrs of motion vector bufs */
+	size_t			mv_buf_size;		/* current motion vector buffer size */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -359,8 +437,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -386,6 +463,80 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct device *dev = ctx->dev->v4l2_dev.dev;
+
+	if (ctx->mv_buf_size == size)
+		return 0;
+
+	if (ctx->mv_buf[0])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+	if (ctx->mv_buf[1])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[1],
+			ctx->mv_buf_dma[1]);
+
+	if (size == 0)
+		return 0;
+
+	ctx->mv_buf[0] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[0],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[0]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf[1] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[1],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[1]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		dma_free_coherent(dev, size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf_size = size;
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function frees those two buffers when we have
+ * finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -426,6 +577,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -433,6 +585,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -473,14 +628,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -524,13 +693,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -539,7 +709,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -550,6 +726,22 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	ctx->load_mmrs = true;
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	const struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
@@ -578,10 +770,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		const struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -608,6 +825,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* also need the two previous fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -735,17 +955,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -761,23 +989,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	const struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -795,7 +1031,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -818,8 +1055,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -831,28 +1075,67 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for input ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
 
+	if (ctx->deinterlacing) {
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_LUMA2_IN);
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_CHROMA2_IN);
+
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_LUMA3_IN);
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_CHROMA3_IN);
+
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_IN);
+	}
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -863,6 +1146,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -886,9 +1170,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -911,10 +1201,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	 /* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -924,16 +1217,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1012,6 +1324,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 
 	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
 		pix->colorspace = q_data->colorspace;
@@ -1047,7 +1360,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 		return -EINVAL;
 	}
 
-	pix->field = V4L2_FIELD_NONE;
+	if (pix->field != V4L2_FIELD_NONE && pix->field != V4L2_FIELD_ALTERNATE)
+		pix->field = V4L2_FIELD_NONE;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
 			      &pix->height, MIN_H, MAX_H, H_ALIGN,
@@ -1124,6 +1438,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1137,6 +1452,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1451,6 +1771,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1459,6 +1780,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1513,6 +1835,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_free_desc_buf(&ctx->mmr_adb);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v5 4/4] v4l: ti-vpe: Add de-interlacer support in VPE
@ 2013-10-16  5:36         ` Archit Taneja
  0 siblings, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-16  5:36 UTC (permalink / raw)
  To: k.debski; +Cc: hverkuil, linux-media, linux-omap, Archit Taneja

Add support for the de-interlacer block in VPE.

For de-interlacer to work, we need to enable 2 more sets of VPE input ports
which fetch data from the 'last' and 'last to last' fields of the interlaced
video. Apart from that, we need to enable the Motion vector output and input
ports, and also allocate DMA buffers for them.

We need to make sure that two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access to the
VPE HW(in device_run), it extracts the addresses of the 3 buffers, and provides
it to the data descriptors for the 3 sets of input ports((LUMA1, CHROMA1),
(LUMA2, CHROMA2), and (LUMA3, CHROMA3)) respectively for the 3 consecutive
fields. The motion vector and output port descriptors are configured and the
list is submitted to VPDMA.

Once the transaction is done, the v4l2 buffer corresponding to the oldest
field(the 3rd one) is changed to the state 'DONE', and the buffers corresponding
to 1st and 2nd fields become the 2nd and 3rd field for the next de-interlace
operation. This way, for each deinterlace operation, we have the 3 most recent
fields. After each transaction, we also swap the motion vector buffers, the new
input motion vector buffer contains the resultant motion information of all the
previous frames, and the new output motion vector buffer will be used to hold
the updated motion vector to capture the motion changes in the next field. The
motion vector buffers are allocated using the DMA allocation API.

The de-interlacer is removed from bypass mode, it requires some extra default
configurations which are now added. The chrominance upsampler coefficients are
added for interlaced frames. Some VPDMA parameters like frame start event and
line mode are configured for the 2 extra sets of input ports.

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
---
 drivers/media/platform/ti-vpe/vpe.c | 392 ++++++++++++++++++++++++++++++++----
 1 file changed, 358 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/ti-vpe/vpe.c b/drivers/media/platform/ti-vpe/vpe.c
index 3bd9ca6..4e58069 100644
--- a/drivers/media/platform/ti-vpe/vpe.c
+++ b/drivers/media/platform/ti-vpe/vpe.c
@@ -69,6 +69,8 @@
 #define VPE_CHROMA	1
 
 /* per m2m context info */
+#define VPE_MAX_SRC_BUFS	3	/* need 3 src fields to de-interlace */
+
 #define VPE_DEF_BUFS_PER_JOB	1	/* default one buffer per batch job */
 
 /*
@@ -111,6 +113,38 @@ static const struct vpe_us_coeffs us_coeffs[] = {
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 		0x00C8, 0x0348, 0x0018, 0x3FD8, 0x3FB8, 0x0378, 0x00E8, 0x3FE8,
 	},
+	{
+		/* Coefficients for Top Field Interlaced input */
+		0x0051, 0x03D5, 0x3FE3, 0x3FF7, 0x3FB5, 0x02E9, 0x018F, 0x3FD3,
+		/* Coefficients for Bottom Field Interlaced input */
+		0x016B, 0x0247, 0x00B1, 0x3F9D, 0x3FCF, 0x03DB, 0x005D, 0x3FF9,
+	},
+};
+
+/*
+ * the following registers are for configuring some of the parameters of the
+ * motion and edge detection blocks inside DEI, these generally remain the same,
+ * these could be passed later via userspace if some one needs to tweak these.
+ */
+struct vpe_dei_regs {
+	unsigned long mdt_spacial_freq_thr_reg;		/* VPE_DEI_REG2 */
+	unsigned long edi_config_reg;			/* VPE_DEI_REG3 */
+	unsigned long edi_lut_reg0;			/* VPE_DEI_REG4 */
+	unsigned long edi_lut_reg1;			/* VPE_DEI_REG5 */
+	unsigned long edi_lut_reg2;			/* VPE_DEI_REG6 */
+	unsigned long edi_lut_reg3;			/* VPE_DEI_REG7 */
+};
+
+/*
+ * default expert DEI register values, unlikely to be modified.
+ */
+static const struct vpe_dei_regs dei_regs = {
+	0x020C0804u,
+	0x0118100Fu,
+	0x08040200u,
+	0x1010100Cu,
+	0x10101010u,
+	0x10101010u,
 };
 
 /*
@@ -118,6 +152,7 @@ static const struct vpe_us_coeffs us_coeffs[] = {
  */
 struct vpe_port_data {
 	enum vpdma_channel channel;	/* VPDMA channel */
+	u8	vb_index;		/* input frame f, f-1, f-2 index */
 	u8	vb_part;		/* plane index for co-panar formats */
 };
 
@@ -126,6 +161,12 @@ struct vpe_port_data {
  */
 #define VPE_PORT_LUMA1_IN	0
 #define VPE_PORT_CHROMA1_IN	1
+#define VPE_PORT_LUMA2_IN	2
+#define VPE_PORT_CHROMA2_IN	3
+#define VPE_PORT_LUMA3_IN	4
+#define VPE_PORT_CHROMA3_IN	5
+#define VPE_PORT_MV_IN		6
+#define VPE_PORT_MV_OUT		7
 #define VPE_PORT_LUMA_OUT	8
 #define VPE_PORT_CHROMA_OUT	9
 #define VPE_PORT_RGB_OUT	10
@@ -133,12 +174,40 @@ struct vpe_port_data {
 static const struct vpe_port_data port_data[11] = {
 	[VPE_PORT_LUMA1_IN] = {
 		.channel	= VPE_CHAN_LUMA1_IN,
+		.vb_index	= 0,
 		.vb_part	= VPE_LUMA,
 	},
 	[VPE_PORT_CHROMA1_IN] = {
 		.channel	= VPE_CHAN_CHROMA1_IN,
+		.vb_index	= 0,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA2_IN] = {
+		.channel	= VPE_CHAN_LUMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA2_IN] = {
+		.channel	= VPE_CHAN_CHROMA2_IN,
+		.vb_index	= 1,
+		.vb_part	= VPE_CHROMA,
+	},
+	[VPE_PORT_LUMA3_IN] = {
+		.channel	= VPE_CHAN_LUMA3_IN,
+		.vb_index	= 2,
+		.vb_part	= VPE_LUMA,
+	},
+	[VPE_PORT_CHROMA3_IN] = {
+		.channel	= VPE_CHAN_CHROMA3_IN,
+		.vb_index	= 2,
 		.vb_part	= VPE_CHROMA,
 	},
+	[VPE_PORT_MV_IN] = {
+		.channel	= VPE_CHAN_MV_IN,
+	},
+	[VPE_PORT_MV_OUT] = {
+		.channel	= VPE_CHAN_MV_OUT,
+	},
 	[VPE_PORT_LUMA_OUT] = {
 		.channel	= VPE_CHAN_LUMA_OUT,
 		.vb_part	= VPE_LUMA,
@@ -210,6 +279,7 @@ struct vpe_q_data {
 	unsigned int		height;				/* frame height */
 	unsigned int		bytesperline[VPE_MAX_PLANES];	/* bytes per line in memory */
 	enum v4l2_colorspace	colorspace;
+	enum v4l2_field		field;				/* supported field value */
 	unsigned int		flags;
 	unsigned int		sizeimage[VPE_MAX_PLANES];	/* image size in memory */
 	struct v4l2_rect	c_rect;				/* crop/compose rectangle */
@@ -219,6 +289,7 @@ struct vpe_q_data {
 /* vpe_q_data flag bits */
 #define	Q_DATA_FRAME_1D		(1 << 0)
 #define	Q_DATA_MODE_TILED	(1 << 1)
+#define	Q_DATA_INTERLACED	(1 << 2)
 
 enum {
 	Q_DATA_SRC = 0,
@@ -270,6 +341,7 @@ struct vpe_ctx {
 	struct v4l2_m2m_ctx	*m2m_ctx;
 	struct v4l2_ctrl_handler hdl;
 
+	unsigned int		field;			/* current field */
 	unsigned int		sequence;		/* current frame/field seq */
 	unsigned int		aborting;		/* abort after next irq */
 
@@ -277,13 +349,19 @@ struct vpe_ctx {
 	unsigned int		bufs_completed;		/* bufs done in this batch */
 
 	struct vpe_q_data	q_data[2];		/* src & dst queue data */
-	struct vb2_buffer	*src_vb;
+	struct vb2_buffer	*src_vbs[VPE_MAX_SRC_BUFS];
 	struct vb2_buffer	*dst_vb;
 
+	dma_addr_t		mv_buf_dma[2];		/* dma addrs of motion vector in/out bufs */
+	void			*mv_buf[2];		/* virtual addrs of motion vector bufs */
+	size_t			mv_buf_size;		/* current motion vector buffer size */
 	struct vpdma_buf	mmr_adb;		/* shadow reg addr/data block */
 	struct vpdma_desc_list	desc_list;		/* DMA descriptor list */
 
+	bool			deinterlacing;		/* using de-interlacer */
 	bool			load_mmrs;		/* have new shadow reg values */
+
+	unsigned int		src_mv_buf_selector;
 };
 
 
@@ -359,8 +437,7 @@ struct vpe_mmr_adb {
 	struct vpdma_adb_hdr	us3_hdr;
 	u32			us3_regs[8];
 	struct vpdma_adb_hdr	dei_hdr;
-	u32			dei_regs[1];
-	u32			dei_pad[3];
+	u32			dei_regs[8];
 	struct vpdma_adb_hdr	sc_hdr;
 	u32			sc_regs[1];
 	u32			sc_pad[3];
@@ -386,6 +463,80 @@ static void init_adb_hdrs(struct vpe_ctx *ctx)
 };
 
 /*
+ * Allocate or re-allocate the motion vector DMA buffers
+ * There are two buffers, one for input and one for output.
+ * However, the roles are reversed after each field is processed.
+ * In other words, after each field is processed, the previous
+ * output (dst) MV buffer becomes the new input (src) MV buffer.
+ */
+static int realloc_mv_buffers(struct vpe_ctx *ctx, size_t size)
+{
+	struct device *dev = ctx->dev->v4l2_dev.dev;
+
+	if (ctx->mv_buf_size == size)
+		return 0;
+
+	if (ctx->mv_buf[0])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+	if (ctx->mv_buf[1])
+		dma_free_coherent(dev, ctx->mv_buf_size, ctx->mv_buf[1],
+			ctx->mv_buf_dma[1]);
+
+	if (size == 0)
+		return 0;
+
+	ctx->mv_buf[0] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[0],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[0]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf[1] = dma_alloc_coherent(dev, size, &ctx->mv_buf_dma[1],
+				GFP_KERNEL);
+	if (!ctx->mv_buf[1]) {
+		vpe_err(ctx->dev, "failed to allocate motion vector buffer\n");
+		dma_free_coherent(dev, size, ctx->mv_buf[0],
+			ctx->mv_buf_dma[0]);
+
+		return -ENOMEM;
+	}
+
+	ctx->mv_buf_size = size;
+	ctx->src_mv_buf_selector = 0;
+
+	return 0;
+}
+
+static void free_mv_buffers(struct vpe_ctx *ctx)
+{
+	realloc_mv_buffers(ctx, 0);
+}
+
+/*
+ * While de-interlacing, we keep the two most recent input buffers
+ * around.  This function frees those two buffers when we have
+ * finished processing the current stream.
+ */
+static void free_vbs(struct vpe_ctx *ctx)
+{
+	struct vpe_dev *dev = ctx->dev;
+	unsigned long flags;
+
+	if (ctx->src_vbs[2] == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (ctx->src_vbs[2]) {
+		v4l2_m2m_buf_done(ctx->src_vbs[2], VB2_BUF_STATE_DONE);
+		v4l2_m2m_buf_done(ctx->src_vbs[1], VB2_BUF_STATE_DONE);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/*
  * Enable or disable the VPE clocks
  */
 static void vpe_set_clock_enable(struct vpe_dev *dev, bool on)
@@ -426,6 +577,7 @@ static void vpe_top_vpdma_reset(struct vpe_dev *dev)
 static void set_us_coefficients(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	u32 *us1_reg = &mmr_adb->us1_regs[0];
 	u32 *us2_reg = &mmr_adb->us2_regs[0];
 	u32 *us3_reg = &mmr_adb->us3_regs[0];
@@ -433,6 +585,9 @@ static void set_us_coefficients(struct vpe_ctx *ctx)
 
 	cp = &us_coeffs[0].anchor_fid0_c0;
 
+	if (s_q_data->flags & Q_DATA_INTERLACED)	/* interlaced */
+		cp += sizeof(us_coeffs[0]) / sizeof(*cp);
+
 	end_cp = cp + sizeof(us_coeffs[0]) / sizeof(*cp);
 
 	while (cp < end_cp) {
@@ -473,14 +628,28 @@ static void set_cfg_and_line_modes(struct vpe_ctx *ctx)
 
 	/* regs for now */
 	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA1_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA2_IN);
+	vpdma_set_line_mode(ctx->dev->vpdma, line_mode, VPE_CHAN_CHROMA3_IN);
 
 	/* frame start for input luma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_LUMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_LUMA3_IN);
 
 	/* frame start for input chroma */
 	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
 		VPE_CHAN_CHROMA1_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA2_IN);
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_CHROMA3_IN);
+
+	/* frame start for MV in client */
+	vpdma_set_frame_start_event(ctx->dev->vpdma, VPDMA_FSEVENT_CHANNEL_ACTIVE,
+		VPE_CHAN_MV_IN);
 
 	ctx->load_mmrs = true;
 }
@@ -524,13 +693,14 @@ static void set_dst_registers(struct vpe_ctx *ctx)
 /*
  * Set the de-interlacer shadow register values
  */
-static void set_dei_regs_bypass(struct vpe_ctx *ctx)
+static void set_dei_regs(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
 	struct vpe_q_data *s_q_data = &ctx->q_data[Q_DATA_SRC];
 	unsigned int src_h = s_q_data->c_rect.height;
 	unsigned int src_w = s_q_data->c_rect.width;
 	u32 *dei_mmr0 = &mmr_adb->dei_regs[0];
+	bool deinterlace = true;
 	u32 val = 0;
 
 	/*
@@ -539,7 +709,13 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	 * for both progressive and interlace content in interlace bypass mode.
 	 * It has been recommended not to use progressive bypass mode.
 	 */
-	val = VPE_DEI_INTERLACE_BYPASS;
+	if ((!ctx->deinterlacing && (s_q_data->flags & Q_DATA_INTERLACED)) ||
+			!(s_q_data->flags & Q_DATA_INTERLACED)) {
+		deinterlace = false;
+		val = VPE_DEI_INTERLACE_BYPASS;
+	}
+
+	src_h = deinterlace ? src_h * 2 : src_h;
 
 	val |= (src_h << VPE_DEI_HEIGHT_SHIFT) |
 		(src_w << VPE_DEI_WIDTH_SHIFT) |
@@ -550,6 +726,22 @@ static void set_dei_regs_bypass(struct vpe_ctx *ctx)
 	ctx->load_mmrs = true;
 }
 
+static void set_dei_shadow_registers(struct vpe_ctx *ctx)
+{
+	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
+	u32 *dei_mmr = &mmr_adb->dei_regs[0];
+	const struct vpe_dei_regs *cur = &dei_regs;
+
+	dei_mmr[2]  = cur->mdt_spacial_freq_thr_reg;
+	dei_mmr[3]  = cur->edi_config_reg;
+	dei_mmr[4]  = cur->edi_lut_reg0;
+	dei_mmr[5]  = cur->edi_lut_reg1;
+	dei_mmr[6]  = cur->edi_lut_reg2;
+	dei_mmr[7]  = cur->edi_lut_reg3;
+
+	ctx->load_mmrs = true;
+}
+
 static void set_csc_coeff_bypass(struct vpe_ctx *ctx)
 {
 	struct vpe_mmr_adb *mmr_adb = ctx->mmr_adb.addr;
@@ -578,10 +770,35 @@ static void set_sc_regs_bypass(struct vpe_ctx *ctx)
  */
 static int set_srcdst_params(struct vpe_ctx *ctx)
 {
+	struct vpe_q_data *s_q_data =  &ctx->q_data[Q_DATA_SRC];
+	struct vpe_q_data *d_q_data =  &ctx->q_data[Q_DATA_DST];
+	size_t mv_buf_size;
+	int ret;
+
 	ctx->sequence = 0;
+	ctx->field = V4L2_FIELD_TOP;
+
+	if ((s_q_data->flags & Q_DATA_INTERLACED) &&
+			!(d_q_data->flags & Q_DATA_INTERLACED)) {
+		const struct vpdma_data_format *mv =
+			&vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+
+		ctx->deinterlacing = 1;
+		mv_buf_size =
+			(s_q_data->width * s_q_data->height * mv->depth) >> 3;
+	} else {
+		ctx->deinterlacing = 0;
+		mv_buf_size = 0;
+	}
+
+	free_vbs(ctx);
+
+	ret = realloc_mv_buffers(ctx, mv_buf_size);
+	if (ret)
+		return ret;
 
 	set_cfg_and_line_modes(ctx);
-	set_dei_regs_bypass(ctx);
+	set_dei_regs(ctx);
 	set_csc_coeff_bypass(ctx);
 	set_sc_regs_bypass(ctx);
 
@@ -608,6 +825,9 @@ static int job_ready(void *priv)
 	struct vpe_ctx *ctx = priv;
 	int needed = ctx->bufs_per_job;
 
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL)
+		needed += 2;	/* need additional two most recent fields */
+
 	if (v4l2_m2m_num_src_bufs_ready(ctx->m2m_ctx) < needed)
 		return 0;
 
@@ -735,17 +955,25 @@ static void add_out_dtd(struct vpe_ctx *ctx, int port)
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
+	int mv_buf_selector = !ctx->src_mv_buf_selector;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring output buffer(%d) dma_addr failed\n",
-			port);
-		return;
+	if (port == VPE_PORT_MV_OUT) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
+
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring output buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -761,23 +989,31 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 {
 	struct vpe_q_data *q_data = &ctx->q_data[Q_DATA_SRC];
 	const struct vpe_port_data *p_data = &port_data[port];
-	struct vb2_buffer *vb = ctx->src_vb;
+	struct vb2_buffer *vb = ctx->src_vbs[p_data->vb_index];
 	struct v4l2_rect *c_rect = &q_data->c_rect;
 	struct vpe_fmt *fmt = q_data->fmt;
 	const struct vpdma_data_format *vpdma_fmt;
-	int plane = fmt->coplanar ? p_data->vb_part : 0;
-	int field = 0;
+	int mv_buf_selector = ctx->src_mv_buf_selector;
+	int field = vb->v4l2_buf.field == V4L2_FIELD_BOTTOM;
 	dma_addr_t dma_addr;
 	u32 flags = 0;
 
-	vpdma_fmt = fmt->vpdma_fmt[plane];
+	if (port == VPE_PORT_MV_IN) {
+		vpdma_fmt = &vpdma_misc_fmts[VPDMA_DATA_FMT_MV];
+		dma_addr = ctx->mv_buf_dma[mv_buf_selector];
+	} else {
+		/* to incorporate interleaved formats */
+		int plane = fmt->coplanar ? p_data->vb_part : 0;
 
-	dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
-	if (!dma_addr) {
-		vpe_err(ctx->dev,
-			"acquiring input buffer(%d) dma_addr failed\n",
-			port);
-		return;
+		vpdma_fmt = fmt->vpdma_fmt[plane];
+
+		dma_addr = vb2_dma_contig_plane_dma_addr(vb, plane);
+		if (!dma_addr) {
+			vpe_err(ctx->dev,
+				"acquiring input buffer(%d) dma_addr failed\n",
+				port);
+			return;
+		}
 	}
 
 	if (q_data->flags & Q_DATA_FRAME_1D)
@@ -795,7 +1031,8 @@ static void add_in_dtd(struct vpe_ctx *ctx, int port)
 static void enable_irqs(struct vpe_ctx *ctx)
 {
 	write_reg(ctx->dev, VPE_INT0_ENABLE0_SET, VPE_INT0_LIST0_COMPLETE);
-	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DS1_UV_ERROR_INT);
+	write_reg(ctx->dev, VPE_INT0_ENABLE1_SET, VPE_DEI_ERROR_INT |
+				VPE_DS1_UV_ERROR_INT);
 
 	vpdma_enable_list_complete_irq(ctx->dev->vpdma, 0, true);
 }
@@ -818,8 +1055,15 @@ static void device_run(void *priv)
 	struct vpe_ctx *ctx = priv;
 	struct vpe_q_data *d_q_data = &ctx->q_data[Q_DATA_DST];
 
-	ctx->src_vb = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
-	WARN_ON(ctx->src_vb == NULL);
+	if (ctx->deinterlacing && ctx->src_vbs[2] == NULL) {
+		ctx->src_vbs[2] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[2] == NULL);
+		ctx->src_vbs[1] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		WARN_ON(ctx->src_vbs[1] == NULL);
+	}
+
+	ctx->src_vbs[0] = v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+	WARN_ON(ctx->src_vbs[0] == NULL);
 	ctx->dst_vb = v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 	WARN_ON(ctx->dst_vb == NULL);
 
@@ -831,28 +1075,67 @@ static void device_run(void *priv)
 		ctx->load_mmrs = false;
 	}
 
+	/* output data descriptors */
+	if (ctx->deinterlacing)
+		add_out_dtd(ctx, VPE_PORT_MV_OUT);
+
 	add_out_dtd(ctx, VPE_PORT_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		add_out_dtd(ctx, VPE_PORT_CHROMA_OUT);
 
+	/* input data descriptors */
+	if (ctx->deinterlacing) {
+		add_in_dtd(ctx, VPE_PORT_LUMA3_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA3_IN);
+
+		add_in_dtd(ctx, VPE_PORT_LUMA2_IN);
+		add_in_dtd(ctx, VPE_PORT_CHROMA2_IN);
+	}
+
 	add_in_dtd(ctx, VPE_PORT_LUMA1_IN);
 	add_in_dtd(ctx, VPE_PORT_CHROMA1_IN);
 
+	if (ctx->deinterlacing)
+		add_in_dtd(ctx, VPE_PORT_MV_IN);
+
 	/* sync on channel control descriptors for input ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA1_IN);
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA1_IN);
 
+	if (ctx->deinterlacing) {
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_LUMA2_IN);
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_CHROMA2_IN);
+
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_LUMA3_IN);
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list,
+			VPE_CHAN_CHROMA3_IN);
+
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_IN);
+	}
+
 	/* sync on channel control descriptors for output ports */
 	vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_LUMA_OUT);
 	if (d_q_data->fmt->coplanar)
 		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_CHROMA_OUT);
 
+	if (ctx->deinterlacing)
+		vpdma_add_sync_on_channel_ctd(&ctx->desc_list, VPE_CHAN_MV_OUT);
+
 	enable_irqs(ctx);
 
 	vpdma_map_desc_buf(ctx->dev->vpdma, &ctx->desc_list.buf);
 	vpdma_submit_descs(ctx->dev->vpdma, &ctx->desc_list);
 }
 
+static void dei_error(struct vpe_ctx *ctx)
+{
+	dev_warn(ctx->dev->v4l2_dev.dev,
+		"received DEI error interrupt\n");
+}
+
 static void ds1_uv_error(struct vpe_ctx *ctx)
 {
 	dev_warn(ctx->dev->v4l2_dev.dev,
@@ -863,6 +1146,7 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 {
 	struct vpe_dev *dev = (struct vpe_dev *)data;
 	struct vpe_ctx *ctx;
+	struct vpe_q_data *d_q_data;
 	struct vb2_buffer *s_vb, *d_vb;
 	struct v4l2_buffer *s_buf, *d_buf;
 	unsigned long flags;
@@ -886,9 +1170,15 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		goto handled;
 	}
 
-	if (irqst1 & VPE_DS1_UV_ERROR_INT) {
-		irqst1 &= ~VPE_DS1_UV_ERROR_INT;
-		ds1_uv_error(ctx);
+	if (irqst1) {
+		if (irqst1 & VPE_DEI_ERROR_INT) {
+			irqst1 &= ~VPE_DEI_ERROR_INT;
+			dei_error(ctx);
+		}
+		if (irqst1 & VPE_DS1_UV_ERROR_INT) {
+			irqst1 &= ~VPE_DS1_UV_ERROR_INT;
+			ds1_uv_error(ctx);
+		}
 	}
 
 	if (irqst0) {
@@ -911,10 +1201,13 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 
 	vpdma_reset_desc_list(&ctx->desc_list);
 
+	 /* the previous dst mv buffer becomes the next src mv buffer */
+	ctx->src_mv_buf_selector = !ctx->src_mv_buf_selector;
+
 	if (ctx->aborting)
 		goto finished;
 
-	s_vb = ctx->src_vb;
+	s_vb = ctx->src_vbs[0];
 	d_vb = ctx->dst_vb;
 	s_buf = &s_vb->v4l2_buf;
 	d_buf = &d_vb->v4l2_buf;
@@ -924,16 +1217,35 @@ static irqreturn_t vpe_irq(int irq_vpe, void *data)
 		d_buf->flags |= V4L2_BUF_FLAG_TIMECODE;
 		d_buf->timecode = s_buf->timecode;
 	}
-
 	d_buf->sequence = ctx->sequence;
+	d_buf->field = ctx->field;
+
+	d_q_data = &ctx->q_data[Q_DATA_DST];
+	if (d_q_data->flags & Q_DATA_INTERLACED) {
+		if (ctx->field == V4L2_FIELD_BOTTOM) {
+			ctx->sequence++;
+			ctx->field = V4L2_FIELD_TOP;
+		} else {
+			WARN_ON(ctx->field != V4L2_FIELD_TOP);
+			ctx->field = V4L2_FIELD_BOTTOM;
+		}
+	} else {
+		ctx->sequence++;
+	}
 
-	ctx->sequence++;
+	if (ctx->deinterlacing)
+		s_vb = ctx->src_vbs[2];
 
 	spin_lock_irqsave(&dev->lock, flags);
 	v4l2_m2m_buf_done(s_vb, VB2_BUF_STATE_DONE);
 	v4l2_m2m_buf_done(d_vb, VB2_BUF_STATE_DONE);
 	spin_unlock_irqrestore(&dev->lock, flags);
 
+	if (ctx->deinterlacing) {
+		ctx->src_vbs[2] = ctx->src_vbs[1];
+		ctx->src_vbs[1] = ctx->src_vbs[0];
+	}
+
 	ctx->bufs_completed++;
 	if (ctx->bufs_completed < ctx->bufs_per_job) {
 		device_run(ctx);
@@ -1012,6 +1324,7 @@ static int vpe_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
 	pix->width = q_data->width;
 	pix->height = q_data->height;
 	pix->pixelformat = q_data->fmt->fourcc;
+	pix->field = q_data->field;
 
 	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
 		pix->colorspace = q_data->colorspace;
@@ -1047,7 +1360,8 @@ static int __vpe_try_fmt(struct vpe_ctx *ctx, struct v4l2_format *f,
 		return -EINVAL;
 	}
 
-	pix->field = V4L2_FIELD_NONE;
+	if (pix->field != V4L2_FIELD_NONE && pix->field != V4L2_FIELD_ALTERNATE)
+		pix->field = V4L2_FIELD_NONE;
 
 	v4l_bound_align_image(&pix->width, MIN_W, MAX_W, W_ALIGN,
 			      &pix->height, MIN_H, MAX_H, H_ALIGN,
@@ -1124,6 +1438,7 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->width		= pix->width;
 	q_data->height		= pix->height;
 	q_data->colorspace	= pix->colorspace;
+	q_data->field		= pix->field;
 
 	for (i = 0; i < pix->num_planes; i++) {
 		plane_fmt = &pix->plane_fmt[i];
@@ -1137,6 +1452,11 @@ static int __vpe_s_fmt(struct vpe_ctx *ctx, struct v4l2_format *f)
 	q_data->c_rect.width	= q_data->width;
 	q_data->c_rect.height	= q_data->height;
 
+	if (q_data->field == V4L2_FIELD_ALTERNATE)
+		q_data->flags |= Q_DATA_INTERLACED;
+	else
+		q_data->flags &= ~Q_DATA_INTERLACED;
+
 	vpe_dbg(ctx->dev, "Setting format for type %d, wxh: %dx%d, fmt: %d bpl_y %d",
 		f->type, q_data->width, q_data->height, q_data->fmt->fourcc,
 		q_data->bytesperline[VPE_LUMA]);
@@ -1451,6 +1771,7 @@ static int vpe_open(struct file *file)
 	s_q_data->sizeimage[VPE_LUMA] = (s_q_data->width * s_q_data->height *
 			s_q_data->fmt->vpdma_fmt[VPE_LUMA]->depth) >> 3;
 	s_q_data->colorspace = V4L2_COLORSPACE_SMPTE240M;
+	s_q_data->field = V4L2_FIELD_NONE;
 	s_q_data->c_rect.left = 0;
 	s_q_data->c_rect.top = 0;
 	s_q_data->c_rect.width = s_q_data->width;
@@ -1459,6 +1780,7 @@ static int vpe_open(struct file *file)
 
 	ctx->q_data[Q_DATA_DST] = *s_q_data;
 
+	set_dei_shadow_registers(ctx);
 	set_src_registers(ctx);
 	set_dst_registers(ctx);
 	ret = set_srcdst_params(ctx);
@@ -1513,6 +1835,8 @@ static int vpe_release(struct file *file)
 	vpe_dbg(dev, "releasing instance %p\n", ctx);
 
 	mutex_lock(&dev->dev_mutex);
+	free_vbs(ctx);
+	free_mv_buffers(ctx);
 	vpdma_free_desc_list(&ctx->desc_list);
 	vpdma_free_desc_buf(&ctx->mmr_adb);
 
-- 
1.8.1.2
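
A note on the completion path in this patch: the de-interlacer state
that vpe_irq() rotates boils down to two small rotations, which are
easy to lose in the diff. Below is a minimal, compilable sketch of
both; the names (mv_state, window_slide and friends) are made up for
illustration and are not the driver's:

	#include <assert.h>
	#include <stdio.h>

	/*
	 * Motion-vector ping-pong: the MV buffer written by job N (MVout)
	 * is the one read back by job N + 1 (MVin), so a single selector
	 * bit flips after every completed job.
	 */
	struct mv_state {
		void *buf[2];
		int src_sel;		/* index of the buffer MVin reads */
	};

	static void *mv_in(struct mv_state *mv)  { return mv->buf[mv->src_sel]; }
	static void *mv_out(struct mv_state *mv) { return mv->buf[!mv->src_sel]; }

	static void mv_job_done(struct mv_state *mv)
	{
		/* the previous dst mv buffer becomes the next src mv buffer */
		mv->src_sel = !mv->src_sel;
	}

	/*
	 * Three-field sliding window: the de-interlacer reads fields f,
	 * f - 1 and f - 2; once a job completes, the oldest field is the
	 * one that can be returned to userspace, and the window slides.
	 */
	static void *window_slide(void *win[3], void *newest)
	{
		void *done = win[2];

		win[2] = win[1];
		win[1] = win[0];
		win[0] = newest;
		return done;
	}

	int main(void)
	{
		struct mv_state mv = { .buf = { (void *)1, (void *)2 } };
		void *win[3] = { (void *)30, (void *)20, (void *)10 };

		assert(mv_in(&mv) != mv_out(&mv));
		mv_job_done(&mv);
		assert(mv_in(&mv) == (void *)2);	/* roles swapped */

		printf("released field %p\n", window_slide(win, (void *)40));
		return 0;
	}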

^ permalink raw reply related	[flat|nested] 138+ messages in thread
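
One more sketch for the V4L2_FIELD_ALTERNATE bookkeeping in the same
handler: both fields of a frame carry the same sequence number, and
the counter only advances once the bottom field completes. A reduced,
self-contained model of just that state machine (names again made up):

	#include <assert.h>

	enum field_id { FIELD_TOP, FIELD_BOTTOM };

	struct seq_state {
		unsigned int sequence;	/* frame counter shown to userspace */
		enum field_id field;	/* field assigned to the next buffer */
	};

	/*
	 * Called per completed capture buffer on an interlaced destination:
	 * top and bottom fields share one sequence number, which bumps
	 * only after the bottom field of the frame.
	 */
	static void field_done(struct seq_state *s)
	{
		if (s->field == FIELD_BOTTOM) {
			s->sequence++;
			s->field = FIELD_TOP;
		} else {
			s->field = FIELD_BOTTOM;
		}
	}

	int main(void)
	{
		struct seq_state s = { 0, FIELD_TOP };

		field_done(&s);		/* top field of frame 0 done */
		assert(s.sequence == 0 && s.field == FIELD_BOTTOM);
		field_done(&s);		/* bottom field done: frame 0 complete */
		assert(s.sequence == 1 && s.field == FIELD_TOP);
		return 0;
	}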

* Re: [PATCH 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE
  2013-08-08 22:11   ` Laurent Pinchart
@ 2013-10-25 10:35       ` Archit Taneja
  2013-12-03 10:08       ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-10-25 10:35 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, linux-omap, dagriego, dale, pawel, m.szyprowski,
	hverkuil, tomi.valkeinen, Rajendra Nayak, Sricharan R

Hi Laurent,

Sorry about the late response; I had dropped the DT patch from the VPE
series since there were dependencies on crossbar drivers and some
other baseport stuff. Comments below.

On Friday 09 August 2013 03:41 AM, Laurent Pinchart wrote:
> Hi Archit,
>
> Thank you for the patch.
>
> On Friday 02 August 2013 19:33:43 Archit Taneja wrote:
>> Add a DT node for VPE in dra7.dtsi. This is experimental because we might
>> need to split the VPE address space a bit more, and also because the IRQ
>> line described is accessible only once the IRQ crossbar driver is added
>> for DRA7XX.
>>
>> Cc: Rajendra Nayak <rnayak@ti.com>
>> Cc: Sricharan R <r.sricharan@ti.com>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   arch/arm/boot/dts/dra7.dtsi | 11 +++++++++++
>
> Documentation is missing :-) As this is an experimental patch you can probably
> document the bindings later.

Yes, I will work on that.

>
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
>> index ce9a0f0..3237972 100644
>> --- a/arch/arm/boot/dts/dra7.dtsi
>> +++ b/arch/arm/boot/dts/dra7.dtsi
>> @@ -484,6 +484,17 @@
>>   			dmas = <&sdma 70>, <&sdma 71>;
>>   			dma-names = "tx0", "rx0";
>>   		};
>> +
>> +		vpe {
>> +			compatible = "ti,vpe";
>> +			ti,hwmods = "vpe";
>> +			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
>> +			reg-names = "vpe", "vpdma";
>> +			interrupts = <0 159 0x4>;
>> +			#address-cells = <1>;
>> +			#size-cells = <0>;
>
> Are #address-cells and #size-cells really needed ?

These aren't needed; vpe derives the address info from its parent (ocp).
I didn't know that a child node's reg is interpreted using these params
from the parent.
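
To make that concrete, here is a trimmed sketch of how the cell
properties interact. The simple-bus wrapper is simplified for
illustration (the real parent node in dra7.dtsi carries more
properties); only the vpe node contents come from the patch:

	ocp {
		compatible = "simple-bus";
		/* these describe how the children's reg entries are parsed */
		#address-cells = <1>;	/* one cell per address */
		#size-cells = <1>;	/* one cell per size */
		ranges;

		vpe {
			compatible = "ti,vpe";
			/* parsed with the parent's cells: <address size> pairs */
			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
			reg-names = "vpe", "vpdma";
			/*
			 * #address-cells/#size-cells would only be needed
			 * here if vpe itself had children with reg entries.
			 */
		};
	};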

Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE
  2013-08-08 22:11   ` Laurent Pinchart
@ 2013-12-03 10:08       ` Archit Taneja
  2013-12-03 10:08       ` Archit Taneja
  1 sibling, 0 replies; 138+ messages in thread
From: Archit Taneja @ 2013-12-03 10:08 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-omap

Hi Laurent,

On Friday 09 August 2013 03:41 AM, Laurent Pinchart wrote:
> Hi Archit,
>
> Thank you for the patch.
>
> On Friday 02 August 2013 19:33:43 Archit Taneja wrote:
>> Add a DT node for VPE in dra7.dtsi. This is experimental because we might
>> need to split the VPE address space a bit more, and also because the IRQ
>> line described is accessible only once the IRQ crossbar driver is added
>> for DRA7XX.
>>
>> Cc: Rajendra Nayak <rnayak@ti.com>
>> Cc: Sricharan R <r.sricharan@ti.com>
>> Signed-off-by: Archit Taneja <archit@ti.com>
>> ---
>>   arch/arm/boot/dts/dra7.dtsi | 11 +++++++++++
>
> Documentation is missing :-) As this is an experimental patch you can probably
> document the bindings later.

Sorry for the late reply; I somehow missed reading this message.

I'm blocked on adding the DT nodes because of some dependencies on dra7x
clocks and the crossbar framework. I'll make sure to add documentation :)

>
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
>> index ce9a0f0..3237972 100644
>> --- a/arch/arm/boot/dts/dra7.dtsi
>> +++ b/arch/arm/boot/dts/dra7.dtsi
>> @@ -484,6 +484,17 @@
>>   			dmas = <&sdma 70>, <&sdma 71>;
>>   			dma-names = "tx0", "rx0";
>>   		};
>> +
>> +		vpe {
>> +			compatible = "ti,vpe";
>> +			ti,hwmods = "vpe";
>> +			reg = <0x489d0000 0xd000>, <0x489dd000 0x400>;
>> +			reg-names = "vpe", "vpdma";
>> +			interrupts = <0 159 0x4>;
>> +			#address-cells = <1>;
>> +			#size-cells = <0>;
>
> Are #address-cells and #size-cells really needed ?

They aren't needed. The vpe node's reg entries are interpreted using
these params from the parent "ocp" node.

Thanks,
Archit


^ permalink raw reply	[flat|nested] 138+ messages in thread

end of thread, other threads:[~2013-12-03 10:08 UTC | newest]

Thread overview: 138+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-02 14:03 [PATCH 0/6] v4l: VPE mem to mem driver Archit Taneja
2013-08-02 14:03 ` Archit Taneja
2013-08-02 14:03 ` [PATCH 1/6] v4l: ti-vpe: Create a vpdma helper library Archit Taneja
2013-08-02 14:03   ` Archit Taneja
2013-08-05  8:13   ` Tomi Valkeinen
2013-08-05  8:13     ` Tomi Valkeinen
2013-08-05 11:26     ` Archit Taneja
2013-08-05 11:26       ` Archit Taneja
2013-08-05 12:26       ` Tomi Valkeinen
2013-08-05 12:26         ` Tomi Valkeinen
2013-08-08 21:35       ` Laurent Pinchart
2013-08-14 10:19         ` Archit Taneja
2013-08-14 10:19           ` Archit Taneja
2013-08-08 22:04   ` Laurent Pinchart
2013-08-14 10:57     ` Archit Taneja
2013-08-14 10:57       ` Archit Taneja
2013-08-20 11:39       ` Laurent Pinchart
2013-08-20 12:51         ` Archit Taneja
2013-08-20 12:51           ` Archit Taneja
2013-08-20 13:16         ` Archit Taneja
2013-08-20 13:16           ` Archit Taneja
2013-08-20 13:56           ` Laurent Pinchart
2013-08-21  6:47             ` Archit Taneja
2013-08-21  6:47               ` Archit Taneja
2013-08-02 14:03 ` [PATCH 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors Archit Taneja
2013-08-02 14:03   ` Archit Taneja
2013-08-05  9:11   ` Tomi Valkeinen
2013-08-05  9:11     ` Tomi Valkeinen
2013-08-05 12:05     ` Archit Taneja
2013-08-05 12:05       ` Archit Taneja
2013-08-05 13:03       ` Tomi Valkeinen
2013-08-05 13:03         ` Tomi Valkeinen
2013-08-02 14:03 ` [PATCH 3/6] v4l: ti-vpe: Add VPE mem to mem driver Archit Taneja
2013-08-02 14:03   ` Archit Taneja
2013-08-02 14:36   ` Hans Verkuil
2013-08-02 14:55     ` Archit Taneja
2013-08-02 14:55       ` Archit Taneja
2013-08-05  9:18   ` Tomi Valkeinen
2013-08-05  9:18     ` Tomi Valkeinen
2013-08-02 14:03 ` [PATCH 4/6] v4l: ti-vpe: Add de-interlacer support in VPE Archit Taneja
2013-08-02 14:03   ` Archit Taneja
2013-08-02 14:40   ` Hans Verkuil
2013-08-02 14:03 ` [PATCH 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info Archit Taneja
2013-08-02 14:03   ` Archit Taneja
2013-08-02 14:03 ` [PATCH 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE Archit Taneja
2013-08-02 14:03   ` Archit Taneja
2013-08-08 22:11   ` Laurent Pinchart
2013-10-25 10:35     ` Archit Taneja
2013-10-25 10:35       ` Archit Taneja
2013-12-03 10:08     ` Archit Taneja
2013-12-03 10:08       ` Archit Taneja
2013-08-20 11:00 ` [PATCH v2 0/6] v4l: VPE mem to mem driver Archit Taneja
2013-08-20 11:00   ` Archit Taneja
2013-08-20 11:00   ` [PATCH v2 1/6] v4l: ti-vpe: Create a vpdma helper library Archit Taneja
2013-08-20 11:00     ` Archit Taneja
2013-08-20 11:00   ` [PATCH v2 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors Archit Taneja
2013-08-20 11:00     ` Archit Taneja
2013-08-20 11:00   ` [PATCH v2 3/6] v4l: ti-vpe: Add VPE mem to mem driver Archit Taneja
2013-08-20 11:00     ` Archit Taneja
2013-08-20 11:00   ` [PATCH v2 4/6] v4l: ti-vpe: Add de-interlacer support in VPE Archit Taneja
2013-08-20 11:00     ` Archit Taneja
2013-08-20 11:00   ` [PATCH v2 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info Archit Taneja
2013-08-20 11:00     ` Archit Taneja
2013-08-20 11:00   ` [PATCH v2 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE Archit Taneja
2013-08-20 11:00     ` Archit Taneja
2013-08-29 12:32   ` [PATCH v3 0/6] v4l: VPE mem to mem driver Archit Taneja
2013-08-29 12:32     ` Archit Taneja
2013-08-29 12:32     ` [PATCH v3 1/6] v4l: ti-vpe: Create a vpdma helper library Archit Taneja
2013-08-29 12:32       ` Archit Taneja
2013-08-29 12:32     ` [PATCH v3 2/6] v4l: ti-vpe: Add helpers for creating VPDMA descriptors Archit Taneja
2013-08-29 12:32       ` Archit Taneja
2013-08-29 12:32     ` [PATCH v3 3/6] v4l: ti-vpe: Add VPE mem to mem driver Archit Taneja
2013-08-29 12:32       ` Archit Taneja
2013-08-29 13:28       ` Hans Verkuil
2013-08-30  6:47         ` Archit Taneja
2013-08-30  6:47           ` Archit Taneja
2013-08-30  7:07           ` Hans Verkuil
2013-08-30 10:05             ` Archit Taneja
2013-08-30 10:05               ` Archit Taneja
2013-08-30 10:44               ` Hans Verkuil
2013-09-05  5:56         ` Archit Taneja
2013-09-05  5:56           ` Archit Taneja
2013-08-29 12:32     ` [PATCH v3 4/6] v4l: ti-vpe: Add de-interlacer support in VPE Archit Taneja
2013-08-29 12:32       ` Archit Taneja
2013-08-29 12:32     ` [PATCH v3 5/6] arm: dra7xx: hwmod data: add VPE hwmod data and ocp_if info Archit Taneja
2013-08-29 12:32       ` Archit Taneja
2013-08-29 12:42       ` Rajendra Nayak
2013-08-29 12:42         ` Rajendra Nayak
2013-08-29 13:42         ` Archit Taneja
2013-08-29 13:42           ` Archit Taneja
2013-08-29 12:32     ` [PATCH v3 6/6] experimental: arm: dts: dra7xx: Add a DT node for VPE Archit Taneja
2013-08-29 12:32       ` Archit Taneja
2013-09-06 10:12   ` [PATCH v4 0/4] v4l: VPE mem to mem driver Archit Taneja
2013-09-06 10:12     ` Archit Taneja
2013-09-06 10:12     ` [PATCH v4 1/4] v4l: ti-vpe: Create a vpdma helper library Archit Taneja
2013-09-06 10:12       ` Archit Taneja
2013-10-07  7:46       ` Hans Verkuil
2013-09-06 10:12     ` [PATCH v4 2/4] v4l: ti-vpe: Add helpers for creating VPDMA descriptors Archit Taneja
2013-09-06 10:12       ` Archit Taneja
2013-10-07  7:46       ` Hans Verkuil
2013-09-06 10:12     ` [PATCH v4 3/4] v4l: ti-vpe: Add VPE mem to mem driver Archit Taneja
2013-09-06 10:12       ` Archit Taneja
2013-10-07  7:55       ` Hans Verkuil
2013-10-07  9:16         ` Archit Taneja
2013-10-07  9:16           ` Archit Taneja
2013-10-07  9:34           ` Hans Verkuil
2013-10-07 10:22             ` Archit Taneja
2013-10-07 10:22               ` Archit Taneja
2013-10-07 14:02               ` Hans Verkuil
2013-10-07 14:34                 ` Archit Taneja
2013-10-07 14:34                   ` Archit Taneja
2013-09-06 10:12     ` [PATCH v4 4/4] v4l: ti-vpe: Add de-interlacer support in VPE Archit Taneja
2013-09-06 10:12       ` Archit Taneja
2013-10-07  7:57       ` Hans Verkuil
2013-09-16  6:59     ` [PATCH v4 0/4] v4l: VPE mem to mem driver Archit Taneja
2013-09-16  6:59       ` Archit Taneja
2013-10-07  6:39       ` Archit Taneja
2013-10-07  6:39         ` Archit Taneja
2013-10-09 14:29     ` [PATCH v5 3/4] v4l: ti-vpe: Add " Archit Taneja
2013-10-09 14:29       ` Archit Taneja
2013-10-11  7:46       ` Hans Verkuil
2013-10-15 13:47         ` Archit Taneja
2013-10-15 13:47           ` Archit Taneja
2013-10-15 13:51           ` Hans Verkuil
2013-10-15 14:13             ` Kamil Debski
2013-10-15 15:54             ` Kamil Debski
2013-10-16  5:08               ` Archit Taneja
2013-10-16  5:08                 ` Archit Taneja
2013-10-16  5:36     ` [PATCH v5 0/4] v4l: " Archit Taneja
2013-10-16  5:36       ` Archit Taneja
2013-10-16  5:36       ` [PATCH v5 1/4] v4l: ti-vpe: Create a vpdma helper library Archit Taneja
2013-10-16  5:36         ` Archit Taneja
2013-10-16  5:36       ` [PATCH v5 2/4] v4l: ti-vpe: Add helpers for creating VPDMA descriptors Archit Taneja
2013-10-16  5:36         ` Archit Taneja
2013-10-16  5:36       ` [PATCH v5 3/4] v4l: ti-vpe: Add VPE mem to mem driver Archit Taneja
2013-10-16  5:36         ` Archit Taneja
2013-10-16  5:36       ` [PATCH v5 4/4] v4l: ti-vpe: Add de-interlacer support in VPE Archit Taneja
2013-10-16  5:36         ` Archit Taneja
