* [PATCHv3 0/7] Support for Tegra 2D hardware
From: Terje Bergstrom @ 2012-12-13 14:04 UTC
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

This set of patches adds support for Tegra20 and Tegra30 host1x and
2D. It is based on linux-next.

The third version has too many changes to list them all. Here are the
highlights:
 * Renamed to host1x, and moved to drivers/gpu/host1x
 * Greatly simplified the inner workings between the physical and
   logical drivers
 * Does not use AUXDATA for passing data to the driver
 * Driver-specific runtime power management removed - will replace
   with runtime PM later
 * IOCTLs padded and use __u64 for passing pointers
 * DMABUF support removed, replaced with GEM CMA support
 * host1x driver validates command streams and copies them to a
   kernel-owned buffer
 * Generic interrupt support removed - only syncpt irq remains
 * Sync points are now allocated dynamically
 * IO register space handling rewritten to use helper functions
 * Numerous other fixes and simplifications to the code
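
As an illustration of the IOCTL item above, here is a minimal sketch of
a padded argument struct that carries a user pointer as a 64-bit
integer, so the layout is the same for 32-bit and 64-bit user space.
The struct and helper names (example_submit_args and friends) are
hypothetical and are not taken from the patch set; uint64_t stands in
for __u64 so the sketch builds in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument; field names are illustrative only. */
struct example_submit_args {
	uint32_t num_cmdbufs;	/* number of entries behind cmdbufs */
	uint32_t pad;		/* explicit padding, keeps cmdbufs 8-byte aligned */
	uint64_t cmdbufs;	/* user pointer carried as a 64-bit integer */
};

/* The layout must not depend on the pointer size of the caller. */
static_assert(sizeof(struct example_submit_args) == 16,
	      "argument struct must have a fixed size");
static_assert(offsetof(struct example_submit_args, cmdbufs) == 8,
	      "pointer field must sit at a fixed offset");

/* Convert a user pointer into its 64-bit field representation. */
static inline uint64_t example_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Recover the pointer from the 64-bit field. */
static inline void *example_u64_to_ptr(uint64_t value)
{
	return (void *)(uintptr_t)value;
}
```

The explicit pad field means the compiler never inserts hidden padding,
which is what makes the two static assertions hold on every ABI.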

Some of the issues left open:
 * Register definitions still use static inline. There has been a
   debate about code style versus ability to use compiler type
   checking and code coverage analysis. There was no conclusion, so
   I left it as it was.
 * tegradrm has a global variable. The plan was to hide that behind a
   virtual device, and use that as the DRM root device. That plan
   failed once the FB CMA helper tried to use the device to allocate
   memory.
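
For readers unfamiliar with the register definition style under debate,
here is a small sketch of the static inline form. The 0x400 offset and
the "id * 4" stride are the ones used in patch 1/7; example_syncpt_reg()
and EXAMPLE_SYNCPT_0 are hypothetical names added for illustration, and
uint32_t stands in for u32 so the sketch builds outside the kernel.

```c
#include <stdint.h>

/* Offset of the first syncpoint value register (static inline style). */
static inline uint32_t host1x_sync_syncpt_0_r(void)
{
	return 0x400;
}

/* The same constant in macro style, shown only for comparison. */
#define EXAMPLE_SYNCPT_0	0x400

/*
 * Hypothetical helper: offset of the value register for syncpoint 'id'.
 * Each syncpoint occupies one 32-bit register, hence the stride of 4.
 */
static inline uint32_t example_syncpt_reg(uint32_t id)
{
	return host1x_sync_syncpt_0_r() + id * 4;
}
```

Both forms yield the same offset. The inline function gives the
compiler a typed return value to check and shows up in coverage data,
while the macro is the more conventional kernel style.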

host1x is the driver that controls host1x hardware. It supports
host1x command channels, synchronization, and memory management. It
is split into a logical driver under drivers/gpu/host1x and a
physical driver under drivers/gpu/host1x/hw. The physical driver is
compiled with the hardware headers of the particular host1x version.
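
The split can be pictured roughly as follows. This is a simplified
sketch, not the driver code: all example_* names are hypothetical, and
an array stands in for the memory-mapped registers. The physical side
fills in an ops table at init time, and the logical side only ever
calls through that table.

```c
#include <stdint.h>

struct example_host;

struct example_syncpt {
	struct example_host *host;
	uint32_t id;
	uint32_t min;		/* cached copy of the hardware value */
};

/* Operations the logical driver needs from a physical driver. */
struct example_syncpt_ops {
	uint32_t (*load_min)(struct example_syncpt *sp);
};

struct example_host {
	struct example_syncpt_ops syncpt_op;
	uint32_t fake_regs[4];	/* stand-in for memory-mapped registers */
};

/* Physical side: knows how this hardware version exposes the value. */
static uint32_t example_hw01_load_min(struct example_syncpt *sp)
{
	sp->min = sp->host->fake_regs[sp->id];
	return sp->min;
}

/* Physical side: install the ops for this hardware version. */
static int example_hw01_init(struct example_host *host)
{
	host->syncpt_op.load_min = example_hw01_load_min;
	return 0;
}

/* Logical side: version independent, dispatches through the ops table. */
static uint32_t example_syncpt_load_min(struct example_syncpt *sp)
{
	return sp->host->syncpt_op.load_min(sp);
}
```

Supporting another host1x version would then mean adding another init
function that installs different ops, with no change to the logical
side.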

The hardware units are described (briefly) in the Tegra2 TRM. The wiki
page https://gitorious.org/linux-tegra-drm/pages/Host1xIntroduction
also contains a short description of the functionality.

The patch set removes responsibility for host1x from tegradrm. At the
same time, it moves all DRM-related infrastructure from
drivers/gpu/drm/tegra/host1x.c to drm.c.

The patch set adds a 2D driver to tegradrm, which uses the host1x
driver to access sync points and channels. We expect to use the same
infrastructure for other host1x clients, so we have kept host1x and
tegradrm separate.

The patch set also adds a user space API to tegradrm for accessing
host1x and 2D. The user space parts are sent at the same time as
these patches.

Arto Merilainen (1):
  drm: tegra: Remove redundant host1x

Terje Bergstrom (6):
  gpu: host1x: Add host1x driver
  gpu: host1x: Add syncpoint wait and interrupts
  gpu: host1x: Add channel support
  gpu: host1x: Add debug support
  ARM: tegra: Add board data and 2D clocks
  drm: tegra: Add gr2d device

 arch/arm/mach-tegra/board-dt-tegra20.c      |    1 +
 arch/arm/mach-tegra/board-dt-tegra30.c      |    1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c   |    2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c   |    1 +
 drivers/gpu/Makefile                        |    1 +
 drivers/gpu/drm/tegra/Kconfig               |    2 +-
 drivers/gpu/drm/tegra/Makefile              |    2 +-
 drivers/gpu/drm/tegra/dc.c                  |   20 +-
 drivers/gpu/drm/tegra/drm.c                 |  428 ++++++++++++++++++-
 drivers/gpu/drm/tegra/drm.h                 |   67 ++-
 drivers/gpu/drm/tegra/fb.c                  |   17 +-
 drivers/gpu/drm/tegra/gr2d.c                |  300 +++++++++++++
 drivers/gpu/drm/tegra/hdmi.c                |   24 +-
 drivers/gpu/drm/tegra/host1x.c              |  325 --------------
 drivers/gpu/host1x/Kconfig                  |   28 ++
 drivers/gpu/host1x/Makefile                 |   16 +
 drivers/gpu/host1x/cdma.c                   |  475 ++++++++++++++++++++
 drivers/gpu/host1x/cdma.h                   |  107 +++++
 drivers/gpu/host1x/channel.c                |  137 ++++++
 drivers/gpu/host1x/channel.h                |   64 +++
 drivers/gpu/host1x/cma.c                    |  116 +++++
 drivers/gpu/host1x/cma.h                    |   43 ++
 drivers/gpu/host1x/debug.c                  |  207 +++++++++
 drivers/gpu/host1x/debug.h                  |   49 +++
 drivers/gpu/host1x/dev.c                    |  242 +++++++++++
 drivers/gpu/host1x/dev.h                    |  165 +++++++
 drivers/gpu/host1x/hw/Makefile              |    6 +
 drivers/gpu/host1x/hw/cdma_hw.c             |  480 +++++++++++++++++++++
 drivers/gpu/host1x/hw/cdma_hw.h             |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c          |  147 +++++++
 drivers/gpu/host1x/hw/debug_hw.c            |  399 +++++++++++++++++
 drivers/gpu/host1x/hw/host1x01.c            |   46 ++
 drivers/gpu/host1x/hw/host1x01.h            |   25 ++
 drivers/gpu/host1x/hw/host1x01_hardware.h   |  150 +++++++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   98 +++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |  179 ++++++++
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  |  130 ++++++
 drivers/gpu/host1x/hw/intr_hw.c             |  175 ++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |  157 +++++++
 drivers/gpu/host1x/intr.c                   |  377 ++++++++++++++++
 drivers/gpu/host1x/intr.h                   |  106 +++++
 drivers/gpu/host1x/job.c                    |  618 +++++++++++++++++++++++++++
 drivers/gpu/host1x/memmgr.c                 |  174 ++++++++
 drivers/gpu/host1x/memmgr.h                 |   53 +++
 drivers/gpu/host1x/syncpt.c                 |  397 +++++++++++++++++
 drivers/gpu/host1x/syncpt.h                 |  134 ++++++
 drivers/video/Kconfig                       |    2 +
 include/drm/tegra_drm.h                     |  131 ++++++
 include/linux/host1x.h                      |  215 ++++++++++
 include/trace/events/host1x.h               |  296 +++++++++++++
 50 files changed, 6980 insertions(+), 392 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/gr2d.c
 delete mode 100644 drivers/gpu/drm/tegra/host1x.c
 create mode 100644 drivers/gpu/host1x/Kconfig
 create mode 100644 drivers/gpu/host1x/Makefile
 create mode 100644 drivers/gpu/host1x/cdma.c
 create mode 100644 drivers/gpu/host1x/cdma.h
 create mode 100644 drivers/gpu/host1x/channel.c
 create mode 100644 drivers/gpu/host1x/channel.h
 create mode 100644 drivers/gpu/host1x/cma.c
 create mode 100644 drivers/gpu/host1x/cma.h
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/dev.c
 create mode 100644 drivers/gpu/host1x/dev.h
 create mode 100644 drivers/gpu/host1x/hw/Makefile
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h
 create mode 100644 drivers/gpu/host1x/hw/channel_hw.c
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.h
 create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h
 create mode 100644 drivers/gpu/host1x/hw/intr_hw.c
 create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c
 create mode 100644 drivers/gpu/host1x/intr.c
 create mode 100644 drivers/gpu/host1x/intr.h
 create mode 100644 drivers/gpu/host1x/job.c
 create mode 100644 drivers/gpu/host1x/memmgr.c
 create mode 100644 drivers/gpu/host1x/memmgr.h
 create mode 100644 drivers/gpu/host1x/syncpt.c
 create mode 100644 drivers/gpu/host1x/syncpt.h
 create mode 100644 include/drm/tegra_drm.h
 create mode 100644 include/linux/host1x.h
 create mode 100644 include/trace/events/host1x.h

-- 
1.7.9.5

* [PATCHv3 1/7] gpu: host1x: Add host1x driver
From: Terje Bergstrom @ 2012-12-13 14:04 UTC
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

Add host1x, the driver for host1x and its client unit 2D.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/Makefile                      |    1 +
 drivers/gpu/host1x/Kconfig                |    6 +
 drivers/gpu/host1x/Makefile               |    8 +
 drivers/gpu/host1x/dev.c                  |  182 +++++++++++++++++++++++
 drivers/gpu/host1x/dev.h                  |   79 ++++++++++
 drivers/gpu/host1x/hw/Makefile            |    6 +
 drivers/gpu/host1x/hw/host1x01.c          |   36 +++++
 drivers/gpu/host1x/hw/host1x01.h          |   25 ++++
 drivers/gpu/host1x/hw/host1x01_hardware.h |   26 ++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h  |   66 +++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c         |  146 +++++++++++++++++++
 drivers/gpu/host1x/syncpt.c               |  227 +++++++++++++++++++++++++++++
 drivers/gpu/host1x/syncpt.h               |  128 ++++++++++++++++
 drivers/video/Kconfig                     |    2 +
 include/linux/host1x.h                    |   41 ++++++
 include/trace/events/host1x.h             |   61 ++++++++
 16 files changed, 1040 insertions(+)
 create mode 100644 drivers/gpu/host1x/Kconfig
 create mode 100644 drivers/gpu/host1x/Makefile
 create mode 100644 drivers/gpu/host1x/dev.c
 create mode 100644 drivers/gpu/host1x/dev.h
 create mode 100644 drivers/gpu/host1x/hw/Makefile
 create mode 100644 drivers/gpu/host1x/hw/host1x01.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.h
 create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h
 create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c
 create mode 100644 drivers/gpu/host1x/syncpt.c
 create mode 100644 drivers/gpu/host1x/syncpt.h
 create mode 100644 include/linux/host1x.h
 create mode 100644 include/trace/events/host1x.h

diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index cc92778..7fa2f68 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -1 +1,2 @@
+obj-$(CONFIG_TEGRA_HOST1X)	+= host1x/
 obj-y			+= drm/ vga/ stub/
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
new file mode 100644
index 0000000..e89fb2b
--- /dev/null
+++ b/drivers/gpu/host1x/Kconfig
@@ -0,0 +1,6 @@
+config TEGRA_HOST1X
+	tristate "Tegra host1x driver"
+	help
+	  Driver for the Tegra host1x hardware.
+
+	  Required for enabling tegradrm.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
new file mode 100644
index 0000000..a4adcc6
--- /dev/null
+++ b/drivers/gpu/host1x/Makefile
@@ -0,0 +1,8 @@
+ccflags-y = -Idrivers/gpu/host1x
+
+host1x-objs = \
+	syncpt.o \
+	dev.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += hw/
+obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
new file mode 100644
index 0000000..b0d630d
--- /dev/null
+++ b/drivers/gpu/host1x/dev.c
@@ -0,0 +1,182 @@
+/*
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/host1x.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+#include "dev.h"
+#include "hw/host1x01.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/host1x.h>
+
+#define DRIVER_NAME		"tegra-host1x"
+
+struct host1x *host1x;
+
+void host1x_syncpt_incr_byid(u32 id)
+{
+	struct host1x_syncpt *sp = host1x->syncpt + id;
+	host1x_syncpt_incr(sp);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr_byid);
+
+u32 host1x_syncpt_read_byid(u32 id)
+{
+	struct host1x_syncpt *sp = host1x->syncpt + id;
+	return host1x_syncpt_read(sp);
+}
+EXPORT_SYMBOL(host1x_syncpt_read_byid);
+
+void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r)
+{
+	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
+
+	writel(v, sync_regs + r);
+}
+
+u32 host1x_sync_readl(struct host1x *host1x, u32 r)
+{
+	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
+
+	return readl(sync_regs + r);
+}
+
+static struct host1x_device_info host1x_info = {
+	.nb_channels	= 8,
+	.nb_pts		= 32,
+	.nb_mlocks	= 16,
+	.nb_bases	= 8,
+	.init		= host1x01_init,
+	.sync_offset	= 0x3000,
+};
+
+static struct of_device_id host1x_match[] = {
+	{ .compatible = "nvidia,tegra30-host1x", .data = &host1x_info, },
+	{ .compatible = "nvidia,tegra20-host1x", .data = &host1x_info, },
+	{ },
+};
+
+static int host1x_probe(struct platform_device *dev)
+{
+	struct host1x *host;
+	struct resource *regs;
+	int syncpt_irq;
+	int err;
+	const struct of_device_id *devid =
+		of_match_device(host1x_match, &dev->dev);
+
+	if (!devid)
+		return -EINVAL;
+
+	regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (!regs) {
+		dev_err(&dev->dev, "missing regs\n");
+		return -ENXIO;
+	}
+
+	syncpt_irq = platform_get_irq(dev, 0);
+	if (syncpt_irq < 0) {
+		dev_err(&dev->dev, "missing irq\n");
+		return -ENXIO;
+	}
+
+	host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL);
+	if (!host)
+		return -ENOMEM;
+
+	host->dev = dev;
+	memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
+
+	/* set common host1x device data */
+	platform_set_drvdata(dev, host);
+
+	host->regs = devm_request_and_ioremap(&dev->dev, regs);
+	if (!host->regs) {
+		dev_err(&dev->dev, "failed to remap host registers\n");
+		err = -ENXIO;
+		goto fail;
+	}
+
+	if (host->info.init) {
+		err = host->info.init(host);
+		if (err)
+			goto fail;
+	}
+
+	err = -ENOMEM;
+	host->syncpt = host1x_syncpt_init(host);
+	if (!host->syncpt)
+		goto fail;
+	host->nop_sp = _host1x_syncpt_alloc(host, NULL, 0);
+	if (!host->nop_sp)
+		goto fail;
+
+	host->clk = devm_clk_get(&dev->dev, NULL);
+	if (IS_ERR(host->clk)) {
+		dev_err(&dev->dev, "failed to get clock\n");
+		err = PTR_ERR(host->clk);
+		goto fail;
+	}
+
+	err = clk_prepare_enable(host->clk);
+	if (err < 0)
+		goto fail;
+
+	host1x_syncpt_reset(host);
+
+	host1x = host;
+
+	dev_info(&dev->dev, "initialized\n");
+
+	return 0;
+
+fail:
+	host1x_syncpt_free(host->nop_sp);
+	/* host was devm-allocated, so it must not be kfree()d here */
+	return err;
+}
+
+static int __exit host1x_remove(struct platform_device *dev)
+{
+	struct host1x *host = platform_get_drvdata(dev);
+	host1x_syncpt_deinit(host);
+	clk_disable_unprepare(host->clk);
+	return 0;
+}
+
+static struct platform_driver platform_driver = {
+	.probe = host1x_probe,
+	.remove = __exit_p(host1x_remove),
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = DRIVER_NAME,
+		.of_match_table = host1x_match,
+	},
+};
+
+module_platform_driver(platform_driver);
+
+MODULE_AUTHOR("Terje Bergstrom <tbergstrom@nvidia.com>");
+MODULE_DESCRIPTION("Host1x driver for Tegra products");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
new file mode 100644
index 0000000..8245e24
--- /dev/null
+++ b/drivers/gpu/host1x/dev.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HOST1X_DEV_H
+#define HOST1X_DEV_H
+
+#include <linux/host1x.h>
+
+#include "syncpt.h"
+
+struct host1x;
+struct host1x_syncpt;
+struct platform_device;
+
+struct host1x_syncpt_ops {
+	void (*reset)(struct host1x_syncpt *);
+	void (*reset_wait_base)(struct host1x_syncpt *);
+	void (*read_wait_base)(struct host1x_syncpt *);
+	u32 (*load_min)(struct host1x_syncpt *);
+	void (*cpu_incr)(struct host1x_syncpt *);
+	int (*patch_wait)(struct host1x_syncpt *, void *patch_addr);
+	void (*debug)(struct host1x_syncpt *);
+	const char * (*name)(struct host1x_syncpt *);
+};
+
+struct host1x_device_info {
+	int	nb_channels;		/* host1x: num channels supported */
+	int	nb_pts;			/* host1x: num syncpoints supported */
+	int	nb_bases;		/* host1x: num wait bases supported */
+	int	nb_mlocks;		/* host1x: number of mlocks */
+	int	(*init)(struct host1x *); /* initialize per SoC ops */
+	int	sync_offset;
+};
+
+struct host1x {
+	void __iomem *regs;
+	struct host1x_syncpt *syncpt;
+	struct platform_device *dev;
+	atomic_t clientid;
+	struct host1x_device_info info;
+	struct clk *clk;
+
+	struct host1x_syncpt *nop_sp;
+
+	const char *soc_name;
+	struct host1x_syncpt_ops syncpt_op;
+
+	struct dentry *debugfs;
+};
+
+static inline
+struct host1x *host1x_get_host(struct platform_device *_dev)
+{
+	struct platform_device *pdev;
+
+	if (_dev->dev.parent) {
+		pdev = to_platform_device(_dev->dev.parent);
+		return platform_get_drvdata(pdev);
+	} else
+		return platform_get_drvdata(_dev);
+}
+
+void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r);
+u32 host1x_sync_readl(struct host1x *host1x, u32 r);
+
+#endif
diff --git a/drivers/gpu/host1x/hw/Makefile b/drivers/gpu/host1x/hw/Makefile
new file mode 100644
index 0000000..9b50863
--- /dev/null
+++ b/drivers/gpu/host1x/hw/Makefile
@@ -0,0 +1,6 @@
+ccflags-y = -Idrivers/gpu/host1x
+
+host1x-hw-objs  = \
+	host1x01.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += host1x-hw.o
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
new file mode 100644
index 0000000..59176ba
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -0,0 +1,36 @@
+/*
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/clk.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/host1x.h>
+
+#include "hw/host1x01.h"
+#include "dev.h"
+#include "hw/host1x01_hardware.h"
+
+#include "hw/syncpt_hw.c"
+
+int host1x01_init(struct host1x *host)
+{
+	host->syncpt_op = host1x_syncpt_ops;
+
+	return 0;
+}
diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
new file mode 100644
index 0000000..177725b
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01.h
@@ -0,0 +1,25 @@
+/*
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef HOST1X_HOST1X01_H
+#define HOST1X_HOST1X01_H
+
+struct host1x;
+
+int host1x01_init(struct host1x *);
+
+#endif /* HOST1X_HOST1X01_H */
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
new file mode 100644
index 0000000..4e57f21
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -0,0 +1,26 @@
+/*
+ * Tegra host1x Register Offsets for Tegra20 and Tegra30
+ *
+ * Copyright (c) 2010-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_HOST1X01_HARDWARE_H
+#define __HOST1X_HOST1X01_HARDWARE_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include "hw_host1x01_sync.h"
+
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
new file mode 100644
index 0000000..63a71c8
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_sync_h__
+#define __hw_host1x_sync_h__
+
+static inline u32 host1x_sync_syncpt_0_r(void)
+{
+	return 0x400;
+}
+static inline u32 host1x_sync_syncpt_base_0_r(void)
+{
+	return 0x600;
+}
+static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
+{
+	return 0x700;
+}
+#endif /* __hw_host1x_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
new file mode 100644
index 0000000..44a10b0
--- /dev/null
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -0,0 +1,146 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/io.h>
+#include "syncpt.h"
+#include "dev.h"
+
+/*
+ * Write the current syncpoint value back to hw.
+ */
+static void syncpt_reset(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	int min = host1x_syncpt_read_min(sp);
+	host1x_sync_writel(dev, min, host1x_sync_syncpt_0_r() + sp->id * 4);
+}
+
+/*
+ * Write the current waitbase value back to hw.
+ */
+static void syncpt_reset_wait_base(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	host1x_sync_writel(dev, sp->base_val,
+			host1x_sync_syncpt_base_0_r() + sp->id * 4);
+}
+
+/*
+ * Read waitbase value from hw.
+ */
+static void syncpt_read_wait_base(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	sp->base_val = host1x_sync_readl(dev,
+				host1x_sync_syncpt_base_0_r() + sp->id * 4);
+}
+
+/*
+ * Updates the last value read from hardware.
+ * (was host1x_syncpt_load_min)
+ */
+static u32 syncpt_load_min(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 old, live;
+
+	do {
+		old = host1x_syncpt_read_min(sp);
+		live = host1x_sync_readl(dev,
+				host1x_sync_syncpt_0_r() + sp->id * 4);
+	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
+
+	if (!host1x_syncpt_check_max(sp, live))
+		dev_err(&dev->dev->dev,
+				"%s failed: id=%u, min=%d, max=%d\n",
+				__func__,
+				sp->id,
+				host1x_syncpt_read_min(sp),
+				host1x_syncpt_read_max(sp));
+
+	return live;
+}
+
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+static void syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 reg_offset = sp->id / 32;
+
+	if (!host1x_syncpt_client_managed(sp)
+			&& host1x_syncpt_min_eq_max(sp)) {
+		dev_err(&dev->dev->dev,
+			"Trying to increment syncpoint id %d beyond max\n",
+			sp->id);
+		return;
+	}
+	host1x_sync_writel(dev, BIT_MASK(sp->id),
+			host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);
+	wmb();
+}
+
+static const char *syncpt_name(struct host1x_syncpt *sp)
+{
+	struct host1x_device_info *info = &sp->dev->info;
+	const char *name = NULL;
+
+	if (sp->id < info->nb_pts)
+		name = sp->name;
+
+	return name ? name : "";
+}
+
+static void syncpt_debug(struct host1x_syncpt *sp)
+{
+	u32 i;
+	for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
+		u32 max = host1x_syncpt_read_max(sp);
+		u32 min = host1x_syncpt_load_min(sp);
+		if (!max && !min)
+			continue;
+		dev_info(&sp->dev->dev->dev,
+			"id %d (%s) min %d max %d\n",
+			i, sp->name,
+			min, max);
+
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
+		u32 base_val;
+		host1x_syncpt_read_wait_base(sp);
+		base_val = sp->base_val;
+		if (base_val)
+			dev_info(&sp->dev->dev->dev,
+					"waitbase id %d val %d\n",
+					i, base_val);
+
+	}
+}
+
+static const struct host1x_syncpt_ops host1x_syncpt_ops = {
+	.reset = syncpt_reset,
+	.reset_wait_base = syncpt_reset_wait_base,
+	.read_wait_base = syncpt_read_wait_base,
+	.load_min = syncpt_load_min,
+	.cpu_incr = syncpt_cpu_incr,
+	.debug = syncpt_debug,
+	.name = syncpt_name,
+};
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
new file mode 100644
index 0000000..d551325
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.c
@@ -0,0 +1,227 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
+#include <linux/module.h>
+#include "syncpt.h"
+#include "dev.h"
+#include <trace/events/host1x.h>
+
+#define MAX_SYNCPT_LENGTH	5
+
+u32 host1x_syncpt_id(struct host1x_syncpt *sp)
+{
+	return sp->id;
+}
+EXPORT_SYMBOL(host1x_syncpt_id);
+
+/*
+ * Updates the value sent to hardware.
+ */
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs)
+{
+	return (u32)atomic_add_return(incrs, &sp->max_val);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr_max);
+
+/*
+ * Resets syncpoint and waitbase values to sw shadows
+ */
+void host1x_syncpt_reset(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++)
+		dev->syncpt_op.reset(sp_base + i);
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.reset_wait_base(sp_base + i);
+	wmb();
+}
+
+/*
+ * Updates sw shadow state for client managed registers
+ */
+void host1x_syncpt_save(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
+		if (host1x_syncpt_client_managed(sp_base + i))
+			dev->syncpt_op.load_min(sp_base + i);
+		else
+			WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.read_wait_base(sp_base + i);
+}
+
+/*
+ * Updates the last value read from hardware.
+ */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	trace_host1x_syncpt_load_min(sp->id, val);
+
+	return val;
+}
+
+/*
+ * Get the current syncpoint value
+ */
+u32 host1x_syncpt_read(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	return val;
+}
+EXPORT_SYMBOL(host1x_syncpt_read);
+
+/*
+ * Get the current syncpoint base
+ */
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp)
+{
+	u32 val;
+	sp->dev->syncpt_op.read_wait_base(sp);
+	val = sp->base_val;
+	return val;
+}
+
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.cpu_incr(sp);
+}
+
+/*
+ * Increment syncpoint value from cpu, updating cache
+ */
+void host1x_syncpt_incr(struct host1x_syncpt *sp)
+{
+	if (host1x_syncpt_client_managed(sp))
+		host1x_syncpt_incr_max(sp, 1);
+	host1x_syncpt_cpu_incr(sp);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr);
+
+void host1x_syncpt_debug(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.debug(sp);
+}
+
+struct host1x_syncpt *host1x_syncpt_init(struct host1x *host)
+{
+	struct host1x_syncpt *syncpt, *sp;
+	int i;
+
+	syncpt = sp = kzalloc(sizeof(struct host1x_syncpt) * host->info.nb_pts,
+		GFP_KERNEL);
+	if (!syncpt)
+		return NULL;
+
+	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
+		sp->id = i;
+		sp->dev = host;
+	}
+
+	return syncpt;
+}
+
+struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	char *name;
+
+	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
+		;
+	if (i >= host->info.nb_pts)
+		return NULL;
+
+	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
+			pdev ? dev_name(&pdev->dev) : NULL);
+	if (!name)
+		return NULL;
+
+	sp->pdev = pdev;
+	sp->name = name;
+	sp->client_managed = client_managed;
+
+	return sp;
+}
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed)
+{
+	struct host1x *host = host1x_get_host(pdev);
+	return _host1x_syncpt_alloc(host, pdev, client_managed);
+}
+EXPORT_SYMBOL(host1x_syncpt_alloc);
+
+void host1x_syncpt_free(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kfree(sp->name);
+	sp->pdev = NULL;
+	sp->name = NULL;
+	sp->client_managed = 0;
+}
+EXPORT_SYMBOL(host1x_syncpt_free);
+
+void host1x_syncpt_deinit(struct host1x *host)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	for (i = 0; i < host->info.nb_pts; i++, sp++)
+		kfree(sp->name);
+	kfree(host->syncpt);
+}
+
+int host1x_syncpt_nb_pts(struct host1x *dev)
+{
+	return dev->info.nb_pts;
+}
+
+int host1x_syncpt_nb_bases(struct host1x *dev)
+{
+	return dev->info.nb_bases;
+}
+
+int host1x_syncpt_nb_mlocks(struct host1x *dev)
+{
+	return dev->info.nb_mlocks;
+}
+
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
+{
+	return dev->syncpt + id;
+}
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
new file mode 100644
index 0000000..4f7777b
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.h
@@ -0,0 +1,128 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_SYNCPT_H
+#define __HOST1X_SYNCPT_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/host1x.h>
+#include <linux/atomic.h>
+
+/* host managed and invalid syncpt id */
+#define NVSYNCPT_GRAPHICS_HOST		     (0)
+
+struct host1x;
+
+struct host1x_syncpt {
+	int id;
+	atomic_t min_val;
+	atomic_t max_val;
+	u32 base_val;
+	const char *name;
+	int client_managed;
+	struct host1x *dev;
+	struct platform_device *pdev;
+};
+
+struct host1x_syncpt *host1x_syncpt_init(struct host1x *);
+void host1x_syncpt_deinit(struct host1x *);
+
+struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *,
+		struct platform_device *, int);
+void _host1x_syncpt_free(struct host1x_syncpt *sp);
+
+#define SYNCPT_CHECK_PERIOD (2 * HZ)
+#define MAX_STUCK_CHECK_COUNT 15
+
+/*
+ * Updates the value sent to hardware.
+ */
+static inline u32 host1x_syncpt_set_max(struct host1x_syncpt *sp, u32 val)
+{
+	atomic_set(&sp->max_val, val);
+	smp_wmb();
+	return val;
+}
+
+static inline u32 host1x_syncpt_read_max(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->max_val);
+}
+
+static inline u32 host1x_syncpt_read_min(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->min_val);
+}
+
+int host1x_syncpt_nb_pts(struct host1x *dev);
+int host1x_syncpt_nb_bases(struct host1x *dev);
+int host1x_syncpt_nb_mlocks(struct host1x *dev);
+
+static inline bool host1x_syncpt_check_max(struct host1x_syncpt *sp, u32 real)
+{
+	u32 max;
+	if (sp->client_managed)
+		return true;
+	max = host1x_syncpt_read_max(sp);
+	return (s32)(max - real) >= 0;
+}
+
+static inline int host1x_syncpt_client_managed(struct host1x_syncpt *sp)
+{
+	return sp->client_managed;
+}
+/*
+ * Returns true if syncpoint min == max
+ */
+static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
+{
+	int min, max;
+	smp_rmb();
+	min = atomic_read(&sp->min_val);
+	max = atomic_read(&sp->max_val);
+	return (min == max);
+}
+
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
+
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
+
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+
+void host1x_syncpt_save(struct host1x *dev);
+
+void host1x_syncpt_reset(struct host1x *dev);
+
+u32 host1x_syncpt_read(struct host1x_syncpt *sp);
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp);
+
+void host1x_syncpt_incr(struct host1x_syncpt *sp);
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
+
+void host1x_syncpt_debug(struct host1x_syncpt *sp);
+
+static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
+{
+	return sp->id != NVSYNCPT_INVALID &&
+		sp->id < host1x_syncpt_nb_pts(sp->dev);
+}
+
+#endif
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index e7068c5..09b3762 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -19,6 +19,8 @@ source "drivers/char/agp/Kconfig"
 
 source "drivers/gpu/vga/Kconfig"
 
+source "drivers/gpu/host1x/Kconfig"
+
 source "drivers/gpu/drm/Kconfig"
 
 source "drivers/gpu/stub/Kconfig"
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
new file mode 100644
index 0000000..6c2cc8a
--- /dev/null
+++ b/include/linux/host1x.h
@@ -0,0 +1,41 @@
+/*
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2009-2012, NVIDIA Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __LINUX_HOST1X_H
+#define __LINUX_HOST1X_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+#include <linux/platform_device.h>
+
+struct host1x_syncpt;
+
+#define NVSYNCPT_INVALID			(-1)
+
+/* public host1x sync-point management APIs */
+u32 host1x_syncpt_id(struct host1x_syncpt *sp);
+void host1x_syncpt_incr_byid(u32 id);
+u32 host1x_syncpt_read_byid(u32 id);
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed);
+void host1x_syncpt_free(struct host1x_syncpt *sp);
+
+#endif
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
new file mode 100644
index 0000000..d98d74c
--- /dev/null
+++ b/include/trace/events/host1x.h
@@ -0,0 +1,61 @@
+/*
+ * include/trace/events/host1x.h
+ *
+ * host1x event logging to ftrace.
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM host1x
+
+#if !defined(_TRACE_HOST1X_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOST1X_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(host1x,
+	TP_PROTO(const char *name),
+	TP_ARGS(name),
+	TP_STRUCT__entry(__field(const char *, name)),
+	TP_fast_assign(__entry->name = name;),
+	TP_printk("name=%s", __entry->name)
+);
+
+TRACE_EVENT(host1x_syncpt_load_min,
+	TP_PROTO(u32 id, u32 val),
+
+	TP_ARGS(id, val),
+
+	TP_STRUCT__entry(
+		__field(u32, id)
+		__field(u32, val)
+	),
+
+	TP_fast_assign(
+		__entry->id = id;
+		__entry->val = val;
+	),
+
+	TP_printk("id=%d, val=%d", __entry->id, __entry->val)
+);
+
+#endif /*  _TRACE_HOST1X_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.9.5
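
Editorial note on the counter comparison in syncpt.h above:
host1x_syncpt_check_max() tests (s32)(max - real) >= 0 rather than
max >= real, so the check keeps working after the 32-bit syncpoint
counter wraps. The following is only a user-space sketch of that idea,
not code from the patch. The function name is hypothetical and the
stdint types stand in for the kernel's u32/s32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the comparison used by
 * host1x_syncpt_check_max(): returns true if 'real' has not run past
 * 'max', treating both as free-running 32-bit counters that may wrap.
 */
static bool syncpt_value_within_max(uint32_t max, uint32_t real)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32. Reinterpreting the
	 * difference as signed tells which value is ahead, as long as
	 * the two values are less than 2^31 apart.
	 */
	return (int32_t)(max - real) >= 0;
}
```

A plain max >= real would reject max = 2, real = 0xfffffffe even though
max is four increments ahead once the counter has wrapped. The signed
difference accepts that case.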

* [PATCHv3 1/7] gpu: host1x: Add host1x driver
@ 2012-12-13 14:04   ` Terje Bergstrom
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel
  Cc: amerilainen, linux-kernel

Add host1x, the driver for host1x and its client unit 2D.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
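Editorial note on register addressing: syncpt_hw.c in this patch
reaches per-syncpoint registers by adding id * 4 to the offsets from
hw_host1x01_sync.h, and reaches the CPU increment register by word
index id / 32 with one bit per syncpoint. The sketch below restates
that arithmetic in user space. The macro and function names are
hypothetical. It also assumes a 32-bit build, where the kernel's
BIT_MASK(id) reduces to 1 << (id % 32).

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inside the sync register block, as returned by the *_r()
 * helpers in hw_host1x01_sync.h. The macro names are illustrative. */
#define SYNCPT_0_R		0x400u
#define SYNCPT_BASE_0_R		0x600u
#define SYNCPT_CPU_INCR_R	0x700u

/* Byte offset of the value register of syncpoint 'id'. */
static uint32_t syncpt_value_reg(uint32_t id)
{
	return SYNCPT_0_R + id * 4;
}

/* Byte offset of the wait base register with index 'id'. */
static uint32_t syncpt_base_reg(uint32_t id)
{
	return SYNCPT_BASE_0_R + id * 4;
}

/* Byte offset of the CPU increment word that covers syncpoint 'id'. */
static uint32_t syncpt_cpu_incr_reg(uint32_t id)
{
	return SYNCPT_CPU_INCR_R + (id / 32) * 4;
}

/* Bit inside that word; assumes 32 syncpoints per register word. */
static uint32_t syncpt_cpu_incr_bit(uint32_t id)
{
	return UINT32_C(1) << (id % 32);
}
```

With nb_pts = 32, as set in host1x_info, every syncpoint falls into the
first increment word at 0x700. The id / 32 term only matters for
hardware with more syncpoints.
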
 drivers/gpu/Makefile                      |    1 +
 drivers/gpu/host1x/Kconfig                |    6 +
 drivers/gpu/host1x/Makefile               |    8 +
 drivers/gpu/host1x/dev.c                  |  182 +++++++++++++++++++++++
 drivers/gpu/host1x/dev.h                  |   79 ++++++++++
 drivers/gpu/host1x/hw/Makefile            |    6 +
 drivers/gpu/host1x/hw/host1x01.c          |   36 +++++
 drivers/gpu/host1x/hw/host1x01.h          |   25 ++++
 drivers/gpu/host1x/hw/host1x01_hardware.h |   26 ++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h  |   66 +++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c         |  146 +++++++++++++++++++
 drivers/gpu/host1x/syncpt.c               |  227 +++++++++++++++++++++++++++++
 drivers/gpu/host1x/syncpt.h               |  128 ++++++++++++++++
 drivers/video/Kconfig                     |    2 +
 include/linux/host1x.h                    |   41 ++++++
 include/trace/events/host1x.h             |   61 ++++++++
 16 files changed, 1040 insertions(+)
 create mode 100644 drivers/gpu/host1x/Kconfig
 create mode 100644 drivers/gpu/host1x/Makefile
 create mode 100644 drivers/gpu/host1x/dev.c
 create mode 100644 drivers/gpu/host1x/dev.h
 create mode 100644 drivers/gpu/host1x/hw/Makefile
 create mode 100644 drivers/gpu/host1x/hw/host1x01.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.h
 create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h
 create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c
 create mode 100644 drivers/gpu/host1x/syncpt.c
 create mode 100644 drivers/gpu/host1x/syncpt.h
 create mode 100644 include/linux/host1x.h
 create mode 100644 include/trace/events/host1x.h

diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index cc92778..7fa2f68 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -1 +1,2 @@
+obj-$(CONFIG_TEGRA_HOST1X)	+= host1x/
 obj-y			+= drm/ vga/ stub/
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
new file mode 100644
index 0000000..e89fb2b
--- /dev/null
+++ b/drivers/gpu/host1x/Kconfig
@@ -0,0 +1,6 @@
+config TEGRA_HOST1X
+	tristate "Tegra host1x driver"
+	help
+	  Driver for the Tegra host1x hardware.
+
+	  Required for enabling tegradrm.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
new file mode 100644
index 0000000..a4adcc6
--- /dev/null
+++ b/drivers/gpu/host1x/Makefile
@@ -0,0 +1,8 @@
+ccflags-y = -Idrivers/gpu/host1x
+
+host1x-objs = \
+	syncpt.o \
+	dev.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += hw/
+obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
new file mode 100644
index 0000000..b0d630d
--- /dev/null
+++ b/drivers/gpu/host1x/dev.c
@@ -0,0 +1,182 @@
+/*
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/host1x.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+#include "dev.h"
+#include "hw/host1x01.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/host1x.h>
+
+#define DRIVER_NAME		"tegra-host1x"
+
+struct host1x *host1x;
+
+void host1x_syncpt_incr_byid(u32 id)
+{
+	struct host1x_syncpt *sp = host1x->syncpt + id;
+	host1x_syncpt_incr(sp);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr_byid);
+
+u32 host1x_syncpt_read_byid(u32 id)
+{
+	struct host1x_syncpt *sp = host1x->syncpt + id;
+	return host1x_syncpt_read(sp);
+}
+EXPORT_SYMBOL(host1x_syncpt_read_byid);
+
+void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r)
+{
+	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
+
+	writel(v, sync_regs + r);
+}
+
+u32 host1x_sync_readl(struct host1x *host1x, u32 r)
+{
+	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
+
+	return readl(sync_regs + r);
+}
+
+static struct host1x_device_info host1x_info = {
+	.nb_channels	= 8,
+	.nb_pts		= 32,
+	.nb_mlocks	= 16,
+	.nb_bases	= 8,
+	.init		= host1x01_init,
+	.sync_offset	= 0x3000,
+};
+
+static struct of_device_id host1x_match[] = {
+	{ .compatible = "nvidia,tegra30-host1x", .data = &host1x_info, },
+	{ .compatible = "nvidia,tegra20-host1x", .data = &host1x_info, },
+	{ },
+};
+
+static int host1x_probe(struct platform_device *dev)
+{
+	struct host1x *host;
+	struct resource *regs;
+	int syncpt_irq;
+	int err;
+	const struct of_device_id *devid =
+		of_match_device(host1x_match, &dev->dev);
+
+	if (!devid)
+		return -EINVAL;
+
+	regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (!regs) {
+		dev_err(&dev->dev, "missing regs\n");
+		return -ENXIO;
+	}
+
+	syncpt_irq = platform_get_irq(dev, 0);
+	if (IS_ERR_VALUE(syncpt_irq)) {
+		dev_err(&dev->dev, "missing irq\n");
+		return -ENXIO;
+	}
+
+	host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL);
+	if (!host)
+		return -ENOMEM;
+
+	host->dev = dev;
+	memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
+
+	/* set common host1x device data */
+	platform_set_drvdata(dev, host);
+
+	host->regs = devm_request_and_ioremap(&dev->dev, regs);
+	if (!host->regs) {
+		dev_err(&dev->dev, "failed to remap host registers\n");
+		err = -ENXIO;
+		goto fail;
+	}
+
+	if (host->info.init) {
+		err = host->info.init(host);
+		if (err)
+			goto fail;
+	}
+	err = -ENOMEM;
+	host->syncpt = host1x_syncpt_init(host);
+	if (!host->syncpt)
+		goto fail;
+
+	host->nop_sp = _host1x_syncpt_alloc(host, NULL, 0);
+	if (!host->nop_sp)
+		goto fail;
+
+	host->clk = devm_clk_get(&dev->dev, NULL);
+	if (IS_ERR(host->clk)) {
+		dev_err(&dev->dev, "failed to get clock\n");
+		err = PTR_ERR(host->clk);
+		goto fail;
+	}
+
+	err = clk_prepare_enable(host->clk);
+	if (err < 0)
+		goto fail;
+
+	host1x_syncpt_reset(host);
+
+	host1x = host;
+
+	dev_info(&dev->dev, "initialized\n");
+
+	return 0;
+
+fail:
+	host1x_syncpt_free(host->nop_sp);
+	kfree(host);
+	return err;
+}
+
+static int __exit host1x_remove(struct platform_device *dev)
+{
+	struct host1x *host = platform_get_drvdata(dev);
+	host1x_syncpt_deinit(host);
+	clk_disable_unprepare(host->clk);
+	return 0;
+}
+
+static struct platform_driver platform_driver = {
+	.probe = host1x_probe,
+	.remove = __exit_p(host1x_remove),
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = DRIVER_NAME,
+		.of_match_table = host1x_match,
+	},
+};
+
+module_platform_driver(platform_driver);
+
+MODULE_AUTHOR("Terje Bergstrom <tbergstrom@nvidia.com>");
+MODULE_DESCRIPTION("Host1x driver for Tegra products");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
new file mode 100644
index 0000000..8245e24
--- /dev/null
+++ b/drivers/gpu/host1x/dev.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HOST1X_DEV_H
+#define HOST1X_DEV_H
+
+#include <linux/host1x.h>
+
+#include "syncpt.h"
+
+struct host1x;
+struct host1x_syncpt;
+struct platform_device;
+
+struct host1x_syncpt_ops {
+	void (*reset)(struct host1x_syncpt *);
+	void (*reset_wait_base)(struct host1x_syncpt *);
+	void (*read_wait_base)(struct host1x_syncpt *);
+	u32 (*load_min)(struct host1x_syncpt *);
+	void (*cpu_incr)(struct host1x_syncpt *);
+	int (*patch_wait)(struct host1x_syncpt *, void *patch_addr);
+	void (*debug)(struct host1x_syncpt *);
+	const char * (*name)(struct host1x_syncpt *);
+};
+
+struct host1x_device_info {
+	int	nb_channels;		/* host1x: num channels supported */
+	int	nb_pts;			/* host1x: num syncpoints supported */
+	int	nb_bases;		/* host1x: num wait bases supported */
+	int	nb_mlocks;		/* host1x: number of mlocks */
+	int	(*init)(struct host1x *); /* initialize per SoC ops */
+	int	sync_offset;
+};
+
+struct host1x {
+	void __iomem *regs;
+	struct host1x_syncpt *syncpt;
+	struct platform_device *dev;
+	atomic_t clientid;
+	struct host1x_device_info info;
+	struct clk *clk;
+
+	struct host1x_syncpt *nop_sp;
+
+	const char *soc_name;
+	struct host1x_syncpt_ops syncpt_op;
+
+	struct dentry *debugfs;
+};
+
+static inline
+struct host1x *host1x_get_host(struct platform_device *_dev)
+{
+	struct platform_device *pdev;
+
+	if (_dev->dev.parent) {
+		pdev = to_platform_device(_dev->dev.parent);
+		return platform_get_drvdata(pdev);
+	} else
+		return platform_get_drvdata(_dev);
+}
+
+void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r);
+u32 host1x_sync_readl(struct host1x *host1x, u32 r);
+
+#endif
diff --git a/drivers/gpu/host1x/hw/Makefile b/drivers/gpu/host1x/hw/Makefile
new file mode 100644
index 0000000..9b50863
--- /dev/null
+++ b/drivers/gpu/host1x/hw/Makefile
@@ -0,0 +1,6 @@
+ccflags-y = -Idrivers/gpu/host1x
+
+host1x-hw-objs  = \
+	host1x01.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += host1x-hw.o
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
new file mode 100644
index 0000000..59176ba
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -0,0 +1,36 @@
+/*
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/clk.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/host1x.h>
+
+#include "hw/host1x01.h"
+#include "dev.h"
+#include "hw/host1x01_hardware.h"
+
+#include "hw/syncpt_hw.c"
+
+int host1x01_init(struct host1x *host)
+{
+	host->syncpt_op = host1x_syncpt_ops;
+
+	return 0;
+}
diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
new file mode 100644
index 0000000..177725b
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01.h
@@ -0,0 +1,25 @@
+/*
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef HOST1X_HOST1X01_H
+#define HOST1X_HOST1X01_H
+
+struct host1x;
+
+int host1x01_init(struct host1x *);
+
+#endif /* HOST1X_HOST1X01_H */
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
new file mode 100644
index 0000000..4e57f21
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -0,0 +1,26 @@
+/*
+ * Tegra host1x Register Offsets for Tegra20 and Tegra30
+ *
+ * Copyright (c) 2010-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_HOST1X01_HARDWARE_H
+#define __HOST1X_HOST1X01_HARDWARE_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include "hw_host1x01_sync.h"
+
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
new file mode 100644
index 0000000..63a71c8
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_sync_h__
+#define __hw_host1x_sync_h__
+
+static inline u32 host1x_sync_syncpt_0_r(void)
+{
+	return 0x400;
+}
+static inline u32 host1x_sync_syncpt_base_0_r(void)
+{
+	return 0x600;
+}
+static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
+{
+	return 0x700;
+}
+#endif /* __hw_host1x_host1x_h__ */
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
new file mode 100644
index 0000000..44a10b0
--- /dev/null
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -0,0 +1,146 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/io.h>
+#include "syncpt.h"
+#include "dev.h"
+
+/*
+ * Write the current syncpoint value back to hw.
+ */
+static void syncpt_reset(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	int min = host1x_syncpt_read_min(sp);
+	host1x_sync_writel(dev, min, host1x_sync_syncpt_0_r() + sp->id * 4);
+}
+
+/*
+ * Write the current waitbase value back to hw.
+ */
+static void syncpt_reset_wait_base(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	host1x_sync_writel(dev, sp->base_val,
+			host1x_sync_syncpt_base_0_r() + sp->id * 4);
+}
+
+/*
+ * Read waitbase value from hw.
+ */
+static void syncpt_read_wait_base(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	sp->base_val = host1x_sync_readl(dev,
+				host1x_sync_syncpt_base_0_r() + sp->id * 4);
+}
+
+/*
+ * Updates the last value read from hardware.
+ * (was host1x_syncpt_load_min)
+ */
+static u32 syncpt_load_min(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 old, live;
+
+	do {
+		old = host1x_syncpt_read_min(sp);
+		live = host1x_sync_readl(dev,
+				host1x_sync_syncpt_0_r() + sp->id * 4);
+	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
+
+	if (!host1x_syncpt_check_max(sp, live))
+		dev_err(&dev->dev->dev,
+				"%s failed: id=%u, min=%d, max=%d\n",
+				__func__,
+				sp->id,
+				host1x_syncpt_read_min(sp),
+				host1x_syncpt_read_max(sp));
+
+	return live;
+}
+
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+static void syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 reg_offset = sp->id / 32;
+
+	if (!host1x_syncpt_client_managed(sp)
+			&& host1x_syncpt_min_eq_max(sp)) {
+		dev_err(&dev->dev->dev,
+			"Trying to increment syncpoint id %d beyond max\n",
+			sp->id);
+		return;
+	}
+	host1x_sync_writel(dev, BIT_MASK(sp->id),
+			host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);
+	wmb();
+}
+
+static const char *syncpt_name(struct host1x_syncpt *sp)
+{
+	struct host1x_device_info *info = &sp->dev->info;
+	const char *name = NULL;
+
+	if (sp->id < info->nb_pts)
+		name = sp->name;
+
+	return name ? name : "";
+}
+
+static void syncpt_debug(struct host1x_syncpt *sp)
+{
+	u32 i;
+	for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
+		u32 max = host1x_syncpt_read_max(sp->dev->syncpt + i);
+		u32 min = host1x_syncpt_load_min(sp->dev->syncpt + i);
+		if (!max && !min)
+			continue;
+		dev_info(&sp->dev->dev->dev,
+			"id %d (%s) min %d max %d\n",
+			i, sp->dev->syncpt[i].name,
+			min, max);
+
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
+		u32 base_val;
+		host1x_syncpt_read_wait_base(sp->dev->syncpt + i);
+		base_val = sp->dev->syncpt[i].base_val;
+		if (base_val)
+			dev_info(&sp->dev->dev->dev,
+					"waitbase id %d val %d\n",
+					i, base_val);
+
+	}
+}
+
+static const struct host1x_syncpt_ops host1x_syncpt_ops = {
+	.reset = syncpt_reset,
+	.reset_wait_base = syncpt_reset_wait_base,
+	.read_wait_base = syncpt_read_wait_base,
+	.load_min = syncpt_load_min,
+	.cpu_incr = syncpt_cpu_incr,
+	.debug = syncpt_debug,
+	.name = syncpt_name,
+};
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
new file mode 100644
index 0000000..d551325
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.c
@@ -0,0 +1,227 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
+#include <linux/module.h>
+#include "syncpt.h"
+#include "dev.h"
+#include <trace/events/host1x.h>
+
+#define MAX_SYNCPT_LENGTH	5
+
+u32 host1x_syncpt_id(struct host1x_syncpt *sp)
+{
+	return sp->id;
+}
+EXPORT_SYMBOL(host1x_syncpt_id);
+
+/*
+ * Updates the value sent to hardware.
+ */
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs)
+{
+	return (u32)atomic_add_return(incrs, &sp->max_val);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr_max);
+
+/*
+ * Resets syncpoint and waitbase values to sw shadows
+ */
+void host1x_syncpt_reset(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++)
+		dev->syncpt_op.reset(sp_base + i);
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.reset_wait_base(sp_base + i);
+	wmb();
+}
+
+/*
+ * Updates sw shadow state for client managed registers
+ */
+void host1x_syncpt_save(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
+		if (host1x_syncpt_client_managed(sp_base + i))
+			dev->syncpt_op.load_min(sp_base + i);
+		else
+			WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.read_wait_base(sp_base + i);
+}
+
+/*
+ * Updates the last value read from hardware.
+ */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	trace_host1x_syncpt_load_min(sp->id, val);
+
+	return val;
+}
+
+/*
+ * Get the current syncpoint value
+ */
+u32 host1x_syncpt_read(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	return val;
+}
+EXPORT_SYMBOL(host1x_syncpt_read);
+
+/*
+ * Get the current syncpoint base
+ */
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp)
+{
+	u32 val;
+	sp->dev->syncpt_op.read_wait_base(sp);
+	val = sp->base_val;
+	return val;
+}
+
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.cpu_incr(sp);
+}
+
+/*
+ * Increment syncpoint value from cpu, updating cache
+ */
+void host1x_syncpt_incr(struct host1x_syncpt *sp)
+{
+	if (host1x_syncpt_client_managed(sp))
+		host1x_syncpt_incr_max(sp, 1);
+	host1x_syncpt_cpu_incr(sp);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr);
+
+void host1x_syncpt_debug(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.debug(sp);
+}
+
+struct host1x_syncpt *host1x_syncpt_init(struct host1x *host)
+{
+	struct host1x_syncpt *syncpt, *sp;
+	int i;
+
+	syncpt = sp = kzalloc(sizeof(struct host1x_syncpt) * host->info.nb_pts,
+		GFP_KERNEL);
+	if (!syncpt)
+		return NULL;
+
+	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
+		sp->id = i;
+		sp->dev = host;
+	}
+
+	return syncpt;
+}
+
+struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	char *name;
+
+	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
+		;
+	if (i >= host->info.nb_pts)
+		return NULL;
+
+	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
+			pdev ? dev_name(&pdev->dev) : NULL);
+	if (!name)
+		return NULL;
+
+	sp->pdev = pdev;
+	sp->name = name;
+	sp->client_managed = client_managed;
+
+	return sp;
+}
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed)
+{
+	struct host1x *host = host1x_get_host(pdev);
+	return _host1x_syncpt_alloc(host, pdev, client_managed);
+}
+EXPORT_SYMBOL(host1x_syncpt_alloc);
+
+void host1x_syncpt_free(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kfree(sp->name);
+	sp->pdev = NULL;
+	sp->name = NULL;
+	sp->client_managed = 0;
+}
+EXPORT_SYMBOL(host1x_syncpt_free);
+
+void host1x_syncpt_deinit(struct host1x *host)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	for (i = 0; i < host->info.nb_pts; i++, sp++)
+		kfree(sp->name);
+	kfree(host->syncpt);
+}
+
+int host1x_syncpt_nb_pts(struct host1x *dev)
+{
+	return dev->info.nb_pts;
+}
+
+int host1x_syncpt_nb_bases(struct host1x *dev)
+{
+	return dev->info.nb_bases;
+}
+
+int host1x_syncpt_nb_mlocks(struct host1x *dev)
+{
+	return dev->info.nb_mlocks;
+}
+
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
+{
+	return dev->syncpt + id;
+}
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
new file mode 100644
index 0000000..4f7777b
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.h
@@ -0,0 +1,128 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_SYNCPT_H
+#define __HOST1X_SYNCPT_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/host1x.h>
+#include <linux/atomic.h>
+
+/* host managed and invalid syncpt id */
+#define NVSYNCPT_GRAPHICS_HOST		     (0)
+
+struct host1x;
+
+struct host1x_syncpt {
+	int id;
+	atomic_t min_val;
+	atomic_t max_val;
+	u32 base_val;
+	const char *name;
+	int client_managed;
+	struct host1x *dev;
+	struct platform_device *pdev;
+};
+
+struct host1x_syncpt *host1x_syncpt_init(struct host1x *);
+void host1x_syncpt_deinit(struct host1x *);
+
+struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *,
+		struct platform_device *, int);
+void _host1x_syncpt_free(struct host1x_syncpt *sp);
+
+#define SYNCPT_CHECK_PERIOD (2 * HZ)
+#define MAX_STUCK_CHECK_COUNT 15
+
+/*
+ * Updates the value sent to hardware.
+ */
+static inline u32 host1x_syncpt_set_max(struct host1x_syncpt *sp, u32 val)
+{
+	atomic_set(&sp->max_val, val);
+	smp_wmb();
+	return val;
+}
+
+static inline u32 host1x_syncpt_read_max(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->max_val);
+}
+
+static inline u32 host1x_syncpt_read_min(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->min_val);
+}
+
+int host1x_syncpt_nb_pts(struct host1x *dev);
+int host1x_syncpt_nb_bases(struct host1x *dev);
+int host1x_syncpt_nb_mlocks(struct host1x *dev);
+
+static inline bool host1x_syncpt_check_max(struct host1x_syncpt *sp, u32 real)
+{
+	u32 max;
+	if (sp->client_managed)
+		return true;
+	max = host1x_syncpt_read_max(sp);
+	return (s32)(max - real) >= 0;
+}
+
+static inline int host1x_syncpt_client_managed(struct host1x_syncpt *sp)
+{
+	return sp->client_managed;
+}
+/*
+ * Returns true if syncpoint min == max
+ */
+static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
+{
+	int min, max;
+	smp_rmb();
+	min = atomic_read(&sp->min_val);
+	max = atomic_read(&sp->max_val);
+	return (min == max);
+}
+
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
+
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
+
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+
+void host1x_syncpt_save(struct host1x *dev);
+
+void host1x_syncpt_reset(struct host1x *dev);
+
+u32 host1x_syncpt_read(struct host1x_syncpt *sp);
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp);
+
+void host1x_syncpt_incr(struct host1x_syncpt *sp);
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
+
+void host1x_syncpt_debug(struct host1x_syncpt *sp);
+
+static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
+{
+	return sp->id != NVSYNCPT_INVALID &&
+		sp->id < host1x_syncpt_nb_pts(sp->dev);
+}
+
+#endif
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index e7068c5..09b3762 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -19,6 +19,8 @@ source "drivers/char/agp/Kconfig"
 
 source "drivers/gpu/vga/Kconfig"
 
+source "drivers/gpu/host1x/Kconfig"
+
 source "drivers/gpu/drm/Kconfig"
 
 source "drivers/gpu/stub/Kconfig"
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
new file mode 100644
index 0000000..6c2cc8a
--- /dev/null
+++ b/include/linux/host1x.h
@@ -0,0 +1,41 @@
+/*
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2009-2012, NVIDIA Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __LINUX_HOST1X_H
+#define __LINUX_HOST1X_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+#include <linux/platform_device.h>
+
+struct host1x_syncpt;
+
+#define NVSYNCPT_INVALID			(-1)
+
+/* public host1x sync-point management APIs */
+u32 host1x_syncpt_id(struct host1x_syncpt *sp);
+void host1x_syncpt_incr_byid(u32 id);
+u32 host1x_syncpt_read_byid(u32 id);
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed);
+void host1x_syncpt_free(struct host1x_syncpt *sp);
+
+#endif
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
new file mode 100644
index 0000000..d98d74c
--- /dev/null
+++ b/include/trace/events/host1x.h
@@ -0,0 +1,61 @@
+/*
+ * include/trace/events/host1x.h
+ *
+ * Nvhost event logging to ftrace.
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM host1x
+
+#if !defined(_TRACE_HOST1X_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOST1X_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(host1x,
+	TP_PROTO(const char *name),
+	TP_ARGS(name),
+	TP_STRUCT__entry(__field(const char *, name)),
+	TP_fast_assign(__entry->name = name;),
+	TP_printk("name=%s", __entry->name)
+);
+
+TRACE_EVENT(host1x_syncpt_load_min,
+	TP_PROTO(u32 id, u32 val),
+
+	TP_ARGS(id, val),
+
+	TP_STRUCT__entry(
+		__field(u32, id)
+		__field(u32, val)
+	),
+
+	TP_fast_assign(
+		__entry->id = id;
+		__entry->val = val;
+	),
+
+	TP_printk("id=%d, val=%d", __entry->id, __entry->val)
+);
+
+#endif /*  _TRACE_HOST1X_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.9.5
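
Editorial note on the accessor naming convention documented in
hw_host1x01_sync.h above: the patch itself only defines _r() offset
helpers, so the following is a minimal sketch of how the other suffixes
(_s, _f, _m, _v) would fit together. The register "demo_ctrl", its
offset 0x10, the "mode" field at bits 7:4 and the constant "fast" are
all hypothetical and exist only for illustration; they are not host1x
registers. The sketch uses stdint types so it builds outside the kernel.

```c
#include <stdint.h>

/* Hypothetical register "demo_ctrl" with a 4-bit field "mode" at bits 7:4. */
static inline uint32_t demo_ctrl_r(void)
{
	return 0x10;			/* byte offset of the register */
}
static inline uint32_t demo_ctrl_mode_s(void)
{
	return 4;			/* field width in bits */
}
static inline uint32_t demo_ctrl_mode_f(uint32_t v)
{
	return (v & 0xfu) << 4;		/* mask, then shift into place */
}
static inline uint32_t demo_ctrl_mode_m(void)
{
	return 0xfu << 4;		/* mask of the field within the register */
}
static inline uint32_t demo_ctrl_mode_v(uint32_t r)
{
	return (r >> 4) & 0xfu;		/* extract field, LSB at bit 0 */
}
static inline uint32_t demo_ctrl_mode_fast_v(void)
{
	return 0x3;			/* unshifted constant "fast" */
}
static inline uint32_t demo_ctrl_mode_fast_f(void)
{
	return 0x3u << 4;		/* constant "fast" shifted into place */
}

/* Read-modify-write of one field: clear with ~_m(), then OR in _f(). */
static inline uint32_t demo_ctrl_set_mode(uint32_t reg, uint32_t mode)
{
	return (reg & ~demo_ctrl_mode_m()) | demo_ctrl_mode_f(mode);
}
```

With these helpers, a caller would read the register at demo_ctrl_r(),
pass the value through demo_ctrl_set_mode() and write it back, and could
compare demo_ctrl_mode_v(reg) directly against demo_ctrl_mode_fast_v().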



* [PATCHv3 2/7] gpu: host1x: Add syncpoint wait and interrupts
  2012-12-13 14:04 ` Terje Bergstrom
@ 2012-12-13 14:04   ` Terje Bergstrom
  -1 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

Add support for sync point interrupts and sync point wait. Sync point
wait uses interrupts to unblock waiters.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
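Editorial note: both the waiter list in intr.c and
host1x_syncpt_check_max() in syncpt.h order 32-bit syncpoint values with
the "(s32)(a - b)" idiom, so that comparisons stay correct when the
counter wraps. Below is a minimal user-space sketch of that comparison.
The helper name syncpt_reached() is hypothetical and not part of this
patch, and the sketch assumes the usual two's-complement conversion from
uint32_t to int32_t. It does not model the interrupt path.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once the free-running 32-bit counter
 * 'value' has reached or passed 'thresh'. The unsigned subtraction
 * wraps modulo 2^32; reinterpreting the difference as signed gives the
 * right answer as long as the two values are less than 2^31 apart.
 */
static inline bool syncpt_reached(uint32_t value, uint32_t thresh)
{
	return (int32_t)(value - thresh) >= 0;
}
```

For example, a threshold of 0xfffffff0 counts as reached when the
counter reads 2, because the counter has wrapped past it, whereas a
plain "value >= thresh" would report the wait as still pending.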
 drivers/gpu/host1x/Makefile              |    3 +-
 drivers/gpu/host1x/dev.c                 |   44 ++++
 drivers/gpu/host1x/dev.h                 |   15 ++
 drivers/gpu/host1x/hw/host1x01.c         |    2 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h |   30 ++-
 drivers/gpu/host1x/hw/intr_hw.c          |  175 +++++++++++++++
 drivers/gpu/host1x/intr.c                |  350 ++++++++++++++++++++++++++++++
 drivers/gpu/host1x/intr.h                |  100 +++++++++
 drivers/gpu/host1x/syncpt.c              |  161 ++++++++++++++
 drivers/gpu/host1x/syncpt.h              |    4 +
 include/linux/host1x.h                   |    1 +
 11 files changed, 883 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/host1x/hw/intr_hw.c
 create mode 100644 drivers/gpu/host1x/intr.c
 create mode 100644 drivers/gpu/host1x/intr.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index a4adcc6..9d00b62 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -2,7 +2,8 @@ ccflags-y = -Idrivers/gpu/host1x
 
 host1x-objs = \
 	syncpt.o \
-	dev.o
+	dev.o \
+	intr.o
 
 obj-$(CONFIG_TEGRA_HOST1X) += hw/
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index b0d630d..9255a49 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -25,6 +25,7 @@
 #include <linux/clk.h>
 #include <linux/io.h>
 #include "dev.h"
+#include "intr.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -48,6 +49,19 @@ u32 host1x_syncpt_read_byid(u32 id)
 }
 EXPORT_SYMBOL(host1x_syncpt_read_byid);
 
+int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)
+{
+	struct host1x_syncpt *sp = host1x->syncpt + id;
+	return host1x_syncpt_wait(sp, thresh, timeout, value);
+}
+EXPORT_SYMBOL(host1x_syncpt_wait_byid);
+
+static void host1x_free_resources(struct host1x *host)
+{
+	kfree(host->intr.syncpt);
+	host->intr.syncpt = NULL;
+}
+
 void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r)
 {
 	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
@@ -62,6 +76,21 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r)
 	return readl(sync_regs + r);
 }
 
+static int host1x_alloc_resources(struct host1x *host)
+{
+	host->intr.syncpt = devm_kzalloc(&host->dev->dev,
+			sizeof(struct host1x_intr_syncpt) *
+			host->info.nb_pts,
+			GFP_KERNEL);
+
+	if (!host->intr.syncpt) {
+		/* frees happen in the support removal phase */
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static struct host1x_device_info host1x_info = {
 	.nb_channels	= 8,
 	.nb_pts		= 32,
@@ -118,6 +147,12 @@ static int host1x_probe(struct platform_device *dev)
 		goto fail;
 	}
 
+	err = host1x_alloc_resources(host);
+	if (err) {
+		dev_err(&dev->dev, "failed to init chip support\n");
+		goto fail;
+	}
+
 	if (host->info.init) {
 		err = host->info.init(host);
 		if (err)
@@ -132,6 +167,10 @@ static int host1x_probe(struct platform_device *dev)
 	if (!host->nop_sp)
 		goto fail;
 
+	err = host1x_intr_init(&host->intr, syncpt_irq);
+	if (err)
+		goto fail;
+
 	host->clk = devm_clk_get(&dev->dev, NULL);
 	if (IS_ERR(host->clk)) {
 		dev_err(&dev->dev, "failed to get clock\n");
@@ -145,6 +184,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_syncpt_reset(host);
 
+	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
+
 	host1x = host;
 
 	dev_info(&dev->dev, "initialized\n");
@@ -153,6 +194,7 @@ static int host1x_probe(struct platform_device *dev)
 
 fail:
 	host1x_syncpt_free(host->nop_sp);
+	host1x_free_resources(host);
 	kfree(host);
 	return err;
 }
@@ -160,8 +202,10 @@ fail:
 static int __exit host1x_remove(struct platform_device *dev)
 {
 	struct host1x *host = platform_get_drvdata(dev);
+	host1x_intr_deinit(&host->intr);
 	host1x_syncpt_deinit(host);
 	clk_disable_unprepare(host->clk);
+	host1x_free_resources(host);
 	return 0;
 }
 
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 8245e24..a1622bb 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -20,6 +20,7 @@
 #include <linux/host1x.h>
 
 #include "syncpt.h"
+#include "intr.h"
 
 struct host1x;
 struct host1x_syncpt;
@@ -36,6 +37,18 @@ struct host1x_syncpt_ops {
 	const char * (*name)(struct host1x_syncpt *);
 };
 
+struct host1x_intr_ops {
+	void (*init_host_sync)(struct host1x_intr *);
+	void (*set_host_clocks_per_usec)(
+		struct host1x_intr *, u32 clocks);
+	void (*set_syncpt_threshold)(
+		struct host1x_intr *, u32 id, u32 thresh);
+	void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
+	void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
+	void (*disable_all_syncpt_intrs)(struct host1x_intr *);
+	int (*free_syncpt_irq)(struct host1x_intr *);
+};
+
 struct host1x_device_info {
 	int	nb_channels;		/* host1x: num channels supported */
 	int	nb_pts;			/* host1x: num syncpoints supported */
@@ -48,6 +61,7 @@ struct host1x_device_info {
 struct host1x {
 	void __iomem *regs;
 	struct host1x_syncpt *syncpt;
+	struct host1x_intr intr;
 	struct platform_device *dev;
 	atomic_t clientid;
 	struct host1x_device_info info;
@@ -57,6 +71,7 @@ struct host1x {
 
 	const char *soc_name;
 	struct host1x_syncpt_ops syncpt_op;
+	struct host1x_intr_ops intr_op;
 
 	struct dentry *debugfs;
 };
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index 59176ba..c5c55a3 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -27,10 +27,12 @@
 #include "hw/host1x01_hardware.h"
 
 #include "hw/syncpt_hw.c"
+#include "hw/intr_hw.c"
 
 int host1x01_init(struct host1x *host)
 {
 	host->syncpt_op = host1x_syncpt_ops;
+	host->intr_op = host1x_intr_ops;
 
 	return 0;
 }
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index 63a71c8..b06a2c5 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -51,10 +51,38 @@
 #ifndef __hw_host1x_sync_h__
 #define __hw_host1x_sync_h__
 
+static inline u32 host1x_sync_syncpt_thresh_cpu0_int_status_r(void)
+{
+	return 0x40;
+}
+static inline u32 host1x_sync_syncpt_thresh_int_disable_r(void)
+{
+	return 0x60;
+}
+static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
+{
+	return 0x68;
+}
+static inline u32 host1x_sync_usec_clk_r(void)
+{
+	return 0x1a4;
+}
+static inline u32 host1x_sync_ctxsw_timeout_cfg_r(void)
+{
+	return 0x1a8;
+}
+static inline u32 host1x_sync_ip_busy_timeout_r(void)
+{
+	return 0x1bc;
+}
 static inline u32 host1x_sync_syncpt_0_r(void)
 {
 	return 0x400;
 }
+static inline u32 host1x_sync_syncpt_int_thresh_0_r(void)
+{
+	return 0x500;
+}
 static inline u32 host1x_sync_syncpt_base_0_r(void)
 {
 	return 0x600;
@@ -63,4 +91,4 @@ static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
 {
 	return 0x700;
 }
-#endif /* __hw_host1x_host1x_h__ */
+#endif /* __hw_host1x_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
new file mode 100644
index 0000000..8efbd51
--- /dev/null
+++ b/drivers/gpu/host1x/hw/intr_hw.c
@@ -0,0 +1,175 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <asm/mach/irq.h>
+
+#include "intr.h"
+#include "dev.h"
+
+/* Spacing between sync registers */
+#define REGISTER_STRIDE 4
+
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
+
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
+{
+	struct host1x_intr_syncpt *sp =
+		container_of(work, struct host1x_intr_syncpt, work);
+	host1x_syncpt_thresh_fn(sp);
+}
+
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
+{
+	struct host1x *host1x = dev_id;
+	struct host1x_intr *intr = &host1x->intr;
+	unsigned long reg;
+	int i, id;
+
+	for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
+		reg = host1x_sync_readl(host1x,
+				host1x_sync_syncpt_thresh_cpu0_int_status_r() +
+				i * REGISTER_STRIDE);
+		for_each_set_bit(id, &reg, BITS_PER_LONG) {
+			struct host1x_intr_syncpt *sp =
+				intr->syncpt + (i * BITS_PER_LONG + id);
+			host1x_intr_syncpt_thresh_isr(sp);
+			queue_work(intr->wq, &sp->work);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void host1x_intr_init_host_sync(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	int i, err;
+
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		host1x_sync_syncpt_thresh_int_disable_r());
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		host1x_sync_syncpt_thresh_cpu0_int_status_r());
+
+	for (i = 0; i < host1x->info.nb_pts; i++)
+		INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
+
+	err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
+				syncpt_thresh_cascade_isr,
+				IRQF_SHARED, "host1x_syncpt", host1x);
+	WARN_ON(IS_ERR_VALUE(err));
+
+	/* disable the ip_busy_timeout. this prevents write drops */
+	host1x_sync_writel(host1x, 0, host1x_sync_ip_busy_timeout_r());
+
+	/*
+	 * increase the auto-ack timeout to the maximum value. 2d will hang
+	 * otherwise on Tegra2.
+	 */
+	host1x_sync_writel(host1x, 0xff, host1x_sync_ctxsw_timeout_cfg_r());
+}
+
+static void host1x_intr_set_host_clocks_per_usec(struct host1x_intr *intr,
+		u32 cpm)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	/* write microsecond clock register */
+	host1x_sync_writel(host1x, cpm, host1x_sync_usec_clk_r());
+}
+
+static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
+	u32 id, u32 thresh)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	host1x_sync_writel(host1x, thresh,
+		host1x_sync_syncpt_int_thresh_0_r() + id * REGISTER_STRIDE);
+}
+
+static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+			host1x_sync_syncpt_thresh_int_enable_cpu0_r() +
+			BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+			host1x_sync_syncpt_thresh_int_disable_r() +
+			BIT_WORD(id) * REGISTER_STRIDE);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		host1x_sync_syncpt_thresh_cpu0_int_status_r() +
+		BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_all_syncpt_intrs(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 reg;
+
+	for (reg = 0; reg <= BIT_WORD(host1x->info.nb_pts) * REGISTER_STRIDE;
+			reg += REGISTER_STRIDE) {
+		host1x_sync_writel(host1x, 0xffffffffu,
+				host1x_sync_syncpt_thresh_int_disable_r() +
+				reg);
+
+		host1x_sync_writel(host1x, 0xffffffffu,
+			host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
+	}
+}
+
+/*
+ * Sync point threshold interrupt service function
+ * Handles sync point threshold triggers, in interrupt context
+ */
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt)
+{
+	unsigned int id = syncpt->id;
+	struct host1x_intr *intr = intr_syncpt_to_intr(syncpt);
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		host1x_sync_syncpt_thresh_int_disable_r() + reg);
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
+}
+
+static int host1x_free_syncpt_irq(struct host1x_intr *intr)
+{
+	flush_workqueue(intr->wq);
+	return 0;
+}
+
+static const struct host1x_intr_ops host1x_intr_ops = {
+	.init_host_sync = host1x_intr_init_host_sync,
+	.set_host_clocks_per_usec = host1x_intr_set_host_clocks_per_usec,
+	.set_syncpt_threshold = host1x_intr_set_syncpt_threshold,
+	.enable_syncpt_intr = host1x_intr_enable_syncpt_intr,
+	.disable_syncpt_intr = host1x_intr_disable_syncpt_intr,
+	.disable_all_syncpt_intrs = host1x_intr_disable_all_syncpt_intrs,
+	.free_syncpt_irq = host1x_free_syncpt_irq,
+};
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
new file mode 100644
index 0000000..f166224
--- /dev/null
+++ b/drivers/gpu/host1x/intr.c
@@ -0,0 +1,350 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "intr.h"
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include "dev.h"
+
+/* Wait list management */
+
+struct host1x_waitlist {
+	struct list_head list;
+	struct kref refcount;
+	u32 thresh;
+	enum host1x_intr_action action;
+	atomic_t state;
+	void *data;
+	int count;
+};
+
+enum waitlist_state {
+	WLS_PENDING,
+	WLS_REMOVED,
+	WLS_CANCELLED,
+	WLS_HANDLED
+};
+
+static void waiter_release(struct kref *kref)
+{
+	kfree(container_of(kref, struct host1x_waitlist, refcount));
+}
+
+/*
+ * add a waiter to a waiter queue, sorted by threshold
+ * returns true if it was added at the head of the queue
+ */
+static bool add_waiter_to_queue(struct host1x_waitlist *waiter,
+				struct list_head *queue)
+{
+	struct host1x_waitlist *pos;
+	u32 thresh = waiter->thresh;
+
+	list_for_each_entry_reverse(pos, queue, list)
+		if ((s32)(pos->thresh - thresh) <= 0) {
+			list_add(&waiter->list, &pos->list);
+			return false;
+		}
+
+	list_add(&waiter->list, queue);
+	return true;
+}
+
+/*
+ * run through a waiter queue for a single sync point ID
+ * and gather all completed waiters into lists by actions
+ */
+static void remove_completed_waiters(struct list_head *head, u32 sync,
+			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
+{
+	struct list_head *dest;
+	struct host1x_waitlist *waiter, *next;
+
+	list_for_each_entry_safe(waiter, next, head, list) {
+		if ((s32)(waiter->thresh - sync) > 0)
+			break;
+
+		dest = completed + waiter->action;
+
+		/* PENDING->REMOVED or CANCELLED->HANDLED */
+		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
+			list_del(&waiter->list);
+			kref_put(&waiter->refcount, waiter_release);
+		} else {
+			list_move_tail(&waiter->list, dest);
+		}
+	}
+}
+
+void reset_threshold_interrupt(struct host1x_intr *intr,
+			       struct list_head *head,
+			       unsigned int id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 thresh = list_first_entry(head,
+				struct host1x_waitlist, list)->thresh;
+
+	host1x->intr_op.set_syncpt_threshold(intr, id, thresh);
+	host1x->intr_op.enable_syncpt_intr(intr, id);
+}
+
+static void action_wakeup(struct host1x_waitlist *waiter)
+{
+	wait_queue_head_t *wq = waiter->data;
+
+	wake_up(wq);
+}
+
+static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
+{
+	wait_queue_head_t *wq = waiter->data;
+
+	wake_up_interruptible(wq);
+}
+
+typedef void (*action_handler)(struct host1x_waitlist *waiter);
+
+static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_wakeup,
+	action_wakeup_interruptible,
+};
+
+static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT])
+{
+	struct list_head *head = completed;
+	int i;
+
+	for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i, ++head) {
+		action_handler handler = action_handlers[i];
+		struct host1x_waitlist *waiter, *next;
+
+		list_for_each_entry_safe(waiter, next, head, list) {
+			list_del(&waiter->list);
+			handler(waiter);
+			WARN_ON(atomic_xchg(&waiter->state, WLS_HANDLED)
+					!= WLS_REMOVED);
+			kref_put(&waiter->refcount, waiter_release);
+		}
+	}
+}
+
+/*
+ * Remove & handle all waiters that have completed for the given syncpt
+ */
+static int process_wait_list(struct host1x_intr *intr,
+			     struct host1x_intr_syncpt *syncpt,
+			     u32 threshold)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct list_head completed[HOST1X_INTR_ACTION_COUNT];
+	unsigned int i;
+	int empty;
+
+	for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i)
+		INIT_LIST_HEAD(completed + i);
+
+	spin_lock(&syncpt->lock);
+
+	remove_completed_waiters(&syncpt->wait_head, threshold, completed);
+
+	empty = list_empty(&syncpt->wait_head);
+	if (empty)
+		host1x->intr_op.disable_syncpt_intr(intr, syncpt->id);
+	else
+		reset_threshold_interrupt(intr, &syncpt->wait_head,
+					  syncpt->id);
+
+	spin_unlock(&syncpt->lock);
+
+	run_handlers(completed);
+
+	return empty;
+}
+
+/*
+ * Sync point threshold interrupt service thread function
+ * Handles sync point threshold triggers, in thread context
+ */
+irqreturn_t host1x_syncpt_thresh_fn(void *dev_id)
+{
+	struct host1x_intr_syncpt *syncpt = dev_id;
+	unsigned int id = syncpt->id;
+	struct host1x_intr *intr = intr_syncpt_to_intr(syncpt);
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	(void)process_wait_list(intr, syncpt,
+				host1x_syncpt_load_min(host1x->syncpt + id));
+
+	return IRQ_HANDLED;
+}
+
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
+			enum host1x_intr_action action, void *data,
+			void *_waiter,
+			void **ref)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct host1x_waitlist *waiter = _waiter;
+	struct host1x_intr_syncpt *syncpt;
+	int queue_was_empty;
+
+	if (waiter == NULL) {
+		pr_warn("%s: NULL waiter\n", __func__);
+		return -EINVAL;
+	}
+
+	/* initialize a new waiter */
+	INIT_LIST_HEAD(&waiter->list);
+	kref_init(&waiter->refcount);
+	if (ref)
+		kref_get(&waiter->refcount);
+	waiter->thresh = thresh;
+	waiter->action = action;
+	atomic_set(&waiter->state, WLS_PENDING);
+	waiter->data = data;
+	waiter->count = 1;
+
+	syncpt = intr->syncpt + id;
+
+	spin_lock(&syncpt->lock);
+
+	queue_was_empty = list_empty(&syncpt->wait_head);
+
+	if (add_waiter_to_queue(waiter, &syncpt->wait_head)) {
+		/* added at head of list - new threshold value */
+		host1x->intr_op.set_syncpt_threshold(intr, id, thresh);
+
+		/* added as first waiter - enable interrupt */
+		if (queue_was_empty)
+			host1x->intr_op.enable_syncpt_intr(intr, id);
+	}
+
+	spin_unlock(&syncpt->lock);
+
+	if (ref)
+		*ref = waiter;
+	return 0;
+}
+
+void *host1x_intr_alloc_waiter(void)
+{
+	return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
+}
+
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
+{
+	struct host1x_waitlist *waiter = ref;
+	struct host1x_intr_syncpt *syncpt;
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	while (atomic_cmpxchg(&waiter->state,
+				WLS_PENDING, WLS_CANCELLED) == WLS_REMOVED)
+		schedule();
+
+	syncpt = intr->syncpt + id;
+	(void)process_wait_list(intr, syncpt,
+				host1x_syncpt_load_min(host1x->syncpt + id));
+
+	kref_put(&waiter->refcount, waiter_release);
+}
+
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
+{
+	unsigned int id;
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 nb_pts = host1x_syncpt_nb_pts(host1x);
+
+	mutex_init(&intr->mutex);
+	intr->syncpt_irq = irq_sync;
+	intr->wq = create_workqueue("host_syncpt");
+	if (!intr->wq)
+		return -ENOMEM;
+
+	host1x->intr_op.init_host_sync(intr);
+
+	for (id = 0; id < nb_pts; ++id) {
+		struct host1x_intr_syncpt *syncpt = &intr->syncpt[id];
+
+		syncpt->intr = &host1x->intr;
+		syncpt->id = id;
+		spin_lock_init(&syncpt->lock);
+		INIT_LIST_HEAD(&syncpt->wait_head);
+		snprintf(syncpt->thresh_irq_name,
+			sizeof(syncpt->thresh_irq_name),
+			"host1x_sp_%02d", id);
+	}
+
+	return 0;
+}
+
+void host1x_intr_deinit(struct host1x_intr *intr)
+{
+	host1x_intr_stop(intr);
+	destroy_workqueue(intr->wq);
+}
+
+void host1x_intr_start(struct host1x_intr *intr, u32 hz)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	mutex_lock(&intr->mutex);
+
+	host1x->intr_op.init_host_sync(intr);
+	host1x->intr_op.set_host_clocks_per_usec(intr,
+			DIV_ROUND_UP(hz, 1000000));
+
+	mutex_unlock(&intr->mutex);
+}
+
+void host1x_intr_stop(struct host1x_intr *intr)
+{
+	unsigned int id;
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct host1x_intr_syncpt *syncpt;
+	u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
+
+	mutex_lock(&intr->mutex);
+
+	host1x->intr_op.disable_all_syncpt_intrs(intr);
+
+	for (id = 0, syncpt = intr->syncpt;
+	     id < nb_pts;
+	     ++id, ++syncpt) {
+		struct host1x_waitlist *waiter, *next;
+		list_for_each_entry_safe(waiter, next,
+				&syncpt->wait_head, list) {
+			if (atomic_cmpxchg(&waiter->state,
+						WLS_CANCELLED, WLS_HANDLED)
+				== WLS_CANCELLED) {
+				list_del(&waiter->list);
+				kref_put(&waiter->refcount, waiter_release);
+			}
+		}
+
+		if (!list_empty(&syncpt->wait_head)) {  /* output diagnostics */
+			mutex_unlock(&intr->mutex);
+			pr_warn("%s cannot stop syncpt intr id=%d\n",
+					__func__, id);
+			return;
+		}
+	}
+
+	host1x->intr_op.free_syncpt_irq(intr);
+
+	mutex_unlock(&intr->mutex);
+}
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
new file mode 100644
index 0000000..3625bf3
--- /dev/null
+++ b/drivers/gpu/host1x/intr.h
@@ -0,0 +1,100 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_INTR_H
+#define __HOST1X_INTR_H
+
+#include <linux/kthread.h>
+#include <linux/semaphore.h>
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+
+struct host1x_channel;
+
+enum host1x_intr_action {
+	/*
+	 * Wake up a task.
+	 * 'data' points to a wait_queue_head_t
+	 */
+	HOST1X_INTR_ACTION_WAKEUP,
+
+	/*
+	 * Wake up an interruptible task.
+	 * 'data' points to a wait_queue_head_t
+	 */
+	HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
+
+	HOST1X_INTR_ACTION_COUNT
+};
+
+struct host1x_intr;
+
+struct host1x_intr_syncpt {
+	struct host1x_intr *intr;
+	u8 id;
+	spinlock_t lock;
+	struct list_head wait_head;
+	char thresh_irq_name[12];
+	struct work_struct work;
+};
+
+struct host1x_intr {
+	struct host1x_intr_syncpt *syncpt;
+	struct mutex mutex;
+	int syncpt_irq;
+	struct workqueue_struct *wq;
+};
+#define intr_to_host1x(x) container_of(x, struct host1x, intr)
+#define intr_syncpt_to_intr(is) (is->intr)
+
+/*
+ * Schedule an action to be taken when a sync point reaches the given threshold.
+ *
+ * @id the sync point
+ * @thresh the threshold
+ * @action the action to take
+ * @data a pointer to extra data depending on action, see above
+ * @waiter waiter allocated with host1x_intr_alloc_waiter - assumes ownership
+ * @ref must be passed if cancellation is possible, else NULL
+ *
+ * This is a non-blocking api.
+ */
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
+			enum host1x_intr_action action, void *data,
+			void *waiter,
+			void **ref);
+
+/*
+ * Allocate a waiter.
+ */
+void *host1x_intr_alloc_waiter(void);
+
+/*
+ * Unreference an action submitted to host1x_intr_add_action().
+ * You must call this if you passed non-NULL as ref.
+ * @ref the ref returned from host1x_intr_add_action()
+ */
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref);
+
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync);
+void host1x_intr_deinit(struct host1x_intr *intr);
+void host1x_intr_start(struct host1x_intr *intr, u32 hz);
+void host1x_intr_stop(struct host1x_intr *intr);
+
+irqreturn_t host1x_syncpt_thresh_fn(void *dev_id);
+#endif
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index d551325..9987548 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include "syncpt.h"
 #include "dev.h"
+#include "intr.h"
 #include <trace/events/host1x.h>
 
 #define MAX_SYNCPT_LENGTH	5
@@ -129,6 +130,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp)
 }
 EXPORT_SYMBOL(host1x_syncpt_incr);
 
+/*
+ * Updates the sync point from hardware and returns true if the syncpoint has
+ * expired, false if we may need to wait
+ */
+static bool syncpt_load_min_is_expired(
+	struct host1x_syncpt *sp,
+	u32 thresh)
+{
+	sp->dev->syncpt_op.load_min(sp);
+	return host1x_syncpt_is_expired(sp, thresh);
+}
+
+/*
+ * Main entrypoint for syncpoint value waits.
+ */
+int host1x_syncpt_wait(struct host1x_syncpt *sp,
+			u32 thresh, long timeout, u32 *value)
+{
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	void *ref;
+	void *waiter;
+	int err = 0, check_count = 0;
+	u32 val;
+
+	if (value)
+		*value = 0;
+
+	/* first check cache */
+	if (host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = host1x_syncpt_read_min(sp);
+		return 0;
+	}
+
+	/* try to read from register */
+	val = sp->dev->syncpt_op.load_min(sp);
+	if (host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = val;
+		goto done;
+	}
+
+	if (!timeout) {
+		err = -EAGAIN;
+		goto done;
+	}
+
+	/* schedule a wakeup when the syncpoint value is reached */
+	waiter = host1x_intr_alloc_waiter();
+	if (!waiter) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = host1x_intr_add_action(&(sp->dev->intr), sp->id, thresh,
+				HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE, &wq,
+				waiter,
+				&ref);
+	if (err)
+		goto done;
+
+	err = -EAGAIN;
+	/* Caller-specified timeout may be impractically low */
+	if (timeout < 0)
+		timeout = LONG_MAX;
+
+	/* wait for the syncpoint, or timeout, or signal */
+	while (timeout) {
+		long check = min_t(long, SYNCPT_CHECK_PERIOD, timeout);
+		long remain = wait_event_interruptible_timeout(wq,
+				syncpt_load_min_is_expired(sp, thresh),
+				check);
+		if (remain > 0 || host1x_syncpt_is_expired(sp, thresh)) {
+			if (value)
+				*value = host1x_syncpt_read_min(sp);
+			err = 0;
+			break;
+		}
+		if (remain < 0) {
+			err = remain;
+			break;
+		}
+		timeout -= check;
+		if (timeout && check_count <= MAX_STUCK_CHECK_COUNT) {
+			dev_warn(&sp->dev->dev->dev,
+				"%s: syncpoint id %d (%s) stuck waiting %d, timeout=%ld\n",
+				 current->comm, sp->id, sp->name,
+				 thresh, timeout);
+			sp->dev->syncpt_op.debug(sp);
+			check_count++;
+		}
+	}
+	host1x_intr_put_ref(&(sp->dev->intr), sp->id, ref);
+
+done:
+	return err;
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
+
+/*
+ * Returns true if syncpoint is expired, false if we may need to wait
+ */
+bool host1x_syncpt_is_expired(
+	struct host1x_syncpt *sp,
+	u32 thresh)
+{
+	u32 current_val;
+	u32 future_val;
+	smp_rmb();
+	current_val = (u32)atomic_read(&sp->min_val);
+	future_val = (u32)atomic_read(&sp->max_val);
+
+	/* Note the use of unsigned arithmetic here (mod 1<<32).
+	 *
+	 * c = current_val = min_val	= the current value of the syncpoint.
+	 * t = thresh			= the value we are checking
+	 * f = future_val  = max_val	= the value c will reach when all
+	 *				  outstanding increments have completed.
+	 *
+	 * Note that c always chases f until it reaches f.
+	 *
+	 * Dtf = (f - t)
+	 * Dtc = (c - t)
+	 *
+	 *  Consider all cases:
+	 *
+	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
+	 *	B) .....c.....f..t..	Dtf > Dtc	expired
+	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
+	 *
+	 *  Any case where f==c: always expired (for any t).	Dtf == Dcf
+	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
+	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
+	 *							Dtc!=0)
+	 *
+	 *  Other cases:
+	 *
+	 *	A) .....t..f..c.....	Dtf < Dtc	need to wait
+	 *	A) .....f..c..t.....	Dtf < Dtc	need to wait
+	 *	A) .....f..t..c.....	Dtf > Dtc	expired
+	 *
+	 *   So:
+	 *	   Dtf >= Dtc implies EXPIRED	(return true)
+	 *	   Dtf <  Dtc implies WAIT	(return false)
+	 *
+	 * Note: If t is expired then we *cannot* wait on it. We would wait
+	 * forever (hang the system).
+	 *
+	 * Note: do NOT get clever and remove the -thresh from both sides. It
+	 * is NOT the same.
+	 *
+	 * If future value is zero, we have a client managed sync point. In that
+	 * case we do a direct comparison.
+	 */
+	if (!host1x_syncpt_client_managed(sp))
+		return future_val - thresh >= current_val - thresh;
+	else
+		return (s32)(current_val - thresh) >= 0;
+}
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp)
 {
 	sp->dev->syncpt_op.debug(sp);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 4f7777b..d4d1f3f 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -106,6 +106,7 @@ struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
 void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
 
 u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh);
 
 void host1x_syncpt_save(struct host1x *dev);
 
@@ -117,6 +118,9 @@ u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp);
 void host1x_syncpt_incr(struct host1x_syncpt *sp);
 u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
 
+int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
+			long timeout, u32 *value);
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp);
 
 static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 6c2cc8a..00060ee 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -33,6 +33,7 @@ struct host1x_syncpt;
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 void host1x_syncpt_incr_byid(u32 id);
 u32 host1x_syncpt_read_byid(u32 id);
+int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value);
 
 struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
 		int client_managed);
-- 
1.7.9.5


* [PATCHv3 2/7] gpu: host1x: Add syncpoint wait and interrupts
@ 2012-12-13 14:04   ` Terje Bergstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel
  Cc: amerilainen, linux-kernel

Add support for sync point interrupts and sync point wait. Sync
point wait uses interrupts to unblock waiters.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
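Note for reviewers (not part of the commit): the wraparound-safe
threshold test in host1x_syncpt_is_expired() is easy to misread, so
here is a minimal userspace sketch of the same comparison. The names
syncpt_expired() and syncpt_expired_self_check() are hypothetical and
exist only for illustration; they take plain integers instead of a
struct host1x_syncpt and leave out the atomic reads and the smp_rmb().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for host1x_syncpt_is_expired().
 * cur = min_val (current value), max = max_val (future value),
 * thresh = value being waited for. All arithmetic is mod 2^32.
 */
static bool syncpt_expired(uint32_t cur, uint32_t max, uint32_t thresh,
			   bool client_managed)
{
	if (!client_managed)
		/* Dtf >= Dtc means expired; do not cancel the "- thresh" */
		return (uint32_t)(max - thresh) >= (uint32_t)(cur - thresh);

	/* client managed: max is not tracked, compare signed distance */
	return (int32_t)(cur - thresh) >= 0;
}

static void syncpt_expired_self_check(void)
{
	/* case A: c < t < f, still need to wait */
	assert(!syncpt_expired(10, 30, 20, false));
	/* case B: c < f < t, treated as expired */
	assert(syncpt_expired(10, 30, 40, false));
	/* case C: t < c < f, expired */
	assert(syncpt_expired(10, 30, 5, false));
	/* f == c: always expired, whatever t is */
	assert(syncpt_expired(10, 10, 20, false));
	/* counter about to wrap: c = 0xfffffff0, f = 0x10, t = 0x5 */
	assert(!syncpt_expired(0xfffffff0u, 0x10u, 0x5u, false));
	/* client managed, threshold lies just past the wrap */
	assert(!syncpt_expired(0xfffffff0u, 0, 0x5u, true));
	assert(syncpt_expired(0x5u, 0, 0xfffffff0u, true));
}
```

Calling syncpt_expired_self_check() from a test main() walks through
cases A, B and C from the comment in syncpt.c plus a wrapped counter.
Subtracting thresh on both sides is what keeps the comparison valid
across the 32-bit wrap, which is why the comment warns against
simplifying it away.
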
 drivers/gpu/host1x/Makefile              |    3 +-
 drivers/gpu/host1x/dev.c                 |   44 ++++
 drivers/gpu/host1x/dev.h                 |   15 ++
 drivers/gpu/host1x/hw/host1x01.c         |    2 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h |   30 ++-
 drivers/gpu/host1x/hw/intr_hw.c          |  175 +++++++++++++++
 drivers/gpu/host1x/intr.c                |  350 ++++++++++++++++++++++++++++++
 drivers/gpu/host1x/intr.h                |  100 +++++++++
 drivers/gpu/host1x/syncpt.c              |  161 ++++++++++++++
 drivers/gpu/host1x/syncpt.h              |    4 +
 include/linux/host1x.h                   |    1 +
 11 files changed, 883 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/host1x/hw/intr_hw.c
 create mode 100644 drivers/gpu/host1x/intr.c
 create mode 100644 drivers/gpu/host1x/intr.h
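
A second reviewer note, again not part of the commit: the waiter life
cycle in intr.c depends on the ordering of enum waitlist_state, because
remove_completed_waiters() advances the state with a plain atomic
increment. The sketch below models that in userspace with C11 atomics.
waiter_complete() and waiter_cancel() are hypothetical helpers standing
in for the atomic_inc_return() and atomic_cmpxchg() calls; they are not
driver functions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Same ordering as enum waitlist_state in intr.c */
enum wls { WLS_PENDING, WLS_REMOVED, WLS_CANCELLED, WLS_HANDLED };

/*
 * Completion side: one increment maps PENDING->REMOVED and
 * CANCELLED->HANDLED. Returns the new state.
 */
static int waiter_complete(atomic_int *state)
{
	return atomic_fetch_add(state, 1) + 1;
}

/*
 * Cancel side: PENDING->CANCELLED only. Returns the state seen before
 * the attempt, like the kernel's atomic_cmpxchg().
 */
static int waiter_cancel(atomic_int *state)
{
	int expected = WLS_PENDING;

	atomic_compare_exchange_strong(state, &expected, WLS_CANCELLED);
	return expected;
}

static void waiter_state_self_check(void)
{
	atomic_int s;

	/* normal path: threshold reached, handler will run */
	atomic_init(&s, WLS_PENDING);
	assert(waiter_complete(&s) == WLS_REMOVED);

	/* cancelled first: completion sees HANDLED and just drops it */
	atomic_init(&s, WLS_PENDING);
	assert(waiter_cancel(&s) == WLS_PENDING);
	assert(waiter_complete(&s) == WLS_HANDLED);

	/* cancel while the handler is in flight: state is left alone */
	atomic_init(&s, WLS_REMOVED);
	assert(waiter_cancel(&s) == WLS_REMOVED);
	assert(atomic_load(&s) == WLS_REMOVED);
}
```

The last case corresponds to the loop in host1x_intr_put_ref(), which
yields with schedule() until run_handlers() has moved the waiter from
REMOVED to HANDLED.
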

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index a4adcc6..9d00b62 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -2,7 +2,8 @@ ccflags-y = -Idrivers/gpu/host1x
 
 host1x-objs = \
 	syncpt.o \
-	dev.o
+	dev.o \
+	intr.o
 
 obj-$(CONFIG_TEGRA_HOST1X) += hw/
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index b0d630d..9255a49 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -25,6 +25,7 @@
 #include <linux/clk.h>
 #include <linux/io.h>
 #include "dev.h"
+#include "intr.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -48,6 +49,19 @@ u32 host1x_syncpt_read_byid(u32 id)
 }
 EXPORT_SYMBOL(host1x_syncpt_read_byid);
 
+int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)
+{
+	struct host1x_syncpt *sp = host1x->syncpt + id;
+	return host1x_syncpt_wait(sp, thresh, timeout, value);
+}
+EXPORT_SYMBOL(host1x_syncpt_wait_byid);
+
+static void host1x_free_resources(struct host1x *host)
+{
+	/* intr.syncpt comes from devm_kzalloc(); devres frees it, not kfree() */
+	host->intr.syncpt = NULL;
+}
+
 void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r)
 {
 	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
@@ -62,6 +76,21 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r)
 	return readl(sync_regs + r);
 }
 
+static int host1x_alloc_resources(struct host1x *host)
+{
+	host->intr.syncpt = devm_kzalloc(&host->dev->dev,
+			sizeof(struct host1x_intr_syncpt) *
+			host->info.nb_pts,
+			GFP_KERNEL);
+
+	if (!host->intr.syncpt) {
+		/* frees happen in the support removal phase */
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static struct host1x_device_info host1x_info = {
 	.nb_channels	= 8,
 	.nb_pts		= 32,
@@ -118,6 +147,12 @@ static int host1x_probe(struct platform_device *dev)
 		goto fail;
 	}
 
+	err = host1x_alloc_resources(host);
+	if (err) {
+		dev_err(&dev->dev, "failed to init chip support\n");
+		goto fail;
+	}
+
 	if (host->info.init) {
 		err = host->info.init(host);
 		if (err)
@@ -132,6 +167,10 @@ static int host1x_probe(struct platform_device *dev)
 	if (!host->nop_sp)
 		goto fail;
 
+	err = host1x_intr_init(&host->intr, syncpt_irq);
+	if (err)
+		goto fail;
+
 	host->clk = devm_clk_get(&dev->dev, NULL);
 	if (IS_ERR(host->clk)) {
 		dev_err(&dev->dev, "failed to get clock\n");
@@ -145,6 +184,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_syncpt_reset(host);
 
+	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
+
 	host1x = host;
 
 	dev_info(&dev->dev, "initialized\n");
@@ -153,6 +194,7 @@ static int host1x_probe(struct platform_device *dev)
 
 fail:
 	host1x_syncpt_free(host->nop_sp);
+	host1x_free_resources(host);
 	kfree(host);
 	return err;
 }
@@ -160,8 +202,10 @@ fail:
 static int __exit host1x_remove(struct platform_device *dev)
 {
 	struct host1x *host = platform_get_drvdata(dev);
+	host1x_intr_deinit(&host->intr);
 	host1x_syncpt_deinit(host);
 	clk_disable_unprepare(host->clk);
+	host1x_free_resources(host);
 	return 0;
 }
 
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 8245e24..a1622bb 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -20,6 +20,7 @@
 #include <linux/host1x.h>
 
 #include "syncpt.h"
+#include "intr.h"
 
 struct host1x;
 struct host1x_syncpt;
@@ -36,6 +37,18 @@ struct host1x_syncpt_ops {
 	const char * (*name)(struct host1x_syncpt *);
 };
 
+struct host1x_intr_ops {
+	void (*init_host_sync)(struct host1x_intr *);
+	void (*set_host_clocks_per_usec)(
+		struct host1x_intr *, u32 clocks);
+	void (*set_syncpt_threshold)(
+		struct host1x_intr *, u32 id, u32 thresh);
+	void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
+	void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
+	void (*disable_all_syncpt_intrs)(struct host1x_intr *);
+	int (*free_syncpt_irq)(struct host1x_intr *);
+};
+
 struct host1x_device_info {
 	int	nb_channels;		/* host1x: num channels supported */
 	int	nb_pts;			/* host1x: num syncpoints supported */
@@ -48,6 +61,7 @@ struct host1x_device_info {
 struct host1x {
 	void __iomem *regs;
 	struct host1x_syncpt *syncpt;
+	struct host1x_intr intr;
 	struct platform_device *dev;
 	atomic_t clientid;
 	struct host1x_device_info info;
@@ -57,6 +71,7 @@ struct host1x {
 
 	const char *soc_name;
 	struct host1x_syncpt_ops syncpt_op;
+	struct host1x_intr_ops intr_op;
 
 	struct dentry *debugfs;
 };
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index 59176ba..c5c55a3 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -27,10 +27,12 @@
 #include "hw/host1x01_hardware.h"
 
 #include "hw/syncpt_hw.c"
+#include "hw/intr_hw.c"
 
 int host1x01_init(struct host1x *host)
 {
 	host->syncpt_op = host1x_syncpt_ops;
+	host->intr_op = host1x_intr_ops;
 
 	return 0;
 }
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index 63a71c8..b06a2c5 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -51,10 +51,38 @@
 #ifndef __hw_host1x_sync_h__
 #define __hw_host1x_sync_h__
 
+static inline u32 host1x_sync_syncpt_thresh_cpu0_int_status_r(void)
+{
+	return 0x40;
+}
+static inline u32 host1x_sync_syncpt_thresh_int_disable_r(void)
+{
+	return 0x60;
+}
+static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
+{
+	return 0x68;
+}
+static inline u32 host1x_sync_usec_clk_r(void)
+{
+	return 0x1a4;
+}
+static inline u32 host1x_sync_ctxsw_timeout_cfg_r(void)
+{
+	return 0x1a8;
+}
+static inline u32 host1x_sync_ip_busy_timeout_r(void)
+{
+	return 0x1bc;
+}
 static inline u32 host1x_sync_syncpt_0_r(void)
 {
 	return 0x400;
 }
+static inline u32 host1x_sync_syncpt_int_thresh_0_r(void)
+{
+	return 0x500;
+}
 static inline u32 host1x_sync_syncpt_base_0_r(void)
 {
 	return 0x600;
@@ -63,4 +91,4 @@ static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
 {
 	return 0x700;
 }
-#endif /* __hw_host1x_host1x_h__ */
+#endif /* __hw_host1x_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
new file mode 100644
index 0000000..8efbd51
--- /dev/null
+++ b/drivers/gpu/host1x/hw/intr_hw.c
@@ -0,0 +1,175 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <asm/mach/irq.h>
+
+#include "intr.h"
+#include "dev.h"
+
+/* Spacing between sync registers */
+#define REGISTER_STRIDE 4
+
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
+
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
+{
+	struct host1x_intr_syncpt *sp =
+		container_of(work, struct host1x_intr_syncpt, work);
+	host1x_syncpt_thresh_fn(sp);
+}
+
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
+{
+	struct host1x *host1x = dev_id;
+	struct host1x_intr *intr = &host1x->intr;
+	unsigned long reg;
+	int i, id;
+
+	for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
+		reg = host1x_sync_readl(host1x,
+				host1x_sync_syncpt_thresh_cpu0_int_status_r() +
+				i * REGISTER_STRIDE);
+		for_each_set_bit(id, &reg, BITS_PER_LONG) {
+			struct host1x_intr_syncpt *sp =
+				intr->syncpt + (i * BITS_PER_LONG + id);
+			host1x_intr_syncpt_thresh_isr(sp);
+			queue_work(intr->wq, &sp->work);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void host1x_intr_init_host_sync(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	int i, err;
+
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		host1x_sync_syncpt_thresh_int_disable_r());
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		host1x_sync_syncpt_thresh_cpu0_int_status_r());
+
+	for (i = 0; i < host1x->info.nb_pts; i++)
+		INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
+
+	err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
+				syncpt_thresh_cascade_isr,
+				IRQF_SHARED, "host1x_syncpt", host1x);
+	WARN_ON(IS_ERR_VALUE(err));
+
+	/* disable the ip_busy_timeout. this prevents write drops */
+	host1x_sync_writel(host1x, 0, host1x_sync_ip_busy_timeout_r());
+
+	/*
+	 * increase the auto-ack timeout to the maximum value. 2d will hang
+	 * otherwise on Tegra2.
+	 */
+	host1x_sync_writel(host1x, 0xff, host1x_sync_ctxsw_timeout_cfg_r());
+}
+
+static void host1x_intr_set_host_clocks_per_usec(struct host1x_intr *intr,
+		u32 cpm)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	/* write microsecond clock register */
+	host1x_sync_writel(host1x, cpm, host1x_sync_usec_clk_r());
+}
+
+static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
+	u32 id, u32 thresh)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	host1x_sync_writel(host1x, thresh,
+		host1x_sync_syncpt_int_thresh_0_r() + id * REGISTER_STRIDE);
+}
+
+static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+			host1x_sync_syncpt_thresh_int_enable_cpu0_r() +
+			BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+			host1x_sync_syncpt_thresh_int_disable_r() +
+			BIT_WORD(id) * REGISTER_STRIDE);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		host1x_sync_syncpt_thresh_cpu0_int_status_r() +
+		BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_all_syncpt_intrs(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 reg;
+
+	for (reg = 0; reg <= BIT_WORD(host1x->info.nb_pts) * REGISTER_STRIDE;
+			reg += REGISTER_STRIDE) {
+		host1x_sync_writel(host1x, 0xffffffffu,
+				host1x_sync_syncpt_thresh_int_disable_r() +
+				reg);
+
+		host1x_sync_writel(host1x, 0xffffffffu,
+			host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
+	}
+}
+
+/*
+ * Sync point threshold interrupt service function
+ * Handles sync point threshold triggers, in interrupt context
+ */
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt)
+{
+	unsigned int id = syncpt->id;
+	struct host1x_intr *intr = intr_syncpt_to_intr(syncpt);
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		host1x_sync_syncpt_thresh_int_disable_r() + reg);
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
+}
+
+static int host1x_free_syncpt_irq(struct host1x_intr *intr)
+{
+	flush_workqueue(intr->wq);
+	return 0;
+}
+
+static const struct host1x_intr_ops host1x_intr_ops = {
+	.init_host_sync = host1x_intr_init_host_sync,
+	.set_host_clocks_per_usec = host1x_intr_set_host_clocks_per_usec,
+	.set_syncpt_threshold = host1x_intr_set_syncpt_threshold,
+	.enable_syncpt_intr = host1x_intr_enable_syncpt_intr,
+	.disable_syncpt_intr = host1x_intr_disable_syncpt_intr,
+	.disable_all_syncpt_intrs = host1x_intr_disable_all_syncpt_intrs,
+	.free_syncpt_irq = host1x_free_syncpt_irq,
+};
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
new file mode 100644
index 0000000..f166224
--- /dev/null
+++ b/drivers/gpu/host1x/intr.c
@@ -0,0 +1,350 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "intr.h"
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include "dev.h"
+
+/* Wait list management */
+
+struct host1x_waitlist {
+	struct list_head list;
+	struct kref refcount;
+	u32 thresh;
+	enum host1x_intr_action action;
+	atomic_t state;
+	void *data;
+	int count;
+};
+
+enum waitlist_state {
+	WLS_PENDING,
+	WLS_REMOVED,
+	WLS_CANCELLED,
+	WLS_HANDLED
+};
+
+static void waiter_release(struct kref *kref)
+{
+	kfree(container_of(kref, struct host1x_waitlist, refcount));
+}
+
+/*
+ * add a waiter to a waiter queue, sorted by threshold
+ * returns true if it was added at the head of the queue
+ */
+static bool add_waiter_to_queue(struct host1x_waitlist *waiter,
+				struct list_head *queue)
+{
+	struct host1x_waitlist *pos;
+	u32 thresh = waiter->thresh;
+
+	list_for_each_entry_reverse(pos, queue, list)
+		if ((s32)(pos->thresh - thresh) <= 0) {
+			list_add(&waiter->list, &pos->list);
+			return false;
+		}
+
+	list_add(&waiter->list, queue);
+	return true;
+}
+
+/*
+ * run through a waiter queue for a single sync point ID
+ * and gather all completed waiters into lists by actions
+ */
+static void remove_completed_waiters(struct list_head *head, u32 sync,
+			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
+{
+	struct list_head *dest;
+	struct host1x_waitlist *waiter, *next;
+
+	list_for_each_entry_safe(waiter, next, head, list) {
+		if ((s32)(waiter->thresh - sync) > 0)
+			break;
+
+		dest = completed + waiter->action;
+
+		/* PENDING->REMOVED or CANCELLED->HANDLED */
+		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
+			list_del(&waiter->list);
+			kref_put(&waiter->refcount, waiter_release);
+		} else {
+			list_move_tail(&waiter->list, dest);
+		}
+	}
+}
+
+void reset_threshold_interrupt(struct host1x_intr *intr,
+			       struct list_head *head,
+			       unsigned int id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 thresh = list_first_entry(head,
+				struct host1x_waitlist, list)->thresh;
+
+	host1x->intr_op.set_syncpt_threshold(intr, id, thresh);
+	host1x->intr_op.enable_syncpt_intr(intr, id);
+}
+
+static void action_wakeup(struct host1x_waitlist *waiter)
+{
+	wait_queue_head_t *wq = waiter->data;
+
+	wake_up(wq);
+}
+
+static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
+{
+	wait_queue_head_t *wq = waiter->data;
+
+	wake_up_interruptible(wq);
+}
+
+typedef void (*action_handler)(struct host1x_waitlist *waiter);
+
+static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_wakeup,
+	action_wakeup_interruptible,
+};
+
+static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT])
+{
+	struct list_head *head = completed;
+	int i;
+
+	for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i, ++head) {
+		action_handler handler = action_handlers[i];
+		struct host1x_waitlist *waiter, *next;
+
+		list_for_each_entry_safe(waiter, next, head, list) {
+			list_del(&waiter->list);
+			handler(waiter);
+			WARN_ON(atomic_xchg(&waiter->state, WLS_HANDLED)
+					!= WLS_REMOVED);
+			kref_put(&waiter->refcount, waiter_release);
+		}
+	}
+}
+
+/*
+ * Remove & handle all waiters that have completed for the given syncpt
+ */
+static int process_wait_list(struct host1x_intr *intr,
+			     struct host1x_intr_syncpt *syncpt,
+			     u32 threshold)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct list_head completed[HOST1X_INTR_ACTION_COUNT];
+	unsigned int i;
+	int empty;
+
+	for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i)
+		INIT_LIST_HEAD(completed + i);
+
+	spin_lock(&syncpt->lock);
+
+	remove_completed_waiters(&syncpt->wait_head, threshold, completed);
+
+	empty = list_empty(&syncpt->wait_head);
+	if (empty)
+		host1x->intr_op.disable_syncpt_intr(intr, syncpt->id);
+	else
+		reset_threshold_interrupt(intr, &syncpt->wait_head,
+					  syncpt->id);
+
+	spin_unlock(&syncpt->lock);
+
+	run_handlers(completed);
+
+	return empty;
+}
+
+/*
+ * Sync point threshold interrupt service thread function
+ * Handles sync point threshold triggers, in thread context
+ */
+irqreturn_t host1x_syncpt_thresh_fn(void *dev_id)
+{
+	struct host1x_intr_syncpt *syncpt = dev_id;
+	unsigned int id = syncpt->id;
+	struct host1x_intr *intr = intr_syncpt_to_intr(syncpt);
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	(void)process_wait_list(intr, syncpt,
+				host1x_syncpt_load_min(host1x->syncpt + id));
+
+	return IRQ_HANDLED;
+}
+
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
+			enum host1x_intr_action action, void *data,
+			void *_waiter,
+			void **ref)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct host1x_waitlist *waiter = _waiter;
+	struct host1x_intr_syncpt *syncpt;
+	int queue_was_empty;
+
+	if (waiter == NULL) {
+		pr_warn("%s: NULL waiter\n", __func__);
+		return -EINVAL;
+	}
+
+	/* initialize a new waiter */
+	INIT_LIST_HEAD(&waiter->list);
+	kref_init(&waiter->refcount);
+	if (ref)
+		kref_get(&waiter->refcount);
+	waiter->thresh = thresh;
+	waiter->action = action;
+	atomic_set(&waiter->state, WLS_PENDING);
+	waiter->data = data;
+	waiter->count = 1;
+
+	syncpt = intr->syncpt + id;
+
+	spin_lock(&syncpt->lock);
+
+	queue_was_empty = list_empty(&syncpt->wait_head);
+
+	if (add_waiter_to_queue(waiter, &syncpt->wait_head)) {
+		/* added at head of list - new threshold value */
+		host1x->intr_op.set_syncpt_threshold(intr, id, thresh);
+
+		/* added as first waiter - enable interrupt */
+		if (queue_was_empty)
+			host1x->intr_op.enable_syncpt_intr(intr, id);
+	}
+
+	spin_unlock(&syncpt->lock);
+
+	if (ref)
+		*ref = waiter;
+	return 0;
+}
+
+void *host1x_intr_alloc_waiter(void)
+{
+	return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
+}
+
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
+{
+	struct host1x_waitlist *waiter = ref;
+	struct host1x_intr_syncpt *syncpt;
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	while (atomic_cmpxchg(&waiter->state,
+				WLS_PENDING, WLS_CANCELLED) == WLS_REMOVED)
+		schedule();
+
+	syncpt = intr->syncpt + id;
+	(void)process_wait_list(intr, syncpt,
+				host1x_syncpt_load_min(host1x->syncpt + id));
+
+	kref_put(&waiter->refcount, waiter_release);
+}
+
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
+{
+	unsigned int id;
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 nb_pts = host1x_syncpt_nb_pts(host1x);
+
+	mutex_init(&intr->mutex);
+	intr->syncpt_irq = irq_sync;
+	intr->wq = create_workqueue("host_syncpt");
+	if (!intr->wq)
+		return -ENOMEM;
+
+	host1x->intr_op.init_host_sync(intr);
+
+	for (id = 0; id < nb_pts; ++id) {
+		struct host1x_intr_syncpt *syncpt = &intr->syncpt[id];
+
+		syncpt->intr = &host1x->intr;
+		syncpt->id = id;
+		spin_lock_init(&syncpt->lock);
+		INIT_LIST_HEAD(&syncpt->wait_head);
+		snprintf(syncpt->thresh_irq_name,
+			sizeof(syncpt->thresh_irq_name),
+			"host1x_sp_%02u", id);
+	}
+
+	return 0;
+}
+
+void host1x_intr_deinit(struct host1x_intr *intr)
+{
+	host1x_intr_stop(intr);
+	destroy_workqueue(intr->wq);
+}
+
+void host1x_intr_start(struct host1x_intr *intr, u32 hz)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	mutex_lock(&intr->mutex);
+
+	host1x->intr_op.init_host_sync(intr);
+	host1x->intr_op.set_host_clocks_per_usec(intr,
+			DIV_ROUND_UP(hz, 1000000));
+
+	mutex_unlock(&intr->mutex);
+}
+
+void host1x_intr_stop(struct host1x_intr *intr)
+{
+	unsigned int id;
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct host1x_intr_syncpt *syncpt;
+	u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
+
+	mutex_lock(&intr->mutex);
+
+	host1x->intr_op.disable_all_syncpt_intrs(intr);
+
+	for (id = 0, syncpt = intr->syncpt;
+	     id < nb_pts;
+	     ++id, ++syncpt) {
+		struct host1x_waitlist *waiter, *next;
+		list_for_each_entry_safe(waiter, next,
+				&syncpt->wait_head, list) {
+			if (atomic_cmpxchg(&waiter->state,
+						WLS_CANCELLED, WLS_HANDLED)
+				== WLS_CANCELLED) {
+				list_del(&waiter->list);
+				kref_put(&waiter->refcount, waiter_release);
+			}
+		}
+
+		if (!list_empty(&syncpt->wait_head)) {  /* output diagnostics */
+			mutex_unlock(&intr->mutex);
+			pr_warn("%s cannot stop syncpt intr id=%d\n",
+					__func__, id);
+			return;
+		}
+	}
+
+	host1x->intr_op.free_syncpt_irq(intr);
+
+	mutex_unlock(&intr->mutex);
+}
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
new file mode 100644
index 0000000..3625bf3
--- /dev/null
+++ b/drivers/gpu/host1x/intr.h
@@ -0,0 +1,100 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_INTR_H
+#define __HOST1X_INTR_H
+
+#include <linux/kthread.h>
+#include <linux/semaphore.h>
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+
+struct host1x_channel;
+
+enum host1x_intr_action {
+	/*
+	 * Wake up a task.
+	 * 'data' points to a wait_queue_head_t
+	 */
+	HOST1X_INTR_ACTION_WAKEUP,
+
+	/*
+	 * Wake up an interruptible task.
+	 * 'data' points to a wait_queue_head_t
+	 */
+	HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
+
+	HOST1X_INTR_ACTION_COUNT
+};
+
+struct host1x_intr;
+
+struct host1x_intr_syncpt {
+	struct host1x_intr *intr;
+	u8 id;
+	spinlock_t lock;
+	struct list_head wait_head;
+	char thresh_irq_name[16];
+	struct work_struct work;
+};
+
+struct host1x_intr {
+	struct host1x_intr_syncpt *syncpt;
+	struct mutex mutex;
+	int syncpt_irq;
+	struct workqueue_struct *wq;
+};
+#define intr_to_host1x(x) container_of(x, struct host1x, intr)
+#define intr_syncpt_to_intr(is) (is->intr)
+
+/*
+ * Schedule an action to be taken when a sync point reaches the given threshold.
+ *
+ * @id the sync point
+ * @thresh the threshold
+ * @action the action to take
+ * @data a pointer to extra data depending on action, see above
+ * @waiter waiter allocated with host1x_intr_alloc_waiter - assumes ownership
+ * @ref must be passed if cancellation is possible, else NULL
+ *
+ * This is a non-blocking API.
+ */
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
+			enum host1x_intr_action action, void *data,
+			void *waiter,
+			void **ref);
+
+/*
+ * Allocate a waiter.
+ */
+void *host1x_intr_alloc_waiter(void);
+
+/*
+ * Unreference an action submitted to host1x_intr_add_action().
+ * You must call this if you passed non-NULL as ref.
+ * @ref the ref returned from host1x_intr_add_action()
+ */
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref);
+
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync);
+void host1x_intr_deinit(struct host1x_intr *intr);
+void host1x_intr_start(struct host1x_intr *intr, u32 hz);
+void host1x_intr_stop(struct host1x_intr *intr);
+
+irqreturn_t host1x_syncpt_thresh_fn(void *dev_id);
+#endif
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index d551325..9987548 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include "syncpt.h"
 #include "dev.h"
+#include "intr.h"
 #include <trace/events/host1x.h>
 
 #define MAX_SYNCPT_LENGTH	5
@@ -129,6 +130,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp)
 }
 EXPORT_SYMBOL(host1x_syncpt_incr);
 
+/*
+ * Updates the sync point from hardware. Returns true if the syncpoint has
+ * expired, false if we may need to wait.
+ */
+static bool syncpt_load_min_is_expired(
+	struct host1x_syncpt *sp,
+	u32 thresh)
+{
+	sp->dev->syncpt_op.load_min(sp);
+	return host1x_syncpt_is_expired(sp, thresh);
+}
+
+/*
+ * Main entrypoint for syncpoint value waits.
+ */
+int host1x_syncpt_wait(struct host1x_syncpt *sp,
+			u32 thresh, long timeout, u32 *value)
+{
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	void *ref;
+	void *waiter;
+	int err = 0, check_count = 0;
+	u32 val;
+
+	if (value)
+		*value = 0;
+
+	/* first check cache */
+	if (host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = host1x_syncpt_read_min(sp);
+		return 0;
+	}
+
+	/* try to read from register */
+	val = sp->dev->syncpt_op.load_min(sp);
+	if (host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = val;
+		goto done;
+	}
+
+	if (!timeout) {
+		err = -EAGAIN;
+		goto done;
+	}
+
+	/* schedule a wakeup when the syncpoint value is reached */
+	waiter = host1x_intr_alloc_waiter();
+	if (!waiter) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = host1x_intr_add_action(&(sp->dev->intr), sp->id, thresh,
+				HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE, &wq,
+				waiter,
+				&ref);
+	if (err)
+		goto done;
+
+	err = -EAGAIN;
+	/* A negative timeout means wait indefinitely */
+	if (timeout < 0)
+		timeout = LONG_MAX;
+
+	/* wait for the syncpoint, or timeout, or signal */
+	while (timeout) {
+		long check = min_t(long, SYNCPT_CHECK_PERIOD, timeout);
+		long remain = wait_event_interruptible_timeout(wq,
+				syncpt_load_min_is_expired(sp, thresh),
+				check);
+		if (remain > 0 || host1x_syncpt_is_expired(sp, thresh)) {
+			if (value)
+				*value = host1x_syncpt_read_min(sp);
+			err = 0;
+			break;
+		}
+		if (remain < 0) {
+			err = remain;
+			break;
+		}
+		timeout -= check;
+		if (timeout && check_count <= MAX_STUCK_CHECK_COUNT) {
+			dev_warn(&sp->dev->dev->dev,
+				"%s: syncpoint id %d (%s) stuck waiting %d, timeout=%ld\n",
+				 current->comm, sp->id, sp->name,
+				 thresh, timeout);
+			sp->dev->syncpt_op.debug(sp);
+			check_count++;
+		}
+	}
+	host1x_intr_put_ref(&(sp->dev->intr), sp->id, ref);
+
+done:
+	return err;
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
+
+/*
+ * Returns true if syncpoint is expired, false if we may need to wait
+ */
+bool host1x_syncpt_is_expired(
+	struct host1x_syncpt *sp,
+	u32 thresh)
+{
+	u32 current_val;
+	u32 future_val;
+	smp_rmb();
+	current_val = (u32)atomic_read(&sp->min_val);
+	future_val = (u32)atomic_read(&sp->max_val);
+
+	/* Note the use of unsigned arithmetic here (mod 1<<32).
+	 *
+	 * c = current_val = min_val	= the current value of the syncpoint.
+	 * t = thresh			= the value we are checking
+	 * f = future_val  = max_val	= the value c will reach when all
+	 *				  outstanding increments have completed.
+	 *
+	 * Note that c always chases f until it reaches f.
+	 *
+	 * Dtf = (f - t)
+	 * Dtc = (c - t)
+	 *
+	 *  Consider all cases:
+	 *
+	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
+	 *	B) .....c.....f..t..	Dtf > Dtc	expired
+	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
+	 *
+	 *  Any case where f==c: always expired (for any t).	Dtf == Dtc
+	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
+	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
+	 *							Dtc!=0)
+	 *
+	 *  Other cases:
+	 *
+	 *	D) .....t..f..c.....	Dtf < Dtc	need to wait
+	 *	E) .....f..c..t.....	Dtf < Dtc	need to wait
+	 *	F) .....f..t..c.....	Dtf > Dtc	expired
+	 *
+	 *   So:
+	 *	   Dtf >= Dtc implies EXPIRED	(return true)
+	 *	   Dtf <  Dtc implies WAIT	(return false)
+	 *
+	 * Note: If t is expired then we *cannot* wait on it. We would wait
+	 * forever (hang the system).
+	 *
+	 * Note: do NOT get clever and remove the -thresh from both sides. It
+	 * is NOT the same.
+	 *
+	 * If future value is zero, we have a client managed sync point. In that
+	 * case we do a direct comparison.
+	 */
+	if (!host1x_syncpt_client_managed(sp))
+		return future_val - thresh >= current_val - thresh;
+	else
+		return (s32)(current_val - thresh) >= 0;
+}
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp)
 {
 	sp->dev->syncpt_op.debug(sp);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 4f7777b..d4d1f3f 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -106,6 +106,7 @@ struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
 void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
 
 u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh);
 
 void host1x_syncpt_save(struct host1x *dev);
 
@@ -117,6 +118,9 @@ u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp);
 void host1x_syncpt_incr(struct host1x_syncpt *sp);
 u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
 
+int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
+			long timeout, u32 *value);
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp);
 
 static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 6c2cc8a..00060ee 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -33,6 +33,7 @@ struct host1x_syncpt;
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 void host1x_syncpt_incr_byid(u32 id);
 u32 host1x_syncpt_read_byid(u32 id);
+int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value);
 
 struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
 		int client_managed);
-- 
1.7.9.5
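
The case analysis in the host1x_syncpt_is_expired() comment can be checked
in userspace. Below is a minimal sketch, assuming plain uint32_t values in
place of the atomic min_val/max_val fields. The helper names
expired_host_managed() and expired_client_managed() are made up for
illustration and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Host-managed syncpoint: c (current) chases f (future). The threshold t
 * has expired when the distance from t to f is at least the distance
 * from t to c, both taken modulo 2^32.
 */
static bool expired_host_managed(uint32_t c, uint32_t f, uint32_t t)
{
	return (uint32_t)(f - t) >= (uint32_t)(c - t);
}

/*
 * Client-managed syncpoint: no future value is tracked, so fall back to
 * a signed difference. This is only meaningful while c and t are less
 * than 2^31 apart.
 */
static bool expired_client_managed(uint32_t c, uint32_t t)
{
	return (int32_t)(c - t) >= 0;
}
```

For example, expired_host_managed(10, 30, 20) is case A (c < t < f) and
evaluates to false, whereas comparing f >= c directly (30 >= 10) would
report true. That difference is why the comment warns against removing
thresh from both sides of the comparison.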


^ permalink raw reply related	[flat|nested] 24+ messages in thread
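
The waiter life cycle in intr.c relies on the numeric order of enum
waitlist_state: the completion path increments the state (PENDING to
REMOVED, or CANCELLED to HANDLED), while the cancel path attempts a
compare-and-exchange from PENDING to CANCELLED. A rough userspace sketch
follows. It assumes C11 <stdatomic.h> in place of the kernel's atomic_t
API, and the function names are hypothetical, not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum waitlist_state {
	WLS_PENDING,
	WLS_REMOVED,
	WLS_CANCELLED,
	WLS_HANDLED
};

/*
 * Completion path: bump the state by one. Returns true if the action
 * handler should still run (PENDING -> REMOVED), false if the waiter
 * had already been cancelled (CANCELLED -> HANDLED).
 */
static bool waiter_complete(atomic_int *state)
{
	/* atomic_fetch_add returns the old value, so add 1 for the new one */
	int new_state = atomic_fetch_add(state, 1) + 1;

	return new_state != WLS_HANDLED;
}

/* Called after the action handler has run. */
static void waiter_mark_handled(atomic_int *state)
{
	atomic_store(state, WLS_HANDLED);
}

/*
 * Cancel path: one PENDING -> CANCELLED attempt. Returns the state seen
 * before the attempt. WLS_REMOVED means a handler is in flight, so a
 * caller would retry, as the schedule() loop in host1x_intr_put_ref() does.
 */
static int waiter_try_cancel(atomic_int *state)
{
	int expected = WLS_PENDING;

	/* On failure, 'expected' is overwritten with the current state */
	atomic_compare_exchange_strong(state, &expected, WLS_CANCELLED);
	return expected;
}
```

The increment only works because REMOVED directly follows PENDING and
HANDLED directly follows CANCELLED in the enum, so reordering the states
would break the completion path.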

* [PATCHv3 3/7] gpu: host1x: Add channel support
  2012-12-13 14:04 ` Terje Bergstrom
@ 2012-12-13 14:04   ` Terje Bergstrom
  -1 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

Add support for host1x client modules, and host1x channels to submit
work to the clients. The work is submitted in GEM CMA buffers, so
this patch adds support for them.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Kconfig                  |   24 +-
 drivers/gpu/host1x/Makefile                 |    8 +-
 drivers/gpu/host1x/cdma.c                   |  438 +++++++++++++++++++
 drivers/gpu/host1x/cdma.h                   |  107 +++++
 drivers/gpu/host1x/channel.c                |  137 ++++++
 drivers/gpu/host1x/channel.h                |   64 +++
 drivers/gpu/host1x/cma.c                    |  116 +++++
 drivers/gpu/host1x/cma.h                    |   43 ++
 drivers/gpu/host1x/dev.c                    |   13 +
 drivers/gpu/host1x/dev.h                    |   54 +++
 drivers/gpu/host1x/hw/cdma_hw.c             |  477 +++++++++++++++++++++
 drivers/gpu/host1x/hw/cdma_hw.h             |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c          |  147 +++++++
 drivers/gpu/host1x/hw/host1x01.c            |    6 +
 drivers/gpu/host1x/hw/host1x01_hardware.h   |  124 ++++++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   86 ++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |    8 +
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  |  130 ++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |   10 +
 drivers/gpu/host1x/intr.c                   |   29 +-
 drivers/gpu/host1x/intr.h                   |    6 +
 drivers/gpu/host1x/job.c                    |  618 +++++++++++++++++++++++++++
 drivers/gpu/host1x/memmgr.c                 |  174 ++++++++
 drivers/gpu/host1x/memmgr.h                 |   53 +++
 drivers/gpu/host1x/syncpt.c                 |    6 +
 drivers/gpu/host1x/syncpt.h                 |    2 +
 include/linux/host1x.h                      |  173 ++++++++
 include/trace/events/host1x.h               |  235 ++++++++++
 28 files changed, 3322 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/host1x/cdma.c
 create mode 100644 drivers/gpu/host1x/cdma.h
 create mode 100644 drivers/gpu/host1x/channel.c
 create mode 100644 drivers/gpu/host1x/channel.h
 create mode 100644 drivers/gpu/host1x/cma.c
 create mode 100644 drivers/gpu/host1x/cma.h
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h
 create mode 100644 drivers/gpu/host1x/hw/channel_hw.c
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h
 create mode 100644 drivers/gpu/host1x/job.c
 create mode 100644 drivers/gpu/host1x/memmgr.c
 create mode 100644 drivers/gpu/host1x/memmgr.h

diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index e89fb2b..61e7ba3 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -3,4 +3,26 @@ config TEGRA_HOST1X
 	help
 	  Driver for the Tegra host1x hardware.
 
-	  Required for enabling tegradrm.
+	  Required for enabling tegradrm and 2D acceleration.
+
+if TEGRA_HOST1X
+
+config TEGRA_HOST1X_CMA
+	bool "Support DRM CMA buffers"
+	depends on DRM
+	select DRM_GEM_CMA_HELPER
+	select DRM_KMS_CMA_HELPER
+	help
+	  Say yes if you wish to use DRM CMA buffers.
+
+	  If unsure, choose Y.
+
+config TEGRA_HOST1X_FIREWALL
+	bool "Enable HOST1X security firewall"
+	default y
+	help
+	  Say yes if kernel should protect command streams from tampering.
+
+	  If unsure, choose Y.
+
+endif
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 9d00b62..f6c1924 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -3,7 +3,13 @@ ccflags-y = -Idrivers/gpu/host1x
 host1x-objs = \
 	syncpt.o \
 	dev.o \
-	intr.o
+	intr.o \
+	cdma.o \
+	intr.o \
+	channel.o \
+	job.o \
+	memmgr.o
 
+obj-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
 obj-$(CONFIG_TEGRA_HOST1X) += hw/
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
new file mode 100644
index 0000000..1193fea
--- /dev/null
+++ b/drivers/gpu/host1x/cdma.c
@@ -0,0 +1,438 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include <asm/cacheflush.h>
+
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/interrupt.h>
+#include <trace/events/host1x.h>
+
+#define TRACE_MAX_LENGTH 128U
+
+/*
+ * Add an entry to the sync queue.
+ */
+static void add_to_sync_queue(struct host1x_cdma *cdma,
+			      struct host1x_job *job,
+			      u32 nr_slots,
+			      u32 first_get)
+{
+	if (job->syncpt_id == NVSYNCPT_INVALID) {
+		dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
+				__func__);
+		return;
+	}
+
+	job->first_get = first_get;
+	job->num_slots = nr_slots;
+	host1x_job_get(job);
+	list_add_tail(&job->list, &cdma->sync_queue);
+}
+
+/*
+ * Return the status of the cdma's sync queue or push buffer for the given event
+ *  - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
+ *  - pb space: returns the number of free slots in the channel's push buffer
+ * Must be called with the cdma lock held.
+ */
+static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	switch (event) {
+	case CDMA_EVENT_SYNC_QUEUE_EMPTY:
+		return list_empty(&cdma->sync_queue) ? 1 : 0;
+	case CDMA_EVENT_PUSH_BUFFER_SPACE: {
+		struct push_buffer *pb = &cdma->push_buffer;
+		return host1x->cdma_pb_op.space(pb);
+	}
+	default:
+		return 0;
+	}
+}
+
+/*
+ * Sleep (if necessary) until the requested event happens
+ *   - CDMA_EVENT_SYNC_QUEUE_EMPTY : sync queue is completely empty.
+ *     - Returns 1
+ *   - CDMA_EVENT_PUSH_BUFFER_SPACE : there is space in the push buffer
+ *     - Return the amount of space (> 0)
+ * Must be called with the cdma lock held.
+ */
+unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	for (;;) {
+		unsigned int space = cdma_status_locked(cdma, event);
+		if (space)
+			return space;
+
+		trace_host1x_wait_cdma(cdma_to_channel(cdma)->dev->name,
+				event);
+
+		/* If somebody has managed to already start waiting, yield */
+		if (cdma->event != CDMA_EVENT_NONE) {
+			mutex_unlock(&cdma->lock);
+			schedule();
+			mutex_lock(&cdma->lock);
+			continue;
+		}
+		cdma->event = event;
+
+		mutex_unlock(&cdma->lock);
+		down(&cdma->sem);
+		mutex_lock(&cdma->lock);
+	}
+	return 0;
+}
+
+/*
+ * Start timer for a buffer submission that has not completed yet.
+ * Must be called with the cdma lock held.
+ */
+static void cdma_start_timer_locked(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host = cdma_to_host1x(cdma);
+
+	if (cdma->timeout.clientid) {
+		/* timer already started */
+		return;
+	}
+
+	cdma->timeout.clientid = job->clientid;
+	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt_val = job->syncpt_end;
+	cdma->timeout.start_ktime = ktime_get();
+
+	schedule_delayed_work(&cdma->timeout.wq,
+			msecs_to_jiffies(job->timeout));
+}
+
+/*
+ * Stop timer when a buffer submission completes.
+ * Must be called with the cdma lock held.
+ */
+static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
+{
+	cancel_delayed_work(&cdma->timeout.wq);
+	cdma->timeout.clientid = 0;
+}
+
+/*
+ * For all sync queue entries that have already finished according to the
+ * current sync point registers:
+ *  - unpin & unref their mems
+ *  - pop their push buffer slots
+ *  - remove them from the sync queue
+ * This is normally called from the host code's worker thread, but can be
+ * called manually if necessary.
+ * Must be called with the cdma lock held.
+ */
+static void update_cdma_locked(struct host1x_cdma *cdma)
+{
+	bool signal = false;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_job *job, *n;
+
+	/* If CDMA is stopped, queue is cleared and we can return */
+	if (!cdma->running)
+		return;
+
+	/*
+	 * Walk the sync queue, reading the sync point registers as necessary,
+	 * to consume as many sync queue entries as possible without blocking
+	 */
+	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
+		struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;
+
+		/* Check whether this syncpt has completed, and bail if not */
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+			/* Start timer on next pending syncpt */
+			if (job->timeout)
+				cdma_start_timer_locked(cdma, job);
+			break;
+		}
+
+		/* Cancel timeout, when a buffer completes */
+		if (cdma->timeout.clientid)
+			stop_cdma_timer_locked(cdma);
+
+		/* Unpin the memory */
+		host1x_job_unpin(job);
+
+		/* Pop push buffer slots */
+		if (job->num_slots) {
+			struct push_buffer *pb = &cdma->push_buffer;
+			host1x->cdma_pb_op.pop_from(pb, job->num_slots);
+			if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
+				signal = true;
+		}
+
+		list_del(&job->list);
+		host1x_job_put(job);
+	}
+
+	if (list_empty(&cdma->sync_queue) &&
+	    cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
+		signal = true;
+
+	/* Wake up host1x_cdma_wait_locked() if the requested event happened */
+	if (signal) {
+		cdma->event = CDMA_EVENT_NONE;
+		up(&cdma->sem);
+	}
+}
+
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev)
+{
+	u32 get_restart;
+	u32 syncpt_incrs;
+	struct host1x_job *job = NULL;
+	u32 syncpt_val;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	dev_dbg(&dev->dev,
+		"%s: starting cleanup (thresh %d)\n",
+		__func__, syncpt_val);
+
+	/*
+	 * Move the sync_queue read pointer to the first entry that hasn't
+	 * completed based on the current HW syncpt value. It's likely there
+	 * won't be any (i.e. we're still at the head), but covers the case
+	 * where a syncpt incr happens just prior/during the teardown.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: skip completed buffers still in sync_queue\n",
+		__func__);
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		if (syncpt_val < job->syncpt_end)
+			break;
+
+		host1x_job_dump(&dev->dev, job);
+	}
+
+	/*
+	 * Walk the sync_queue, first incrementing with the CPU syncpts that
+	 * are partially executed (the first buffer) or fully skipped while
+	 * still in the current context (slots are also NOP-ed).
+	 *
+	 * At the point contexts are interleaved, syncpt increments must be
+	 * done inline with the pushbuffer from a GATHER buffer to maintain
+	 * the order (slots are modified to be a GATHER of syncpt incrs).
+	 *
+	 * Note: save in get_restart the location where the timed out buffer
+	 * started in the PB, so we can start the refetch from there (with the
+	 * modified NOP-ed PB slots). This lets things appear to have completed
+	 * properly for this buffer and resources are freed.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: perform CPU incr on pending same ctx buffers\n",
+		__func__);
+
+	get_restart = cdma->last_put;
+	if (!list_empty(&cdma->sync_queue))
+		get_restart = job->first_get;
+
+	/* do CPU increments as long as this context continues */
+	list_for_each_entry_from(job, &cdma->sync_queue, list) {
+		/* different context, gets us out of this loop */
+		if (job->clientid != cdma->timeout.clientid)
+			break;
+
+		/* won't need a timeout when replayed */
+		job->timeout = 0;
+
+		syncpt_incrs = job->syncpt_end - syncpt_val;
+		dev_dbg(&dev->dev,
+			"%s: CPU incr (%d)\n", __func__, syncpt_incrs);
+
+		host1x_job_dump(&dev->dev, job);
+
+		/* safe to use CPU to incr syncpts */
+		host1x->cdma_op.timeout_cpu_incr(cdma,
+				job->first_get,
+				syncpt_incrs,
+				job->syncpt_end,
+				job->num_slots);
+
+		syncpt_val += syncpt_incrs;
+	}
+
+	list_for_each_entry_from(job, &cdma->sync_queue, list)
+		if (job->clientid == cdma->timeout.clientid)
+			job->timeout = 500;
+
+	dev_dbg(&dev->dev,
+		"%s: finished sync_queue modification\n", __func__);
+
+	/* roll back DMAGET and start up channel again */
+	host1x->cdma_op.timeout_teardown_end(cdma, get_restart);
+}
+
+/*
+ * Create a cdma
+ */
+int host1x_cdma_init(struct host1x_cdma *cdma)
+{
+	int err;
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	mutex_init(&cdma->lock);
+	sema_init(&cdma->sem, 0);
+
+	INIT_LIST_HEAD(&cdma->sync_queue);
+
+	cdma->event = CDMA_EVENT_NONE;
+	cdma->running = false;
+	cdma->torndown = false;
+
+	err = host1x->cdma_pb_op.init(pb);
+	if (err)
+		return err;
+	return 0;
+}
+
+/*
+ * Destroy a cdma
+ */
+void host1x_cdma_deinit(struct host1x_cdma *cdma)
+{
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running) {
+		pr_warn("%s: CDMA still running\n",
+				__func__);
+	} else {
+		host1x->cdma_pb_op.destroy(pb);
+		host1x->cdma_op.timeout_destroy(cdma);
+	}
+}
+
+/*
+ * Begin a cdma submit
+ */
+int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	mutex_lock(&cdma->lock);
+
+	if (job->timeout) {
+		/* init state on first submit with timeout value */
+		if (!cdma->timeout.initialized) {
+			int err;
+			err = host1x->cdma_op.timeout_init(cdma,
+					job->syncpt_id);
+			if (err) {
+				mutex_unlock(&cdma->lock);
+				return err;
+			}
+		}
+	}
+	if (!cdma->running)
+		host1x->cdma_op.start(cdma);
+
+	cdma->slots_free = 0;
+	cdma->slots_used = 0;
+	cdma->first_get = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	trace_host1x_cdma_begin(job->ch->dev->name);
+	return 0;
+}
+
+/*
+ * Push two words into a push buffer slot
+ * Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
+{
+	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
+}
+
+/*
+ * Push two words into a push buffer slot and record the gather's mem handle
+ * Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push_gather(struct host1x_cdma *cdma,
+		struct mem_handle *handle,
+		u32 offset, u32 op1, u32 op2)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	u32 slots_free = cdma->slots_free;
+	struct push_buffer *pb = &cdma->push_buffer;
+
+	if (slots_free == 0) {
+		host1x->cdma_op.kick(cdma);
+		slots_free = host1x_cdma_wait_locked(cdma,
+				CDMA_EVENT_PUSH_BUFFER_SPACE);
+	}
+	cdma->slots_free = slots_free - 1;
+	cdma->slots_used++;
+	host1x->cdma_pb_op.push_to(pb, handle, op1, op2);
+}
+
+/*
+ * End a cdma submit
+ * Kick off DMA and add the job to the sync queue, along with the number of
+ * slots to be freed from the pushbuffer. The handles for a submit must all be
+ * pinned at the same time, but they can be unpinned in smaller chunks.
+ */
+void host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	bool was_idle = list_empty(&cdma->sync_queue);
+
+	host1x->cdma_op.kick(cdma);
+
+	add_to_sync_queue(cdma,
+			job,
+			cdma->slots_used,
+			cdma->first_get);
+
+	/* start timer on idle -> active transitions */
+	if (job->timeout && was_idle)
+		cdma_start_timer_locked(cdma, job);
+
+	trace_host1x_cdma_end(job->ch->dev->name);
+	mutex_unlock(&cdma->lock);
+}
+
+/*
+ * Update cdma state according to current sync point values
+ */
+void host1x_cdma_update(struct host1x_cdma *cdma)
+{
+	mutex_lock(&cdma->lock);
+	update_cdma_locked(cdma);
+	mutex_unlock(&cdma->lock);
+}
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
new file mode 100644
index 0000000..5fd7cdf
--- /dev/null
+++ b/drivers/gpu/host1x/cdma.h
@@ -0,0 +1,107 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CDMA_H
+#define __NVHOST_CDMA_H
+
+#include <linux/sched.h>
+#include <linux/semaphore.h>
+
+#include <linux/host1x.h>
+#include <linux/list.h>
+
+struct host1x_syncpt;
+struct host1x_userctx_timeout;
+struct host1x_job;
+struct mem_handle;
+
+/*
+ * cdma
+ *
+ * This is in charge of a host command DMA channel.
+ * Sends ops to a push buffer, and takes responsibility for unpinning
+ * (& possibly freeing) of memory after those ops have completed.
+ * Producer:
+ *	begin
+ *		push - send ops to the push buffer
+ *	end - start command DMA and enqueue handles to be unpinned
+ * Consumer:
+ *	update - call to update sync queue and push buffer, unpin memory
+ */
+
+struct push_buffer {
+	u32 *mapped;			/* mapped pushbuffer memory */
+	dma_addr_t phys;		/* physical address of pushbuffer */
+	u32 fence;			/* index we've written */
+	u32 cur;			/* index to write to */
+	struct mem_handle **handle;	/* handle for each opcode pair */
+};
+
+struct buffer_timeout {
+	struct delayed_work wq;		/* work queue */
+	bool initialized;		/* timer one-time setup flag */
+	struct host1x_syncpt *syncpt;	/* buffer completion syncpt */
+	u32 syncpt_val;			/* syncpt value when completed */
+	ktime_t start_ktime;		/* starting time */
+	/* context timeout information */
+	int clientid;
+};
+
+enum cdma_event {
+	CDMA_EVENT_NONE,		/* not waiting for any event */
+	CDMA_EVENT_SYNC_QUEUE_EMPTY,	/* wait for empty sync queue */
+	CDMA_EVENT_PUSH_BUFFER_SPACE	/* wait for space in push buffer */
+};
+
+struct host1x_cdma {
+	struct mutex lock;		/* controls access to shared state */
+	struct semaphore sem;		/* signalled when event occurs */
+	enum cdma_event event;		/* event that sem is waiting for */
+	unsigned int slots_used;	/* pb slots used in current submit */
+	unsigned int slots_free;	/* pb slots free in current submit */
+	unsigned int first_get;		/* DMAGET value, where submit begins */
+	unsigned int last_put;		/* last value written to DMAPUT */
+	struct push_buffer push_buffer;	/* channel's push buffer */
+	struct list_head sync_queue;	/* job queue */
+	struct buffer_timeout timeout;	/* channel's timeout state/wq */
+	bool running;
+	bool torndown;
+};
+
+#define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma)
+#define cdma_to_host1x(cdma) host1x_get_host(cdma_to_channel(cdma)->dev)
+#define cdma_to_memmgr(cdma) ((cdma_to_host1x(cdma))->memmgr)
+#define pb_to_cdma(pb) container_of(pb, struct host1x_cdma, push_buffer)
+
+int	host1x_cdma_init(struct host1x_cdma *cdma);
+void	host1x_cdma_deinit(struct host1x_cdma *cdma);
+void	host1x_cdma_stop(struct host1x_cdma *cdma);
+int	host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job);
+void	host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2);
+void	host1x_cdma_push_gather(struct host1x_cdma *cdma,
+		struct mem_handle *handle, u32 offset, u32 op1, u32 op2);
+void	host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job);
+void	host1x_cdma_update(struct host1x_cdma *cdma);
+void	host1x_cdma_peek(struct host1x_cdma *cdma,
+		u32 dmaget, int slot, u32 *out);
+unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
+		enum cdma_event event);
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev);
+#endif
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
new file mode 100644
index 0000000..3705cae
--- /dev/null
+++ b/drivers/gpu/host1x/channel.c
@@ -0,0 +1,137 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "channel.h"
+#include "dev.h"
+
+#include <linux/slab.h>
+#include <linux/module.h>
+
+#define NVHOST_CHANNEL_LOW_PRIO_MAX_WAIT 50
+
+/* Constructor for the host1x channel list */
+void host1x_channel_list_init(struct host1x *host1x)
+{
+	INIT_LIST_HEAD(&host1x->chlist.list);
+}
+
+/*
+ * Iterator function for the host1x channel list
+ * It takes a fptr as an argument and calls that function for each
+ * channel in the list
+ */
+void host1x_channel_for_all(struct host1x *host1x, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata))
+{
+	struct host1x_channel *ch;
+	int ret;
+
+	list_for_each_entry(ch, &host1x->chlist.list, list) {
+		if (ch && fptr) {
+			ret = fptr(ch, data);
+			if (ret) {
+				pr_info("%s: iterator error\n", __func__);
+				break;
+			}
+		}
+	}
+}
+
+
+int host1x_channel_submit(struct host1x_job *job)
+{
+	return host1x_get_host(job->ch->dev)->channel_op.submit(job);
+}
+EXPORT_SYMBOL(host1x_channel_submit);
+
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
+{
+	int err = 0;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 0)
+		err = host1x_cdma_init(&ch->cdma);
+	if (!err)
+		ch->refcount++;
+
+	mutex_unlock(&ch->reflock);
+
+	return err ? NULL : ch;
+}
+EXPORT_SYMBOL(host1x_channel_get);
+
+void host1x_channel_put(struct host1x_channel *ch)
+{
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 1) {
+		host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
+		host1x_cdma_deinit(&ch->cdma);
+	}
+	ch->refcount--;
+	mutex_unlock(&ch->reflock);
+}
+EXPORT_SYMBOL(host1x_channel_put);
+
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
+{
+	struct host1x_channel *ch = NULL;
+	struct host1x *host1x = host1x_get_host(pdev);
+	int chindex = host1x->allocated_channels;
+	int max_channels = host1x->info.nb_channels;
+	int err;
+
+	if (chindex >= max_channels)
+		return NULL;
+
+	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+	if (ch == NULL)
+		return NULL;
+
+	/* Link platform_device to host1x_channel */
+	err = host1x->channel_op.init(ch, host1x, chindex);
+	if (err < 0) {
+		dev_err(&host1x->dev->dev, "failed to init channel %d\n",
+				chindex);
+		kfree(ch);
+		return NULL;
+	}
+	ch->dev = pdev;
+
+	/* Add to channel list */
+	list_add_tail(&ch->list, &host1x->chlist.list);
+
+	host1x->allocated_channels++;
+
+	return ch;
+}
+EXPORT_SYMBOL(host1x_channel_alloc);
+
+void host1x_free_channel(struct host1x_channel *ch)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct host1x_channel *chiter, *tmp;
+	list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
+		if (chiter == ch) {
+			list_del(&chiter->list);
+			kfree(ch);
+			host1x->allocated_channels--;
+
+			return;
+		}
+	}
+}
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
new file mode 100644
index 0000000..67d9487
--- /dev/null
+++ b/drivers/gpu/host1x/channel.h
@@ -0,0 +1,64 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CHANNEL_H
+#define __NVHOST_CHANNEL_H
+
+#include <linux/cdev.h>
+#include <linux/io.h>
+#include "cdma.h"
+
+#define NVHOST_MAX_WAIT_CHECKS		256
+#define NVHOST_MAX_GATHERS		512
+#define NVHOST_MAX_HANDLES		1280
+#define NVHOST_MAX_POWERGATE_IDS	2
+
+struct host1x;
+struct platform_device;
+struct host1x_channel;
+
+/*
+ * host1x channel; channels are linked into a list that the debugfs dump
+ * walks to show host1x, client device and channel state
+ */
+struct host1x_channel {
+	struct list_head list;
+
+	int refcount;
+	int chid;
+	struct mutex reflock;
+	struct mutex submitlock;
+	void __iomem *regs;
+	struct device *node;
+	struct platform_device *dev;
+	struct cdev cdev;
+	struct host1x_cdma cdma;
+};
+
+/* channel list operations */
+void host1x_channel_list_init(struct host1x *);
+void host1x_channel_for_all(struct host1x *, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata));
+
+struct host1x_channel *host1x_channel_alloc(struct platform_device *dev);
+void host1x_free_channel(struct host1x_channel *ch);
+
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch);
+void host1x_channel_put(struct host1x_channel *ch);
+
+#endif
diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
new file mode 100644
index 0000000..bef9d4d
--- /dev/null
+++ b/drivers/gpu/host1x/cma.c
@@ -0,0 +1,116 @@
+/*
+ * Tegra host1x GEM CMA memory manager
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <linux/host1x.h>
+#include <linux/mutex.h>
+
+#include "memmgr.h"
+
+static inline struct drm_gem_cma_object *to_cma_obj(struct mem_handle *h)
+{
+	return (struct drm_gem_cma_object *)(((u32)h) & MEMMGR_ID_MASK);
+}
+
+struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags)
+{
+	return NULL;
+}
+
+void host1x_cma_put(struct mem_handle *handle)
+{
+	struct drm_gem_cma_object *obj = to_cma_obj(handle);
+	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
+
+	mutex_lock(struct_mutex);
+	drm_gem_object_unreference(&obj->base);
+	mutex_unlock(struct_mutex);
+}
+
+struct sg_table *host1x_cma_pin(struct mem_handle *handle)
+{
+	return NULL;
+}
+
+void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+
+}
+
+
+void *host1x_cma_mmap(struct mem_handle *handle)
+{
+	return (to_cma_obj(handle))->vaddr;
+}
+
+void host1x_cma_munmap(struct mem_handle *handle, void *addr)
+{
+
+}
+
+void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+	return (to_cma_obj(handle))->vaddr + pagenum * PAGE_SIZE;
+}
+
+void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr)
+{
+
+}
+
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)
+{
+	struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
+	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
+
+	mutex_lock(struct_mutex);
+	drm_gem_object_reference(&obj->base);
+	mutex_unlock(struct_mutex);
+
+	return (struct mem_handle *) ((u32)id | mem_mgr_type_cma);
+}
+
+int host1x_cma_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		long unsigned id_type_mask,
+		long unsigned id_type,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data,
+		dma_addr_t *phys_addr)
+{
+	int i;
+	int pin_count = 0;
+
+	for (i = 0; i < count; i++) {
+		struct mem_handle *handle;
+
+		if ((ids[i] & id_type_mask) != id_type)
+			continue;
+
+		handle = host1x_cma_get(ids[i], dev);
+
+		phys_addr[i] = (to_cma_obj(handle)->paddr);
+		unpin_data[pin_count].h = handle;
+
+		pin_count++;
+	}
+	return pin_count;
+}
diff --git a/drivers/gpu/host1x/cma.h b/drivers/gpu/host1x/cma.h
new file mode 100644
index 0000000..69e2540
--- /dev/null
+++ b/drivers/gpu/host1x/cma.h
@@ -0,0 +1,43 @@
+/*
+ * Tegra host1x cma memory manager
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CMA_H
+#define __NVHOST_CMA_H
+
+#include "memmgr.h"
+
+struct platform_device;
+
+struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags);
+void host1x_cma_put(struct mem_handle *handle);
+struct sg_table *host1x_cma_pin(struct mem_handle *handle);
+void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *host1x_cma_mmap(struct mem_handle *handle);
+void host1x_cma_munmap(struct mem_handle *handle, void *addr);
+void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum);
+void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr);
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev);
+int host1x_cma_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		long unsigned id_type_mask,
+		long unsigned id_type,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data,
+		dma_addr_t *phys_addr);
+#endif
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 9255a49..9209333 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -26,6 +26,7 @@
 #include <linux/io.h>
 #include "dev.h"
 #include "intr.h"
+#include "channel.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -76,6 +77,16 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r)
 	return readl(sync_regs + r);
 }
 
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r)
+{
+	writel(v, ch->regs + r);
+}
+
+u32 host1x_ch_readl(struct host1x_channel *ch, u32 r)
+{
+	return readl(ch->regs + r);
+}
+
 static int host1x_alloc_resources(struct host1x *host)
 {
 	host->intr.syncpt = devm_kzalloc(&host->dev->dev,
@@ -184,6 +195,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_syncpt_reset(host);
 
+	host1x_channel_list_init(host);
+
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
 	host1x = host;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index a1622bb..093ac85 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -19,13 +19,59 @@
 
 #include <linux/host1x.h>
 
+#include "channel.h"
 #include "syncpt.h"
 #include "intr.h"
 
 struct host1x;
+struct host1x_intr;
 struct host1x_syncpt;
+struct host1x_channel;
+struct host1x_cdma;
+struct host1x_job;
+struct push_buffer;
+struct dentry;
+struct mem_handle;
 struct platform_device;
 
+struct host1x_channel_ops {
+	const char *soc_name;
+	int (*init)(struct host1x_channel *,
+		    struct host1x *,
+		    int chid);
+	int (*submit)(struct host1x_job *job);
+};
+
+struct host1x_cdma_ops {
+	void (*start)(struct host1x_cdma *);
+	void (*stop)(struct host1x_cdma *);
+	void (*kick)(struct host1x_cdma *);
+	int (*timeout_init)(struct host1x_cdma *,
+			    u32 syncpt_id);
+	void (*timeout_destroy)(struct host1x_cdma *);
+	void (*timeout_teardown_begin)(struct host1x_cdma *);
+	void (*timeout_teardown_end)(struct host1x_cdma *,
+				     u32 getptr);
+	void (*timeout_cpu_incr)(struct host1x_cdma *,
+				 u32 getptr,
+				 u32 syncpt_incrs,
+				 u32 syncval,
+				 u32 nr_slots);
+};
+
+struct host1x_pushbuffer_ops {
+	void (*reset)(struct push_buffer *);
+	int (*init)(struct push_buffer *);
+	void (*destroy)(struct push_buffer *);
+	void (*push_to)(struct push_buffer *,
+			struct mem_handle *,
+			u32 op1, u32 op2);
+	void (*pop_from)(struct push_buffer *,
+			 unsigned int slots);
+	u32 (*space)(struct push_buffer *);
+	u32 (*putptr)(struct push_buffer *);
+};
+
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -70,9 +116,15 @@ struct host1x {
 	struct host1x_syncpt *nop_sp;
 
 	const char *soc_name;
+	struct host1x_channel_ops channel_op;
+	struct host1x_cdma_ops cdma_op;
+	struct host1x_pushbuffer_ops cdma_pb_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
+	struct host1x_channel chlist;
+	int allocated_channels;
+
 	struct dentry *debugfs;
 };
 
@@ -90,5 +142,7 @@ struct host1x *host1x_get_host(struct platform_device *_dev)
 
 void host1x_sync_writel(struct host1x *host1x, u32 r, u32 v);
 u32 host1x_sync_readl(struct host1x *host1x, u32 r);
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r);
+u32 host1x_ch_readl(struct host1x_channel *ch, u32 r);
 
 #endif
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
new file mode 100644
index 0000000..55adaa6
--- /dev/null
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -0,0 +1,477 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-mapping.h>
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+
+#include "cdma_hw.h"
+
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
+{
+	return host1x_channel_dmactrl_dmastop_f(stop)
+		| host1x_channel_dmactrl_dmagetrst_f(get_rst)
+		| host1x_channel_dmactrl_dmainitget_f(init_get);
+}
+
+static void cdma_timeout_handler(struct work_struct *work);
+
+/*
+ * push_buffer
+ *
+ * The push buffer is a circular array of words to be fetched by command DMA.
+ * Note that it works slightly differently to the sync queue; fence == cur
+ * means that the push buffer is full, not empty.
+ */
+
+
+/*
+ * Reset to empty push buffer
+ */
+static void push_buffer_reset(struct push_buffer *pb)
+{
+	pb->fence = PUSH_BUFFER_SIZE - 8;
+	pb->cur = 0;
+}
+
+/*
+ * Init push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb);
+static int push_buffer_init(struct push_buffer *pb)
+{
+	struct host1x_cdma *cdma = pb_to_cdma(pb);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	pb->mapped = NULL;
+	pb->phys = 0;
+	pb->handle = NULL;
+
+	host1x->cdma_pb_op.reset(pb);
+
+	/* allocate and map pushbuffer memory */
+	pb->mapped = dma_alloc_writecombine(&host1x->dev->dev,
+			PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
+	if (!pb->mapped) {
+		pb->mapped = NULL;
+		goto fail;
+	}
+
+	/* memory for storing mem client and handles for each opcode pair */
+	pb->handle = kzalloc(HOST1X_GATHER_QUEUE_SIZE *
+				sizeof(struct mem_handle *),
+			GFP_KERNEL);
+	if (!pb->handle)
+		goto fail;
+
+	/* put the restart at the end of pushbuffer memory */
+	*(pb->mapped + (PUSH_BUFFER_SIZE >> 2)) =
+		host1x_opcode_restart(pb->phys);
+
+	return 0;
+
+fail:
+	push_buffer_destroy(pb);
+	return -ENOMEM;
+}
+
+/*
+ * Clean up push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb)
+{
+	struct host1x_cdma *cdma = pb_to_cdma(pb);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (pb->mapped)
+		dma_free_writecombine(&host1x->dev->dev,
+				PUSH_BUFFER_SIZE + 4,
+				pb->mapped, pb->phys);
+
+	kfree(pb->handle);
+
+	pb->mapped = NULL;
+	pb->phys = 0;
+	pb->handle = NULL;
+}
+
+/*
+ * Push two words to the push buffer
+ * Caller must ensure push buffer is not full
+ */
+static void push_buffer_push_to(struct push_buffer *pb,
+		struct mem_handle *handle,
+		u32 op1, u32 op2)
+{
+	u32 cur = pb->cur;
+	u32 *p = (u32 *)((void *)pb->mapped + cur);
+	u32 cur_mem = (cur/8) & (HOST1X_GATHER_QUEUE_SIZE - 1);
+	WARN_ON(cur == pb->fence);
+	*(p++) = op1;
+	*(p++) = op2;
+	pb->handle[cur_mem] = handle;
+	pb->cur = (cur + 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/*
+ * Pop a number of two word slots from the push buffer
+ * Caller must ensure push buffer is not empty
+ */
+static void push_buffer_pop_from(struct push_buffer *pb,
+		unsigned int slots)
+{
+	/* Clear the mem references for old items from pb */
+	unsigned int i;
+	u32 fence_mem = pb->fence/8;
+	for (i = 0; i < slots; i++) {
+		int cur_fence_mem = (fence_mem+i)
+				& (HOST1X_GATHER_QUEUE_SIZE - 1);
+		pb->handle[cur_fence_mem] = NULL;
+	}
+	/* Advance the next write position */
+	pb->fence = (pb->fence + slots * 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/*
+ * Return the number of two word slots free in the push buffer
+ */
+static u32 push_buffer_space(struct push_buffer *pb)
+{
+	return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
+
+static u32 push_buffer_putptr(struct push_buffer *pb)
+{
+	return pb->phys + pb->cur;
+}
+
+/*
+ * The syncpt incr buffer is filled with methods to increment syncpts, which
+ * is later GATHER-ed into the mainline PB. It's used when a timed out context
+ * is interleaved with other work, so needs to inline the syncpt increments
+ * to maintain the count (but otherwise does no work).
+ */
+
+/*
+ * Init timeout resources
+ */
+static int cdma_timeout_init(struct host1x_cdma *cdma,
+				 u32 syncpt_id)
+{
+	if (syncpt_id == NVSYNCPT_INVALID)
+		return -EINVAL;
+
+	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
+	cdma->timeout.initialized = true;
+
+	return 0;
+}
+
+/*
+ * Clean up timeout resources
+ */
+static void cdma_timeout_destroy(struct host1x_cdma *cdma)
+{
+	if (cdma->timeout.initialized)
+		cancel_delayed_work(&cdma->timeout.wq);
+	cdma->timeout.initialized = false;
+}
+
+/*
+ * Increment timedout buffer's syncpt via CPU.
+ */
+static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
+				u32 syncpt_incrs, u32 syncval, u32 nr_slots)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 i, getidx;
+
+	for (i = 0; i < syncpt_incrs; i++)
+		host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
+
+	/* after CPU incr, ensure shadow is up to date */
+	host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	/* NOP all the PB slots */
+	getidx = getptr - pb->phys;
+	while (nr_slots--) {
+		u32 *p = (u32 *)((void *)pb->mapped + getidx);
+		*(p++) = HOST1X_OPCODE_NOOP;
+		*(p++) = HOST1X_OPCODE_NOOP;
+		dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
+			__func__, pb->phys + getidx);
+		getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
+	}
+	wmb();
+}
+
+/*
+ * Start channel DMA
+ */
+static void cdma_start(struct host1x_cdma *cdma)
+{
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running)
+		return;
+
+	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+
+	/* set base, put, end pointer (all of memory) */
+	host1x_ch_writel(ch, 0, host1x_channel_dmastart_r());
+	host1x_ch_writel(ch, cdma->last_put, host1x_channel_dmaput_r());
+	host1x_ch_writel(ch, 0xFFFFFFFF, host1x_channel_dmaend_r());
+
+	/* reset GET */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true),
+		host1x_channel_dmactrl_r());
+
+	/* start the command DMA */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false),
+		host1x_channel_dmactrl_r());
+
+	cdma->running = true;
+}
+
+/*
+ * Similar to cdma_start(), but rather than starting from an idle
+ * state (where DMA GET is set to DMA PUT), on a timeout we restore
+ * DMA GET from an explicit value (so DMA may again be pending).
+ */
+static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+
+	if (cdma->running)
+		return;
+
+	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+
+	/* set base, end pointer (all of memory) */
+	host1x_ch_writel(ch, 0, host1x_channel_dmastart_r());
+	host1x_ch_writel(ch, 0xFFFFFFFF, host1x_channel_dmaend_r());
+
+	/* set GET, by loading the value in PUT (then reset GET) */
+	host1x_ch_writel(ch, getptr, host1x_channel_dmaput_r());
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true),
+		host1x_channel_dmactrl_r());
+
+	dev_dbg(&host1x->dev->dev,
+		"%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+		__func__,
+		host1x_ch_readl(ch, host1x_channel_dmaget_r()),
+		host1x_ch_readl(ch, host1x_channel_dmaput_r()),
+		cdma->last_put);
+
+	/* deassert GET reset and set PUT */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+	host1x_ch_writel(ch, cdma->last_put, host1x_channel_dmaput_r());
+
+	/* start the command DMA */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false),
+		host1x_channel_dmactrl_r());
+
+	cdma->running = true;
+}
+
+/*
+ * Kick channel DMA into action by writing its PUT offset (if it has changed)
+ */
+static void cdma_kick(struct host1x_cdma *cdma)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 put;
+
+	put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	if (put != cdma->last_put) {
+		host1x_ch_writel(ch, put, host1x_channel_dmaput_r());
+		cdma->last_put = put;
+	}
+}
+
+static void cdma_stop(struct host1x_cdma *cdma)
+{
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+
+	mutex_lock(&cdma->lock);
+	if (cdma->running) {
+		host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
+		host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+			host1x_channel_dmactrl_r());
+		cdma->running = false;
+	}
+	mutex_unlock(&cdma->lock);
+}
+
+/*
+ * Stops both the channel's command processor and CDMA immediately.
+ * Also tears down the channel and resets the corresponding module.
+ */
+static void cdma_timeout_teardown_begin(struct host1x_cdma *cdma)
+{
+	struct host1x *dev = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 cmdproc_stop;
+
+	if (cdma->torndown && !cdma->running) {
+		dev_warn(&dev->dev->dev, "Already torn down\n");
+		return;
+	}
+
+	dev_dbg(&dev->dev->dev,
+		"begin channel teardown (channel id %d)\n", ch->chid);
+
+	cmdproc_stop = host1x_sync_readl(dev, host1x_sync_cmdproc_stop_r());
+	cmdproc_stop |= BIT(ch->chid);
+	host1x_sync_writel(dev, cmdproc_stop, host1x_sync_cmdproc_stop_r());
+
+	dev_dbg(&dev->dev->dev,
+		"%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+		__func__,
+		host1x_ch_readl(ch, host1x_channel_dmaget_r()),
+		host1x_ch_readl(ch, host1x_channel_dmaput_r()),
+		cdma->last_put);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+
+	host1x_sync_writel(dev, BIT(ch->chid), host1x_sync_ch_teardown_r());
+
+	cdma->running = false;
+	cdma->torndown = true;
+}
+
+static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 cmdproc_stop;
+
+	dev_dbg(&host1x->dev->dev,
+		"end channel teardown (id %d, DMAGET restart = 0x%x)\n",
+		ch->chid, getptr);
+
+	cmdproc_stop = host1x_sync_readl(host1x, host1x_sync_cmdproc_stop_r());
+	cmdproc_stop &= ~(BIT(ch->chid));
+	host1x_sync_writel(host1x, cmdproc_stop, host1x_sync_cmdproc_stop_r());
+
+	cdma->torndown = false;
+	cdma_timeout_restart(cdma, getptr);
+}
+
+/*
+ * If this timeout fires, it indicates the current sync_queue entry has
+ * exceeded its TTL and the userctx should be timed out and remaining
+ * submits already issued cleaned up (future submits return an error).
+ */
+static void cdma_timeout_handler(struct work_struct *work)
+{
+	struct host1x_cdma *cdma;
+	struct host1x *host1x;
+	struct host1x_channel *ch;
+
+	u32 syncpt_val;
+
+	u32 prev_cmdproc, cmdproc_stop;
+
+	cdma = container_of(to_delayed_work(work), struct host1x_cdma,
+			    timeout.wq);
+	host1x = cdma_to_host1x(cdma);
+	ch = cdma_to_channel(cdma);
+
+	mutex_lock(&cdma->lock);
+
+	if (!cdma->timeout.clientid) {
+		dev_dbg(&host1x->dev->dev,
+			 "cdma_timeout: expired, but has no clientid\n");
+		mutex_unlock(&cdma->lock);
+		return;
+	}
+
+	/* stop processing to get a clean snapshot */
+	prev_cmdproc = host1x_sync_readl(host1x, host1x_sync_cmdproc_stop_r());
+	cmdproc_stop = prev_cmdproc | BIT(ch->chid);
+	host1x_sync_writel(host1x, cmdproc_stop, host1x_sync_cmdproc_stop_r());
+
+	dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
+		prev_cmdproc, cmdproc_stop);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	/* has buffer actually completed? */
+	if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
+		dev_dbg(&host1x->dev->dev,
+			 "cdma_timeout: expired, but buffer had completed\n");
+		/* restore */
+		cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
+		host1x_sync_writel(host1x, cmdproc_stop,
+			host1x_sync_cmdproc_stop_r());
+		mutex_unlock(&cdma->lock);
+		return;
+	}
+
+	dev_warn(&host1x->dev->dev,
+		"%s: timeout: %d (%s), HW value %u, threshold %u\n",
+		__func__,
+		cdma->timeout.syncpt->id, cdma->timeout.syncpt->name,
+		syncpt_val, cdma->timeout.syncpt_val);
+
+	/* stop HW, resetting channel/module */
+	host1x->cdma_op.timeout_teardown_begin(cdma);
+
+	host1x_cdma_update_sync_queue(cdma, ch->dev);
+	mutex_unlock(&cdma->lock);
+}
+
+static const struct host1x_cdma_ops host1x_cdma_ops = {
+	.start = cdma_start,
+	.stop = cdma_stop,
+	.kick = cdma_kick,
+
+	.timeout_init = cdma_timeout_init,
+	.timeout_destroy = cdma_timeout_destroy,
+	.timeout_teardown_begin = cdma_timeout_teardown_begin,
+	.timeout_teardown_end = cdma_timeout_teardown_end,
+	.timeout_cpu_incr = cdma_timeout_cpu_incr,
+};
+
+static const struct host1x_pushbuffer_ops host1x_pushbuffer_ops = {
+	.reset = push_buffer_reset,
+	.init = push_buffer_init,
+	.destroy = push_buffer_destroy,
+	.push_to = push_buffer_push_to,
+	.pop_from = push_buffer_pop_from,
+	.space = push_buffer_space,
+	.putptr = push_buffer_putptr,
+};
+
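Note on the completion test in cdma_timeout_handler(): the check
(s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0 uses a signed
difference so that it stays correct when the 32-bit sync point counter
wraps around. A minimal user-space sketch of that comparison is given
below. The name syncpt_reached() is hypothetical and exists only for this
illustration, and the sketch assumes the two values are less than 2^31
apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the driver): returns true when the
 * counter value 'current' has reached or passed 'thresh'. The unsigned
 * subtraction wraps modulo 2^32, and the signed cast then tells us on
 * which side of the threshold we are, provided the distance is < 2^31.
 */
static inline bool syncpt_reached(uint32_t current, uint32_t thresh)
{
	return (int32_t)(current - thresh) >= 0;
}
```

A plain "current >= thresh" would give the wrong answer for a threshold
just below the wrap point and a counter just above it, which is the case
the signed form is meant to handle.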
diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
new file mode 100644
index 0000000..4ce2f43
--- /dev/null
+++ b/drivers/gpu/host1x/hw/cdma_hw.h
@@ -0,0 +1,37 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_CDMA_HW_H
+#define __HOST1X_CDMA_HW_H
+
+/*
+ * Size of the sync queue. If it is too small, we won't be able to queue up
+ * many command buffers. If it is too large, we waste memory.
+ */
+#define HOST1X_SYNC_QUEUE_SIZE 512
+
+/*
+ * Number of gathers we allow to be queued up per channel. Must be a
+ * power of two. Currently sized such that pushbuffer is 4KB (512*8B).
+ */
+#define HOST1X_GATHER_QUEUE_SIZE 512
+
+/* 8 bytes per slot. (This number does not include the final RESTART.) */
+#define PUSH_BUFFER_SIZE (HOST1X_GATHER_QUEUE_SIZE * 8)
+
+#endif
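Note on the sizing in cdma_hw.h: the comment requires the gather queue
size to be a power of two. One plausible reason, assumed here rather than
taken from this header, is that a byte offset into the push buffer can
then wrap with a mask instead of a modulo. The sketch below mirrors the
two constants under local names; pb_next() is a hypothetical helper used
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GATHER_QUEUE_SIZE 512u	/* mirrors HOST1X_GATHER_QUEUE_SIZE */
#define SLOT_BYTES 8u		/* two 32-bit words per slot */
#define PB_SIZE (GATHER_QUEUE_SIZE * SLOT_BYTES)

/* Reject a queue size that is not a power of two at compile time. */
_Static_assert((GATHER_QUEUE_SIZE & (GATHER_QUEUE_SIZE - 1u)) == 0,
	       "gather queue size must be a power of two");

/* Hypothetical helper: advance a byte offset by one slot, wrapping. */
static inline uint32_t pb_next(uint32_t pos)
{
	return (pos + SLOT_BYTES) & (PB_SIZE - 1u);
}
```

With these values the buffer is 4096 bytes, and advancing from the last
slot (offset 4088) wraps back to offset 0.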
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
new file mode 100644
index 0000000..3bdfef6
--- /dev/null
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -0,0 +1,147 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/host1x.h>
+#include <linux/slab.h>
+#include <trace/events/host1x.h>
+#include "channel.h"
+#include "dev.h"
+#include "intr.h"
+
+static void submit_gathers(struct host1x_job *job)
+{
+	/* push user gathers */
+	int i;
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		u32 op1 = host1x_opcode_gather(g->words);
+		u32 op2 = g->mem_base + g->offset;
+		host1x_cdma_push_gather(&job->ch->cdma,
+				g->ref,
+				g->offset,
+				op1, op2);
+	}
+}
+
+static int channel_submit(struct host1x_job *job)
+{
+	struct host1x_channel *ch = job->ch;
+	struct host1x_syncpt *sp;
+	u32 user_syncpt_incrs = job->syncpt_incrs;
+	u32 prev_max = 0;
+	u32 syncval;
+	int err;
+	void *completed_waiter = NULL;
+
+	sp = host1x_get_host(job->ch->dev)->syncpt + job->syncpt_id;
+	trace_host1x_channel_submit(ch->dev->name,
+			job->num_gathers, job->num_relocs, job->num_waitchk,
+			job->syncpt_id, job->syncpt_incrs);
+
+	/* before error checks, return current max */
+	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
+
+	/* get submit lock */
+	err = mutex_lock_interruptible(&ch->submitlock);
+	if (err)
+		goto error;
+
+	completed_waiter = host1x_intr_alloc_waiter();
+	if (!completed_waiter) {
+		mutex_unlock(&ch->submitlock);
+		err = -ENOMEM;
+		goto error;
+	}
+
+	/* begin a CDMA submit */
+	err = host1x_cdma_begin(&ch->cdma, job);
+	if (err) {
+		mutex_unlock(&ch->submitlock);
+		goto error;
+	}
+
+	if (job->serialize) {
+		/*
+		 * Force serialization by inserting a host wait for the
+		 * previous job to finish before this one can commence.
+		 */
+		host1x_cdma_push(&ch->cdma,
+				host1x_opcode_setclass(NV_HOST1X_CLASS_ID,
+					host1x_uclass_wait_syncpt_r(),
+					1),
+				host1x_class_host_wait_syncpt(job->syncpt_id,
+					host1x_syncpt_read_max(sp)));
+	}
+
+	syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
+
+	job->syncpt_end = syncval;
+
+	/* add a setclass for modules that require it */
+	if (job->class)
+		host1x_cdma_push(&ch->cdma,
+			host1x_opcode_setclass(job->class, 0, 0),
+			HOST1X_OPCODE_NOOP);
+
+	submit_gathers(job);
+
+	/* end CDMA submit & stash pinned hMems into sync queue */
+	host1x_cdma_end(&ch->cdma, job);
+
+	trace_host1x_channel_submitted(ch->dev->name,
+			prev_max, syncval);
+
+	/* schedule a submit complete interrupt */
+	err = host1x_intr_add_action(&host1x_get_host(ch->dev)->intr,
+			job->syncpt_id, syncval,
+			HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
+			completed_waiter,
+			NULL);
+	completed_waiter = NULL;
+	WARN(err, "Failed to set submit complete interrupt");
+
+	mutex_unlock(&ch->submitlock);
+
+	return 0;
+
+error:
+	kfree(completed_waiter);
+	return err;
+}
+
+static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx)
+{
+	p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
+	return p;
+}
+
+static int host1x_channel_init(struct host1x_channel *ch,
+	struct host1x *dev, int index)
+{
+	ch->chid = index;
+	mutex_init(&ch->reflock);
+	mutex_init(&ch->submitlock);
+
+	ch->regs = host1x_channel_regs(dev->regs, index);
+	return 0;
+}
+
+static const struct host1x_channel_ops host1x_channel_ops = {
+	.init = host1x_channel_init,
+	.submit = channel_submit,
+};
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index c5c55a3..3f41619 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -24,13 +24,19 @@
 
 #include "hw/host1x01.h"
 #include "dev.h"
+#include "channel.h"
 #include "hw/host1x01_hardware.h"
 
+#include "hw/channel_hw.c"
+#include "hw/cdma_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"
 
 int host1x01_init(struct host1x *host)
 {
+	host->channel_op = host1x_channel_ops;
+	host->cdma_op = host1x_cdma_ops;
+	host->cdma_pb_op = host1x_pushbuffer_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
 
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
index 4e57f21..020798f 100644
--- a/drivers/gpu/host1x/hw/host1x01_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -21,6 +21,130 @@
 
 #include <linux/types.h>
 #include <linux/bitops.h>
+#include "hw_host1x01_channel.h"
 #include "hw_host1x01_sync.h"
+#include "hw_host1x01_uclass.h"
+
+/* channel registers */
+#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
+
+static inline u32 host1x_class_host_wait_syncpt(
+	unsigned indx, unsigned threshold)
+{
+	return host1x_uclass_wait_syncpt_indx_f(indx)
+		| host1x_uclass_wait_syncpt_thresh_f(threshold);
+}
+
+static inline u32 host1x_class_host_load_syncpt_base(
+	unsigned indx, unsigned threshold)
+{
+	return host1x_uclass_load_syncpt_base_base_indx_f(indx)
+		| host1x_uclass_load_syncpt_base_value_f(threshold);
+}
+
+static inline u32 host1x_class_host_wait_syncpt_base(
+	unsigned indx, unsigned base_indx, unsigned offset)
+{
+	return host1x_uclass_wait_syncpt_base_indx_f(indx)
+		| host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
+		| host1x_uclass_wait_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt_base(
+	unsigned base_indx, unsigned offset)
+{
+	return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
+		| host1x_uclass_incr_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt(
+	unsigned cond, unsigned indx)
+{
+	return host1x_uclass_incr_syncpt_cond_f(cond)
+		| host1x_uclass_incr_syncpt_indx_f(indx);
+}
+
+static inline u32 host1x_class_host_indoff_reg_write(
+	unsigned mod_id, unsigned offset, bool auto_inc)
+{
+	u32 v = host1x_uclass_indoff_indbe_f(0xf)
+		| host1x_uclass_indoff_indmodid_f(mod_id)
+		| host1x_uclass_indoff_indroffset_f(offset);
+	if (auto_inc)
+		v |= host1x_uclass_indoff_autoinc_f(1);
+	return v;
+}
+
+static inline u32 host1x_class_host_indoff_reg_read(
+	unsigned mod_id, unsigned offset, bool auto_inc)
+{
+	u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
+		| host1x_uclass_indoff_indroffset_f(offset)
+		| host1x_uclass_indoff_rwn_read_v();
+	if (auto_inc)
+		v |= host1x_uclass_indoff_autoinc_f(1);
+	return v;
+}
+
+
+/* cdma opcodes */
+static inline u32 host1x_opcode_setclass(
+	unsigned class_id, unsigned offset, unsigned mask)
+{
+	return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
+}
+
+static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
+{
+	return (1 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
+{
+	return (2 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
+{
+	return (3 << 28) | (offset << 16) | mask;
+}
+
+static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
+{
+	return (4 << 28) | (offset << 16) | value;
+}
+
+static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
+{
+	return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
+		host1x_class_host_incr_syncpt(cond, indx));
+}
+
+static inline u32 host1x_opcode_restart(unsigned address)
+{
+	return (5 << 28) | (address >> 4);
+}
+
+static inline u32 host1x_opcode_gather(unsigned count)
+{
+	return (6 << 28) | count;
+}
+
+static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
+{
+	return (6 << 28) | (offset << 16) | BIT(15) | count;
+}
+
+static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
+{
+	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
+}
+
+#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)
+
+static inline u32 host1x_mask2(unsigned x, unsigned y)
+{
+	return 1 | (1 << (y - x));
+}
 
 #endif
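Note on the opcode layout: the cdma opcode helpers above place the opcode
in bits 31:28 and the register offset from bit 16 upwards. For SETCLASS
the class ID starts at bit 6 and the mask occupies the low bits. The
user-space sketch below re-implements two of the encoders and adds a
decoder that uses the same field masks as validate() in job.c (12-bit
offset, 10-bit class, 6-bit mask). The names opcode_setclass(),
opcode_incr(), struct setclass_fields and decode_setclass() are local to
this sketch, and the values in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as host1x_opcode_setclass() above. */
static inline uint32_t opcode_setclass(unsigned class_id, unsigned offset,
				       unsigned mask)
{
	return (0u << 28) | (offset << 16) | (class_id << 6) | mask;
}

/* Same bit layout as host1x_opcode_incr() above. */
static inline uint32_t opcode_incr(unsigned offset, unsigned count)
{
	return (1u << 28) | (offset << 16) | count;
}

/* Hypothetical container for the decoded SETCLASS fields. */
struct setclass_fields {
	uint32_t opcode;
	uint32_t offset;
	uint32_t class_id;
	uint32_t mask;
};

/* Split a SETCLASS word using the field widths that validate() uses. */
static inline struct setclass_fields decode_setclass(uint32_t word)
{
	struct setclass_fields f;

	f.opcode = word >> 28;
	f.offset = (word >> 16) & 0xfff;
	f.class_id = (word >> 6) & 0x3ff;
	f.mask = word & 0x3f;
	return f;
}
```

Encoding class 0x51, offset 0x2b and mask 0x3 yields the word 0x002b1443,
and decoding that word returns the same three fields. The encoders do not
range-check their arguments, so a caller that passes an oversized value
silently corrupts a neighbouring field.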
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
new file mode 100644
index 0000000..3a23d57
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_channel_host1x_h__
+#define __hw_host1x_channel_host1x_h__
+
+static inline u32 host1x_channel_dmastart_r(void)
+{
+	return 0x14;
+}
+static inline u32 host1x_channel_dmaput_r(void)
+{
+	return 0x18;
+}
+static inline u32 host1x_channel_dmaget_r(void)
+{
+	return 0x1c;
+}
+static inline u32 host1x_channel_dmaend_r(void)
+{
+	return 0x20;
+}
+static inline u32 host1x_channel_dmactrl_r(void)
+{
+	return 0x24;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
+{
+	return (v & 0x1) << 0;
+}
+static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
+{
+	return (v & 0x1) << 1;
+}
+static inline u32 host1x_channel_dmactrl_dmainitget_f(u32 v)
+{
+	return (v & 0x1) << 2;
+}
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index b06a2c5..c9342da 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -63,6 +63,14 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
 {
 	return 0x68;
 }
+static inline u32 host1x_sync_cmdproc_stop_r(void)
+{
+	return 0xac;
+}
+static inline u32 host1x_sync_ch_teardown_r(void)
+{
+	return 0xb0;
+}
 static inline u32 host1x_sync_usec_clk_r(void)
 {
 	return 0x1a4;
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
new file mode 100644
index 0000000..948cfe3
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_uclass_host1x_h__
+#define __hw_host1x_uclass_host1x_h__
+
+static inline u32 host1x_uclass_incr_syncpt_r(void)
+{
+	return 0x0;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_f(u32 v)
+{
+	return (v & 0xff) << 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_indx_f(u32 v)
+{
+	return (v & 0xff) << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_r(void)
+{
+	return 0x8;
+}
+static inline u32 host1x_uclass_wait_syncpt_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_thresh_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 16;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_offset_f(u32 v)
+{
+	return (v & 0xffff) << 0;
+}
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_load_syncpt_base_value_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_offset_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_indoff_r(void)
+{
+	return 0x2d;
+}
+static inline u32 host1x_uclass_indoff_indbe_f(u32 v)
+{
+	return (v & 0xf) << 28;
+}
+static inline u32 host1x_uclass_indoff_autoinc_f(u32 v)
+{
+	return (v & 0x1) << 27;
+}
+static inline u32 host1x_uclass_indoff_indmodid_f(u32 v)
+{
+	return (v & 0xff) << 18;
+}
+static inline u32 host1x_uclass_indoff_indroffset_f(u32 v)
+{
+	return (v & 0xffff) << 2;
+}
+static inline u32 host1x_uclass_indoff_rwn_read_v(void)
+{
+	return 1;
+}
+#endif
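Note on the <x>_<y>_f() convention: each field helper masks its argument
to the field width and shifts it into place, so the results can simply be
ORed together, as host1x_class_host_wait_syncpt() does. The sketch below
reproduces that packing for WAIT_SYNCPT in user space; wait_syncpt_word()
is a hypothetical name used only for illustration. One consequence
visible in the masks is that only the low 24 bits of the threshold end up
in the word.

```c
#include <assert.h>
#include <stdint.h>

/* Same mask and shift as host1x_uclass_wait_syncpt_indx_f(). */
static inline uint32_t wait_syncpt_indx_f(uint32_t v)
{
	return (v & 0xff) << 24;
}

/* Same mask and shift as host1x_uclass_wait_syncpt_thresh_f(). */
static inline uint32_t wait_syncpt_thresh_f(uint32_t v)
{
	return (v & 0xffffff) << 0;
}

/* Hypothetical wrapper: build the full WAIT_SYNCPT method data word. */
static inline uint32_t wait_syncpt_word(uint32_t indx, uint32_t thresh)
{
	return wait_syncpt_indx_f(indx) | wait_syncpt_thresh_f(thresh);
}
```

For sync point 3 and threshold 0x10 the word is 0x03000010. A threshold
of 0x12345678 is truncated to 0x345678 before it is combined with the
index.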
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index 44a10b0..a070473 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 	wmb();
 }
 
+/* remove a wait pointed to by patch_addr */
+static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	u32 override = host1x_class_host_wait_syncpt(
+			NVSYNCPT_GRAPHICS_HOST, 0);
+	__raw_writel(override, patch_addr);
+	return 0;
+}
+
 static const char *syncpt_name(struct host1x_syncpt *sp)
 {
 	struct host1x_device_info *info = &sp->dev->info;
@@ -141,6 +150,7 @@ static const struct host1x_syncpt_ops host1x_syncpt_ops = {
 	.read_wait_base = syncpt_read_wait_base,
 	.load_min = syncpt_load_min,
 	.cpu_incr = syncpt_cpu_incr,
+	.patch_wait = syncpt_patch_wait,
 	.debug = syncpt_debug,
 	.name = syncpt_name,
 };
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index f166224..a524826 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -20,6 +20,8 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/irq.h>
+#include <trace/events/host1x.h>
+#include "channel.h"
 #include "dev.h"
 
 /* Wait list management */
@@ -74,7 +76,7 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
 {
 	struct list_head *dest;
-	struct host1x_waitlist *waiter, *next;
+	struct host1x_waitlist *waiter, *next, *prev;
 
 	list_for_each_entry_safe(waiter, next, head, list) {
 		if ((s32)(waiter->thresh - sync) > 0)
@@ -82,6 +84,17 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 
 		dest = completed + waiter->action;
 
+		/* consolidate submit cleanups */
+		if (waiter->action == HOST1X_INTR_ACTION_SUBMIT_COMPLETE
+			&& !list_empty(dest)) {
+			prev = list_entry(dest->prev,
+					struct host1x_waitlist, list);
+			if (prev->data == waiter->data) {
+				prev->count++;
+				dest = NULL;
+			}
+		}
+
 		/* PENDING->REMOVED or CANCELLED->HANDLED */
 		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
 			list_del(&waiter->list);
@@ -104,6 +117,19 @@ void reset_threshold_interrupt(struct host1x_intr *intr,
 	host1x->intr_op.enable_syncpt_intr(intr, id);
 }
 
+static void action_submit_complete(struct host1x_waitlist *waiter)
+{
+	struct host1x_channel *channel = waiter->data;
+	int nr_completed = waiter->count;
+
+	host1x_cdma_update(&channel->cdma);
+
+	/* Add nr_completed to trace */
+	trace_host1x_channel_submit_complete(channel->dev->name,
+			nr_completed, waiter->thresh);
+
+}
+
 static void action_wakeup(struct host1x_waitlist *waiter)
 {
 	wait_queue_head_t *wq = waiter->data;
@@ -121,6 +147,7 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 typedef void (*action_handler)(struct host1x_waitlist *waiter);
 
 static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
 };
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index 3625bf3..fa4b2c4 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -28,6 +28,12 @@ struct host1x_channel;
 
 enum host1x_intr_action {
 	/*
+	 * Perform cleanup after a submit has completed.
+	 * 'data' points to a channel
+	 */
+	HOST1X_INTR_ACTION_SUBMIT_COMPLETE = 0,
+
+	/*
 	 * Wake up a  task.
 	 * 'data' points to a wait_queue_head_t
 	 */
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
new file mode 100644
index 0000000..cc8ca2f
--- /dev/null
+++ b/drivers/gpu/host1x/job.c
@@ -0,0 +1,618 @@
+/*
+ * Tegra host1x Job
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/kref.h>
+#include <linux/err.h>
+#include <linux/vmalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/host1x.h>
+#include <trace/events/host1x.h>
+#include <linux/dma-mapping.h>
+#include "channel.h"
+#include "syncpt.h"
+#include "dev.h"
+#include "memmgr.h"
+
+#ifdef CONFIG_TEGRA_HOST1X_FIREWALL
+static int host1x_firewall = 1;
+#else
+static int host1x_firewall;
+#endif
+
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
+		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
+{
+	struct host1x_job *job = NULL;
+	unsigned int num_unpins = num_cmdbufs + num_relocs;
+	u64 total;
+	void *mem;
+
+	/* Check that we're not going to overflow */
+	total = sizeof(struct host1x_job)
+		+ (u64)num_relocs * sizeof(struct host1x_reloc)
+		+ (u64)num_unpins * sizeof(struct host1x_job_unpin_data)
+		+ (u64)num_waitchks * sizeof(struct host1x_waitchk)
+		+ (u64)num_cmdbufs * sizeof(struct host1x_job_gather)
+		+ (u64)num_unpins * sizeof(dma_addr_t)
+		+ (u64)num_unpins * sizeof(u32 *);
+	if (total > ULONG_MAX)
+		return NULL;
+
+	mem = job = kzalloc(total, GFP_KERNEL);
+	if (!job)
+		return NULL;
+
+	kref_init(&job->ref);
+	job->ch = ch;
+
+	/* kzalloc() has already zeroed the whole allocation. */
+
+	/*
+	 * Distribute the allocation among the per-job arrays.
+	 * The total size was checked for overflow above, before
+	 * the allocation was made.
+	 */
+	mem += sizeof(struct host1x_job);
+	job->relocarray = num_relocs ? mem : NULL;
+	mem += num_relocs * sizeof(struct host1x_reloc);
+	job->unpins = num_unpins ? mem : NULL;
+	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
+	job->waitchk = num_waitchks ? mem : NULL;
+	mem += num_waitchks * sizeof(struct host1x_waitchk);
+	job->gathers = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->addr_phys = num_unpins ? mem : NULL;
+	mem += num_unpins * sizeof(dma_addr_t);
+	job->pin_ids = num_unpins ? mem : NULL;
+
+	job->reloc_addr_phys = job->addr_phys;
+	job->gather_addr_phys = &job->addr_phys[num_relocs];
+
+	return job;
+}
+EXPORT_SYMBOL(host1x_job_alloc);
+
+void host1x_job_get(struct host1x_job *job)
+{
+	kref_get(&job->ref);
+}
+EXPORT_SYMBOL(host1x_job_get);
+
+static void job_free(struct kref *ref)
+{
+	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
+
+	kfree(job);
+}
+
+void host1x_job_put(struct host1x_job *job)
+{
+	kref_put(&job->ref, job_free);
+}
+EXPORT_SYMBOL(host1x_job_put);
+
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset)
+{
+	struct host1x_job_gather *cur_gather =
+			&job->gathers[job->num_gathers];
+
+	cur_gather->words = words;
+	cur_gather->mem_id = mem_id;
+	cur_gather->offset = offset;
+	job->num_gathers++;
+}
+EXPORT_SYMBOL(host1x_job_add_gather);
+
+/*
+ * Check driver supplied waitchk structs for syncpt thresholds
+ * that have already been satisfied and NULL the comparison (to
+ * avoid a wrap condition in the HW).
+ */
+static int do_waitchks(struct host1x_job *job, struct host1x *host,
+		u32 patch_mem, struct mem_handle *h)
+{
+	int i;
+
+	/* compare syncpt vs wait threshold */
+	for (i = 0; i < job->num_waitchk; i++) {
+		struct host1x_waitchk *wait = &job->waitchk[i];
+		struct host1x_syncpt *sp =
+			host1x_syncpt_get(host, wait->syncpt_id);
+
+		/* validate syncpt id */
+		if (wait->syncpt_id >= host1x_syncpt_nb_pts(host))
+			continue;
+
+		/* skip all other gathers */
+		if (patch_mem != wait->mem)
+			continue;
+
+		trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
+				wait->syncpt_id, wait->thresh,
+				host1x_syncpt_read_min(sp));
+		if (host1x_syncpt_is_expired(
+			host1x_syncpt_get(host, wait->syncpt_id),
+			wait->thresh)) {
+			struct host1x_syncpt *sp =
+				host1x_syncpt_get(host, wait->syncpt_id);
+
+			void *patch_addr = NULL;
+
+			/*
+			 * NULL an already satisfied WAIT_SYNCPT host method,
+			 * by patching its args in the command stream. The
+			 * method data is changed to reference a reserved
+			 * (never given out or incr) NVSYNCPT_GRAPHICS_HOST
+			 * syncpt with a matching threshold value of 0, so
+			 * is guaranteed to be popped by the host HW.
+			 */
+			dev_dbg(&host->dev->dev,
+			    "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
+			    wait->syncpt_id, sp->name, wait->thresh,
+			    host1x_syncpt_read_min(sp));
+
+			/* patch the wait */
+			patch_addr = host1x_memmgr_kmap(h,
+					wait->offset >> PAGE_SHIFT);
+			if (patch_addr) {
+				host1x_syncpt_patch_wait(sp,
+					(patch_addr +
+						(wait->offset & ~PAGE_MASK)));
+				host1x_memmgr_kunmap(h,
+						wait->offset >> PAGE_SHIFT,
+						patch_addr);
+			} else {
+				pr_err("Couldn't map cmdbuf for wait check\n");
+			}
+		}
+
+		wait->mem = 0;
+	}
+	return 0;
+}
+
+
+static int pin_job_mem(struct host1x_job *job)
+{
+	int i;
+	int count = 0;
+	int result;
+
+	for (i = 0; i < job->num_relocs; i++) {
+		struct host1x_reloc *reloc = &job->relocarray[i];
+		job->pin_ids[count] = reloc->target;
+		count++;
+	}
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		job->pin_ids[count] = g->mem_id;
+		count++;
+	}
+
+	/* validate array and pin unique ids, get refs for unpinning */
+	result = host1x_memmgr_pin_array_ids(job->ch->dev,
+		job->pin_ids, job->addr_phys,
+		count,
+		job->unpins);
+
+	if (result > 0)
+		job->num_unpins = result;
+
+	return result;
+}
+
+static int do_relocs(struct host1x_job *job,
+		u32 cmdbuf_mem, struct mem_handle *h)
+{
+	int i = 0;
+	int last_page = -1;
+	void *cmdbuf_page_addr = NULL;
+
+	/* pin & patch the relocs for one gather */
+	while (i < job->num_relocs) {
+		struct host1x_reloc *reloc = &job->relocarray[i];
+
+		/* skip all other gathers */
+		if (cmdbuf_mem != reloc->cmdbuf_mem) {
+			i++;
+			continue;
+		}
+
+		if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
+			if (cmdbuf_page_addr)
+				host1x_memmgr_kunmap(h,
+						last_page, cmdbuf_page_addr);
+
+			cmdbuf_page_addr = host1x_memmgr_kmap(h,
+					reloc->cmdbuf_offset >> PAGE_SHIFT);
+			last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
+
+			if (unlikely(!cmdbuf_page_addr)) {
+				pr_err("Couldn't map cmdbuf for relocation\n");
+				return -ENOMEM;
+			}
+		}
+
+		__raw_writel(
+			(job->reloc_addr_phys[i] +
+				reloc->target_offset) >> reloc->shift,
+			(cmdbuf_page_addr +
+				(reloc->cmdbuf_offset & ~PAGE_MASK)));
+
+		/* remove completed reloc from the job */
+		if (i != job->num_relocs - 1) {
+			struct host1x_reloc *reloc_last =
+				&job->relocarray[job->num_relocs - 1];
+			reloc->cmdbuf_mem	= reloc_last->cmdbuf_mem;
+			reloc->cmdbuf_offset	= reloc_last->cmdbuf_offset;
+			reloc->target		= reloc_last->target;
+			reloc->target_offset	= reloc_last->target_offset;
+			reloc->shift		= reloc_last->shift;
+			job->reloc_addr_phys[i] =
+				job->reloc_addr_phys[job->num_relocs - 1];
+			job->num_relocs--;
+		} else {
+			break;
+		}
+	}
+
+	if (cmdbuf_page_addr)
+		host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
+
+	return 0;
+}
+
+static int check_reloc(struct host1x_reloc *reloc,
+		u32 cmdbuf_id, int offset)
+{
+	int err = 0;
+	if (reloc->cmdbuf_mem != cmdbuf_id
+			|| reloc->cmdbuf_offset != offset * sizeof(u32))
+		err = -EINVAL;
+
+	return err;
+}
+
+static int check_mask(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 mask)
+{
+	while (mask) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (mask & 1) {
+			if (job->is_addr_reg(pdev, class, reg)) {
+				if (!*num_relocs ||
+					check_reloc(*reloc, cmdbuf_id, *offset))
+					return -EINVAL;
+				(*reloc)++;
+				(*num_relocs)--;
+			}
+			(*words)--;
+			(*offset)++;
+		}
+		mask >>= 1;
+		reg += 1;
+	}
+
+	return 0;
+}
+
+static int check_incr(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 count)
+{
+	while (count) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (job->is_addr_reg(pdev, class, reg)) {
+			if (!*num_relocs ||
+				check_reloc(*reloc, cmdbuf_id, *offset))
+				return -EINVAL;
+			(*reloc)++;
+			(*num_relocs)--;
+		}
+		reg += 1;
+		(*words)--;
+		(*offset)++;
+		count--;
+	}
+
+	return 0;
+}
+
+static int check_nonincr(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 count)
+{
+	int is_addr_reg = job->is_addr_reg(pdev, class, reg);
+
+	while (count) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (is_addr_reg) {
+			if (!*num_relocs ||
+				check_reloc(*reloc, cmdbuf_id, *offset))
+				return -EINVAL;
+			(*reloc)++;
+			(*num_relocs)--;
+		}
+		(*words)--;
+		(*offset)++;
+		count--;
+	}
+
+	return 0;
+}
+
+static int validate(struct host1x_job *job, struct platform_device *pdev,
+		struct host1x_job_gather *g)
+{
+	struct host1x_reloc *reloc = job->relocarray;
+	int num_relocs = job->num_relocs;
+	u32 *cmdbuf_base;
+	int offset = 0;
+	unsigned int words;
+	int err = 0;
+	int class = 0;
+
+	if (!job->is_addr_reg)
+		return 0;
+
+	cmdbuf_base = host1x_memmgr_mmap(g->ref);
+	if (IS_ERR_OR_NULL(cmdbuf_base))
+		return cmdbuf_base ? PTR_ERR(cmdbuf_base) : -ENOMEM;
+
+	words = g->words;
+	while (words && !err) {
+		u32 word = cmdbuf_base[offset];
+		u32 opcode = (word & 0xf0000000) >> 28;
+		u32 mask = 0;
+		u32 reg = 0;
+		u32 count = 0;
+
+		words--;
+		offset++;
+
+		switch (opcode) {
+		case 0:
+			class = word >> 6 & 0x3ff;
+			mask = word & 0x3f;
+			reg = word >> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 1:
+			reg = word >> 16 & 0xfff;
+			count = word & 0xffff;
+			err = check_incr(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, count);
+			if (err)
+				goto out;
+			break;
+
+		case 2:
+			reg = word >> 16 & 0xfff;
+			count = word & 0xffff;
+			err = check_nonincr(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, count);
+			if (err)
+				goto out;
+			break;
+
+		case 3:
+			mask = word & 0xffff;
+			reg = word >> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 4:
+		case 5:
+		case 14:
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+	}
+
+	/* No relocs should remain at this point */
+	if (num_relocs)
+		err = -EINVAL;
+
+out:
+	host1x_memmgr_munmap(g->ref, cmdbuf_base);
+
+	return err;
+}
+
+static inline int copy_gathers(struct host1x_job *job,
+		struct platform_device *pdev)
+{
+	size_t size = 0;
+	size_t offset = 0;
+	int i;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		size += g->words * sizeof(u32);
+	}
+
+	job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
+			size, &job->gather_copy, GFP_KERNEL);
+	if (!job->gather_copy_mapped) {
+		/* dma_alloc_writecombine() returns NULL on failure */
+		job->gather_copy_size = 0;
+		return -ENOMEM;
+	}
+
+	job->gather_copy_size = size;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		void *gather = host1x_memmgr_mmap(g->ref);
+		memcpy(job->gather_copy_mapped + offset,
+				gather + g->offset,
+				g->words * sizeof(u32));
+
+		host1x_memmgr_munmap(g->ref, gather);
+
+		g->mem_base = job->gather_copy;
+		g->offset = offset;
+		g->mem_id = 0;
+		g->ref = NULL;
+		offset += g->words * sizeof(u32);
+	}
+
+	return 0;
+}
+
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
+{
+	int err = 0, i = 0, j = 0;
+	struct host1x *host = host1x_get_host(pdev);
+	DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
+
+	bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
+	for (i = 0; i < job->num_waitchk; i++) {
+		u32 syncpt_id = job->waitchk[i].syncpt_id;
+		if (syncpt_id < host1x_syncpt_nb_pts(host))
+			set_bit(syncpt_id, waitchk_mask);
+	}
+
+	/* get current syncpt values for waitchk */
+	for_each_set_bit(i, waitchk_mask, host1x_syncpt_nb_pts(host))
+		host1x_syncpt_load_min(host->syncpt + i);
+
+	/* pin memory */
+	err = pin_job_mem(job);
+	if (err <= 0)
+		goto fail;
+
+	/* patch gathers */
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+
+		/* process each gather mem only once */
+		if (!g->ref) {
+			g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
+			if (IS_ERR(g->ref)) {
+				err = PTR_ERR(g->ref);
+				g->ref = NULL;
+				break;
+			}
+
+			g->mem_base = job->gather_addr_phys[i];
+
+			for (j = 0; j < job->num_gathers; j++) {
+				struct host1x_job_gather *tmp =
+					&job->gathers[j];
+				if (!tmp->ref && tmp->mem_id == g->mem_id) {
+					tmp->ref = g->ref;
+					tmp->mem_base = g->mem_base;
+				}
+			}
+			err = 0;
+			if (host1x_firewall)
+				err = validate(job, pdev, g);
+			if (err)
+				dev_err(&pdev->dev,
+					"Job validate returned %d\n", err);
+			if (!err)
+				err = do_relocs(job, g->mem_id, g->ref);
+			if (!err)
+				err = do_waitchks(job, host,
+						g->mem_id, g->ref);
+			host1x_memmgr_put(g->ref);
+			if (err)
+				break;
+		}
+	}
+
+	if (!err && host1x_firewall) {
+		err = copy_gathers(job, pdev);
+		if (err) {
+			host1x_job_unpin(job);
+			return err;
+		}
+	}
+
+fail:
+	wmb();
+
+	return err;
+}
+EXPORT_SYMBOL(host1x_job_pin);
+
+void host1x_job_unpin(struct host1x_job *job)
+{
+	int i;
+
+	for (i = 0; i < job->num_unpins; i++) {
+		struct host1x_job_unpin_data *unpin = &job->unpins[i];
+		host1x_memmgr_unpin(unpin->h, unpin->mem);
+		host1x_memmgr_put(unpin->h);
+	}
+	job->num_unpins = 0;
+
+	if (job->gather_copy_size)
+		dma_free_writecombine(&job->ch->dev->dev,
+			job->gather_copy_size,
+			job->gather_copy_mapped, job->gather_copy);
+}
+EXPORT_SYMBOL(host1x_job_unpin);
+
+/*
+ * Debug routine used to dump job entries
+ */
+void host1x_job_dump(struct device *dev, struct host1x_job *job)
+{
+	dev_dbg(dev, "    SYNCPT_ID   %d\n",
+		job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_VAL  %d\n",
+		job->syncpt_end);
+	dev_dbg(dev, "    FIRST_GET   0x%x\n",
+		job->first_get);
+	dev_dbg(dev, "    TIMEOUT     %d\n",
+		job->timeout);
+	dev_dbg(dev, "    NUM_SLOTS   %d\n",
+		job->num_slots);
+	dev_dbg(dev, "    NUM_HANDLES %d\n",
+		job->num_unpins);
+}
diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
new file mode 100644
index 0000000..9cf604f
--- /dev/null
+++ b/drivers/gpu/host1x/memmgr.c
@@ -0,0 +1,174 @@
+/*
+ * Tegra host1x Memory Management Abstraction
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/host1x.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+
+#include "memmgr.h"
+#include "cma.h"
+
+struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align, int flags)
+{
+	return NULL;
+}
+
+struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev)
+{
+	struct mem_handle *h = NULL;
+
+	switch (host1x_memmgr_type(id)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		h = (struct mem_handle *) host1x_cma_get(id, dev);
+		break;
+#endif
+	default:
+		break;
+	}
+
+	return h;
+}
+
+void host1x_memmgr_put(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_put(handle);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+struct sg_table *host1x_memmgr_pin(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_pin(handle);
+		break;
+#endif
+	default:
+		return NULL;
+		break;
+	}
+}
+
+void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_unpin(handle, sgt);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+void *host1x_memmgr_mmap(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_mmap(handle);
+		break;
+#endif
+	default:
+		return NULL;
+		break;
+	}
+}
+
+void host1x_memmgr_munmap(struct mem_handle *handle, void *addr)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_munmap(handle, addr);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_kmap(handle, pagenum);
+		break;
+#endif
+	default:
+		return NULL;
+		break;
+	}
+}
+
+void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_kunmap(handle, pagenum, addr);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+int host1x_memmgr_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		dma_addr_t *phys_addr,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data)
+{
+	int pin_count = 0;
+
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	{
+		int cma_count = host1x_cma_pin_array_ids(dev,
+			ids, MEMMGR_TYPE_MASK,
+			mem_mgr_type_cma,
+			count, &unpin_data[pin_count],
+			phys_addr);
+
+		if (cma_count < 0) {
+			/* clean up previous handles */
+			while (pin_count) {
+				pin_count--;
+				/* unpin, put */
+				host1x_memmgr_unpin(unpin_data[pin_count].h,
+						unpin_data[pin_count].mem);
+				host1x_memmgr_put(unpin_data[pin_count].h);
+			}
+			return cma_count;
+		}
+		pin_count += cma_count;
+	}
+#endif
+	return pin_count;
+}
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
new file mode 100644
index 0000000..52881ea
--- /dev/null
+++ b/drivers/gpu/host1x/memmgr.h
@@ -0,0 +1,53 @@
+/*
+ * Tegra host1x Memory Management Abstraction header
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _NVHOST_MEM_MGR_H_
+#define _NVHOST_MEM_MGR_H_
+
+struct mem_handle;
+struct platform_device;
+
+struct host1x_job_unpin_data {
+	struct mem_handle *h;
+	struct sg_table *mem;
+};
+
+enum mem_mgr_flag {
+	mem_mgr_flag_uncacheable = 0,
+	mem_mgr_flag_write_combine = 1,
+};
+
+struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align,
+		int flags);
+struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev);
+void host1x_memmgr_put(struct mem_handle *handle);
+struct sg_table *host1x_memmgr_pin(struct mem_handle *handle);
+void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *host1x_memmgr_mmap(struct mem_handle *handle);
+void host1x_memmgr_munmap(struct mem_handle *handle, void *addr);
+void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum);
+void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr);
+
+int host1x_memmgr_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		dma_addr_t *phys_addr,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data);
+
+#endif
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 9987548..5de67d2 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -295,6 +295,12 @@ void host1x_syncpt_debug(struct host1x_syncpt *sp)
 	sp->dev->syncpt_op.debug(sp);
 }
 
+/* remove a wait pointed to by patch_addr */
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	return sp->dev->syncpt_op.patch_wait(sp, patch_addr);
+}
+
 struct host1x_syncpt *host1x_syncpt_init(struct host1x *host)
 {
 	struct host1x_syncpt *syncpt, *sp;
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index d4d1f3f..24d4846 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -121,6 +121,8 @@ u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
 int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
 			long timeout, u32 *value);
 
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr);
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp);
 
 static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 00060ee..a58c798 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -25,6 +25,8 @@
 #include <linux/types.h>
 #include <linux/platform_device.h>
 
+struct host1x_job;
+struct host1x_job_unpin_data;
 struct host1x_syncpt;
 
 #define NVSYNCPT_INVALID			(-1)
@@ -35,6 +37,177 @@ void host1x_syncpt_incr_byid(u32 id);
 u32 host1x_syncpt_read_byid(u32 id);
 int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value);
 
+/* channels */
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch);
+void host1x_channel_put(struct host1x_channel *ch);
+int host1x_channel_submit(struct host1x_job *job);
+
+/* Buffer encapsulation */
+enum mem_mgr_type {
+	mem_mgr_type_cma = 2,
+};
+
+#define MEMMGR_TYPE_MASK	0x3
+#define MEMMGR_ID_MASK		(~0x3)
+
+static inline int host1x_memmgr_type(u32 id) { return id & MEMMGR_TYPE_MASK; }
+static inline int host1x_memmgr_id(u32 id) { return id & MEMMGR_ID_MASK; }
+static inline unsigned int host1x_memmgr_host1x_id(u32 type, u32 handle)
+{
+	if (host1x_memmgr_type(type) != type ||
+		host1x_memmgr_id(handle) != handle)
+		return 0;
+
+	return handle | type;
+}
+
+
+enum host1x_class {
+	NV_HOST1X_CLASS_ID		= 0x1,
+	NV_GRAPHICS_2D_CLASS_ID		= 0x51,
+};
+
+struct host1x_job_gather {
+	u32 words;
+	dma_addr_t mem_base;
+	u32 mem_id;
+	int offset;
+	struct mem_handle *ref;
+};
+
+struct host1x_cmdbuf {
+	__u32 mem;
+	__u32 offset;
+	__u32 words;
+	__u32 pad;
+};
+
+struct host1x_reloc {
+	__u32 cmdbuf_mem;
+	__u32 cmdbuf_offset;
+	__u32 target;
+	__u32 target_offset;
+	__u32 shift;
+	__u32 pad;
+};
+
+struct host1x_waitchk {
+	__u32 mem;
+	__u32 offset;
+	__u32 syncpt_id;
+	__u32 thresh;
+};
+
+/*
+ * Each submit is tracked as a host1x_job.
+ */
+struct host1x_job {
+	/* When refcount goes to zero, job can be freed */
+	struct kref ref;
+
+	/* List entry */
+	struct list_head list;
+
+	/* Channel where job is submitted to */
+	struct host1x_channel *ch;
+
+	int clientid;
+
+	/* Gathers and their memory */
+	struct host1x_job_gather *gathers;
+	int num_gathers;
+
+	/* Wait checks to be processed at submit time */
+	struct host1x_waitchk *waitchk;
+	int num_waitchk;
+	u32 waitchk_mask;
+
+	/* Array of handles to be pinned & unpinned */
+	struct host1x_reloc *relocarray;
+	int num_relocs;
+	struct host1x_job_unpin_data *unpins;
+	int num_unpins;
+
+	dma_addr_t *addr_phys;
+	dma_addr_t *gather_addr_phys;
+	dma_addr_t *reloc_addr_phys;
+
+	/* Sync point id, number of increments and end related to the submit */
+	u32 syncpt_id;
+	u32 syncpt_incrs;
+	u32 syncpt_end;
+
+	/* Maximum time to wait for this job */
+	int timeout;
+
+	/* Null kickoff prevents submit from being sent to hardware */
+	bool null_kickoff;
+
+	/* Index and number of slots used in the push buffer */
+	int first_get;
+	int num_slots;
+
+	/* Copy of gathers */
+	size_t gather_copy_size;
+	dma_addr_t gather_copy;
+	u8 *gather_copy_mapped;
+
+	/* Temporary space for unpin ids */
+	long unsigned int *pin_ids;
+
+	/* Check if register is marked as an address reg */
+	int (*is_addr_reg)(struct platform_device *dev, u32 class, u32 reg);
+
+	/* Request a SETCLASS to this class */
+	u32 class;
+
+	/* Add a channel wait for previous ops to complete */
+	u32 serialize;
+};
+/*
+ * Allocate memory for a job. Just enough memory will be allocated to
+ * accommodate the submit.
+ */
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
+		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks);
+
+/*
+ * Add a gather to a job.
+ */
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset);
+
+/*
+ * Increment the reference count of a host1x_job.
+ */
+void host1x_job_get(struct host1x_job *job);
+
+/*
+ * Decrement the job's reference count; free the job when it reaches zero.
+ */
+void host1x_job_put(struct host1x_job *job);
+
+/*
+ * Pin memory related to job. This handles relocation of addresses to the
+ * host1x address space. Handles both the gather memory and any other memory
+ * referred to from the gather buffers.
+ *
+ * Also handles patching out host waits that would wait for an expired sync
+ * point value.
+ */
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev);
+
+/*
+ * Unpin memory related to job.
+ */
+void host1x_job_unpin(struct host1x_job *job);
+
+/*
+ * Dump contents of job to debug output.
+ */
+void host1x_job_dump(struct device *dev, struct host1x_job *job);
+
 struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
 		int client_managed);
 void host1x_syncpt_free(struct host1x_syncpt *sp);
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
index d98d74c..e087910 100644
--- a/include/trace/events/host1x.h
+++ b/include/trace/events/host1x.h
@@ -37,6 +37,214 @@ DECLARE_EVENT_CLASS(host1x,
 	TP_printk("name=%s", __entry->name)
 );
 
+DEFINE_EVENT(host1x, host1x_channel_open,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+DEFINE_EVENT(host1x, host1x_channel_release,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+TRACE_EVENT(host1x_cdma_begin,
+	TP_PROTO(const char *name),
+
+	TP_ARGS(name),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+	),
+
+	TP_printk("name=%s",
+		__entry->name)
+);
+
+TRACE_EVENT(host1x_cdma_end,
+	TP_PROTO(const char *name),
+
+	TP_ARGS(name),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+	),
+
+	TP_printk("name=%s",
+		__entry->name)
+);
+
+TRACE_EVENT(host1x_cdma_flush,
+	TP_PROTO(const char *name, int timeout),
+
+	TP_ARGS(name, timeout),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(int, timeout)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->timeout = timeout;
+	),
+
+	TP_printk("name=%s, timeout=%d",
+		__entry->name, __entry->timeout)
+);
+
+TRACE_EVENT(host1x_cdma_push,
+	TP_PROTO(const char *name, u32 op1, u32 op2),
+
+	TP_ARGS(name, op1, op2),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, op1)
+		__field(u32, op2)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->op1 = op1;
+		__entry->op2 = op2;
+	),
+
+	TP_printk("name=%s, op1=%08x, op2=%08x",
+		__entry->name, __entry->op1, __entry->op2)
+);
+
+TRACE_EVENT(host1x_cdma_push_gather,
+	TP_PROTO(const char *name, u32 mem_id,
+			u32 words, u32 offset, void *cmdbuf),
+
+	TP_ARGS(name, mem_id, words, offset, cmdbuf),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, mem_id)
+		__field(u32, words)
+		__field(u32, offset)
+		__field(bool, cmdbuf)
+		__dynamic_array(u32, cmdbuf, words)
+	),
+
+	TP_fast_assign(
+		if (cmdbuf) {
+			memcpy(__get_dynamic_array(cmdbuf), cmdbuf+offset,
+					words * sizeof(u32));
+		}
+		__entry->cmdbuf = cmdbuf;
+		__entry->name = name;
+		__entry->mem_id = mem_id;
+		__entry->words = words;
+		__entry->offset = offset;
+	),
+
+	TP_printk("name=%s, mem_id=%08x, words=%u, offset=%d, contents=[%s]",
+	  __entry->name, __entry->mem_id,
+	  __entry->words, __entry->offset,
+	  __print_hex(__get_dynamic_array(cmdbuf),
+		  __entry->cmdbuf ? __entry->words * 4 : 0))
+);
+
+TRACE_EVENT(host1x_channel_submit,
+	TP_PROTO(const char *name, u32 cmdbufs, u32 relocs, u32 waitchks,
+			u32 syncpt_id, u32 syncpt_incrs),
+
+	TP_ARGS(name, cmdbufs, relocs, waitchks, syncpt_id, syncpt_incrs),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, cmdbufs)
+		__field(u32, relocs)
+		__field(u32, waitchks)
+		__field(u32, syncpt_id)
+		__field(u32, syncpt_incrs)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->cmdbufs = cmdbufs;
+		__entry->relocs = relocs;
+		__entry->waitchks = waitchks;
+		__entry->syncpt_id = syncpt_id;
+		__entry->syncpt_incrs = syncpt_incrs;
+	),
+
+	TP_printk("name=%s, cmdbufs=%u, relocs=%u, waitchks=%u, "
+		"syncpt_id=%u, syncpt_incrs=%u",
+	  __entry->name, __entry->cmdbufs, __entry->relocs, __entry->waitchks,
+	  __entry->syncpt_id, __entry->syncpt_incrs)
+);
+
+TRACE_EVENT(host1x_channel_submitted,
+	TP_PROTO(const char *name, u32 syncpt_base, u32 syncpt_max),
+
+	TP_ARGS(name, syncpt_base, syncpt_max),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, syncpt_base)
+		__field(u32, syncpt_max)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->syncpt_base = syncpt_base;
+		__entry->syncpt_max = syncpt_max;
+	),
+
+	TP_printk("name=%s, syncpt_base=%d, syncpt_max=%d",
+		__entry->name, __entry->syncpt_base, __entry->syncpt_max)
+);
+
+TRACE_EVENT(host1x_channel_submit_complete,
+	TP_PROTO(const char *name, int count, u32 thresh),
+
+	TP_ARGS(name, count, thresh),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(int, count)
+		__field(u32, thresh)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->count = count;
+		__entry->thresh = thresh;
+	),
+
+	TP_printk("name=%s, count=%d, thresh=%d",
+		__entry->name, __entry->count, __entry->thresh)
+);
+
+TRACE_EVENT(host1x_wait_cdma,
+	TP_PROTO(const char *name, u32 eventid),
+
+	TP_ARGS(name, eventid),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, eventid)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->eventid = eventid;
+	),
+
+	TP_printk("name=%s, event=%d", __entry->name, __entry->eventid)
+);
+
 TRACE_EVENT(host1x_syncpt_load_min,
 	TP_PROTO(u32 id, u32 val),
 
@@ -55,6 +263,33 @@ TRACE_EVENT(host1x_syncpt_load_min,
 	TP_printk("id=%d, val=%d", __entry->id, __entry->val)
 );
 
+TRACE_EVENT(host1x_syncpt_wait_check,
+	TP_PROTO(u32 mem_id, u32 offset, u32 syncpt_id, u32 thresh, u32 min),
+
+	TP_ARGS(mem_id, offset, syncpt_id, thresh, min),
+
+	TP_STRUCT__entry(
+		__field(u32, mem_id)
+		__field(u32, offset)
+		__field(u32, syncpt_id)
+		__field(u32, thresh)
+		__field(u32, min)
+	),
+
+	TP_fast_assign(
+		__entry->mem_id = mem_id;
+		__entry->offset = offset;
+		__entry->syncpt_id = syncpt_id;
+		__entry->thresh = thresh;
+		__entry->min = min;
+	),
+
+	TP_printk("mem_id=%08x, offset=%05x, id=%d, thresh=%d, current=%d",
+		__entry->mem_id, __entry->offset,
+		__entry->syncpt_id, __entry->thresh,
+		__entry->min)
+);
+
 #endif /*  _TRACE_HOST1X_H */
 
 /* This part must be outside protection */
-- 
1.7.9.5

* [PATCHv3 3/7] gpu: host1x: Add channel support
@ 2012-12-13 14:04   ` Terje Bergstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel
  Cc: amerilainen, linux-kernel

Add support for host1x client modules, and host1x channels to submit
work to the clients. The work is submitted in GEM CMA buffers, so
this patch adds support for them.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
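A note on the command stream format, not part of the commit message:
with TEGRA_HOST1X_FIREWALL enabled, validate() in job.c walks a gather
word by word before submission. The top four bits of a command word
select the opcode, and the opcode determines how many data words
follow. The sketch below is a stand-alone userspace model of that walk
which only checks that each payload fits in the buffer. The names
mask_words() and stream_walk() are made up for illustration, and the
address register and relocation checks done by the driver are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* One data word follows the header for every bit set in the mask. */
static unsigned int mask_words(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n += mask & 1;

	return n;
}

/*
 * Walk a command stream of 'words' 32-bit words. Returns 0 if every
 * opcode is known and its payload fits in the buffer, -1 otherwise.
 */
static int stream_walk(const uint32_t *buf, size_t words)
{
	size_t pos = 0;

	while (pos < words) {
		uint32_t word = buf[pos++];
		uint32_t opcode = word >> 28;
		size_t payload;

		switch (opcode) {
		case 0:		/* class change, 6-bit write mask */
			payload = mask_words(word & 0x3f);
			break;
		case 1:		/* incrementing register writes */
		case 2:		/* non-incrementing register writes */
			payload = word & 0xffff;
			break;
		case 3:		/* 16-bit write mask */
			payload = mask_words(word & 0xffff);
			break;
		case 4:
		case 5:
		case 14:	/* opcodes without payload words */
			payload = 0;
			break;
		default:	/* anything else is rejected */
			return -1;
		}

		if (payload > words - pos)
			return -1;
		pos += payload;
	}

	return 0;
}
```

A stream consisting of one incrementing write with a count of 2 needs
three words in total; handing stream_walk() only two of them returns
-1.
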
 drivers/gpu/host1x/Kconfig                  |   24 +-
 drivers/gpu/host1x/Makefile                 |    8 +-
 drivers/gpu/host1x/cdma.c                   |  438 +++++++++++++++++++
 drivers/gpu/host1x/cdma.h                   |  107 +++++
 drivers/gpu/host1x/channel.c                |  137 ++++++
 drivers/gpu/host1x/channel.h                |   64 +++
 drivers/gpu/host1x/cma.c                    |  116 +++++
 drivers/gpu/host1x/cma.h                    |   43 ++
 drivers/gpu/host1x/dev.c                    |   13 +
 drivers/gpu/host1x/dev.h                    |   54 +++
 drivers/gpu/host1x/hw/cdma_hw.c             |  477 +++++++++++++++++++++
 drivers/gpu/host1x/hw/cdma_hw.h             |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c          |  147 +++++++
 drivers/gpu/host1x/hw/host1x01.c            |    6 +
 drivers/gpu/host1x/hw/host1x01_hardware.h   |  124 ++++++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   86 ++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |    8 +
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  |  130 ++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |   10 +
 drivers/gpu/host1x/intr.c                   |   29 +-
 drivers/gpu/host1x/intr.h                   |    6 +
 drivers/gpu/host1x/job.c                    |  618 +++++++++++++++++++++++++++
 drivers/gpu/host1x/memmgr.c                 |  174 ++++++++
 drivers/gpu/host1x/memmgr.h                 |   53 +++
 drivers/gpu/host1x/syncpt.c                 |    6 +
 drivers/gpu/host1x/syncpt.h                 |    2 +
 include/linux/host1x.h                      |  173 ++++++++
 include/trace/events/host1x.h               |  235 ++++++++++
 28 files changed, 3322 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/host1x/cdma.c
 create mode 100644 drivers/gpu/host1x/cdma.h
 create mode 100644 drivers/gpu/host1x/channel.c
 create mode 100644 drivers/gpu/host1x/channel.h
 create mode 100644 drivers/gpu/host1x/cma.c
 create mode 100644 drivers/gpu/host1x/cma.h
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h
 create mode 100644 drivers/gpu/host1x/hw/channel_hw.c
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h
 create mode 100644 drivers/gpu/host1x/job.c
 create mode 100644 drivers/gpu/host1x/memmgr.c
 create mode 100644 drivers/gpu/host1x/memmgr.h
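
For reference: include/linux/host1x.h in this patch keeps the memory
manager type in the low two bits of a 32-bit buffer ID, and memmgr.c
uses those bits to pick a back end. A minimal stand-alone sketch of
that packing follows. TYPE_MASK, ID_MASK, TYPE_CMA, id_type(),
id_handle() and id_pack() are illustrative stand-ins for
MEMMGR_TYPE_MASK, MEMMGR_ID_MASK, mem_mgr_type_cma,
host1x_memmgr_type(), host1x_memmgr_id() and
host1x_memmgr_host1x_id(); they are not part of the driver.

```c
#include <stdint.h>

#define TYPE_MASK	0x3u
#define ID_MASK		(~TYPE_MASK)

enum { TYPE_CMA = 2 };

/* Low two bits: which memory manager owns the buffer. */
static inline uint32_t id_type(uint32_t id)
{
	return id & TYPE_MASK;
}

/* Remaining bits: the handle inside that memory manager. */
static inline uint32_t id_handle(uint32_t id)
{
	return id & ID_MASK;
}

/*
 * Combine a type and a handle into one ID. Returns 0 if the type does
 * not fit in the type bits or the handle overlaps them.
 */
static inline uint32_t id_pack(uint32_t type, uint32_t handle)
{
	if ((type & TYPE_MASK) != type || (handle & ID_MASK) != handle)
		return 0;

	return handle | type;
}
```

Since the type occupies the low two bits, a handle has to be a
multiple of four for the packing to succeed.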

diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index e89fb2b..61e7ba3 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -3,4 +3,26 @@ config TEGRA_HOST1X
 	help
 	  Driver for the Tegra host1x hardware.
 
-	  Required for enabling tegradrm.
+	  Required for enabling tegradrm and 2D acceleration.
+
+if TEGRA_HOST1X
+
+config TEGRA_HOST1X_CMA
+	bool "Support DRM CMA buffers"
+	depends on DRM
+	select DRM_GEM_CMA_HELPER
+	select DRM_KMS_CMA_HELPER
+	help
+	  Say yes if you wish to use DRM CMA buffers.
+
+	  If unsure, choose Y.
+
+config TEGRA_HOST1X_FIREWALL
+	bool "Enable HOST1X security firewall"
+	default y
+	help
+	  Say yes if the kernel should protect command streams from tampering.
+
+	  If unsure, choose Y.
+
+endif
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 9d00b62..f6c1924 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -3,7 +3,13 @@ ccflags-y = -Idrivers/gpu/host1x
 host1x-objs = \
 	syncpt.o \
 	dev.o \
-	intr.o
+	intr.o \
+	cdma.o \
+	intr.o \
+	channel.o \
+	job.o \
+	memmgr.o
 
+obj-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
 obj-$(CONFIG_TEGRA_HOST1X) += hw/
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
new file mode 100644
index 0000000..1193fea
--- /dev/null
+++ b/drivers/gpu/host1x/cdma.c
@@ -0,0 +1,438 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include <asm/cacheflush.h>
+
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/interrupt.h>
+#include <trace/events/host1x.h>
+
+#define TRACE_MAX_LENGTH 128U
+
+/*
+ * Add an entry to the sync queue.
+ */
+static void add_to_sync_queue(struct host1x_cdma *cdma,
+			      struct host1x_job *job,
+			      u32 nr_slots,
+			      u32 first_get)
+{
+	if (job->syncpt_id == NVSYNCPT_INVALID) {
+		dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
+				__func__);
+		return;
+	}
+
+	job->first_get = first_get;
+	job->num_slots = nr_slots;
+	host1x_job_get(job);
+	list_add_tail(&job->list, &cdma->sync_queue);
+}
+
+/*
+ * Return the status of the cdma's sync queue or push buffer for the given event
+ *  - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
+ *  - pb space: returns the number of free slots in the channel's push buffer
+ * Must be called with the cdma lock held.
+ */
+static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	switch (event) {
+	case CDMA_EVENT_SYNC_QUEUE_EMPTY:
+		return list_empty(&cdma->sync_queue) ? 1 : 0;
+	case CDMA_EVENT_PUSH_BUFFER_SPACE: {
+		struct push_buffer *pb = &cdma->push_buffer;
+		return host1x->cdma_pb_op.space(pb);
+	}
+	default:
+		return 0;
+	}
+}
+
+/*
+ * Sleep (if necessary) until the requested event happens
+ *   - CDMA_EVENT_SYNC_QUEUE_EMPTY : sync queue is completely empty.
+ *     - Returns 1
+ *   - CDMA_EVENT_PUSH_BUFFER_SPACE : there is space in the push buffer
+ *     - Return the amount of space (> 0)
+ * Must be called with the cdma lock held.
+ */
+unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	for (;;) {
+		unsigned int space = cdma_status_locked(cdma, event);
+		if (space)
+			return space;
+
+		trace_host1x_wait_cdma(cdma_to_channel(cdma)->dev->name,
+				event);
+
+		/* If somebody has managed to already start waiting, yield */
+		if (cdma->event != CDMA_EVENT_NONE) {
+			mutex_unlock(&cdma->lock);
+			schedule();
+			mutex_lock(&cdma->lock);
+			continue;
+		}
+		cdma->event = event;
+
+		mutex_unlock(&cdma->lock);
+		down(&cdma->sem);
+		mutex_lock(&cdma->lock);
+	}
+	return 0;
+}
+
+/*
+ * Start timer for a buffer submission that has not completed yet.
+ * Must be called with the cdma lock held.
+ */
+static void cdma_start_timer_locked(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host = cdma_to_host1x(cdma);
+
+	if (cdma->timeout.clientid) {
+		/* timer already started */
+		return;
+	}
+
+	cdma->timeout.clientid = job->clientid;
+	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt_val = job->syncpt_end;
+	cdma->timeout.start_ktime = ktime_get();
+
+	schedule_delayed_work(&cdma->timeout.wq,
+			msecs_to_jiffies(job->timeout));
+}
+
+/*
+ * Stop timer when a buffer submission completes.
+ * Must be called with the cdma lock held.
+ */
+static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
+{
+	cancel_delayed_work(&cdma->timeout.wq);
+	cdma->timeout.clientid = 0;
+}
+
+/*
+ * For all sync queue entries that have already finished according to the
+ * current sync point registers:
+ *  - unpin & unref their mems
+ *  - pop their push buffer slots
+ *  - remove them from the sync queue
+ * This is normally called from the host code's worker thread, but can be
+ * called manually if necessary.
+ * Must be called with the cdma lock held.
+ */
+static void update_cdma_locked(struct host1x_cdma *cdma)
+{
+	bool signal = false;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_job *job, *n;
+
+	/* If CDMA is stopped, queue is cleared and we can return */
+	if (!cdma->running)
+		return;
+
+	/*
+	 * Walk the sync queue, reading the sync point registers as necessary,
+	 * to consume as many sync queue entries as possible without blocking
+	 */
+	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
+		struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;
+
+		/* Check whether this syncpt has completed, and bail if not */
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+			/* Start timer on next pending syncpt */
+			if (job->timeout)
+				cdma_start_timer_locked(cdma, job);
+			break;
+		}
+
+		/* Cancel timeout, when a buffer completes */
+		if (cdma->timeout.clientid)
+			stop_cdma_timer_locked(cdma);
+
+		/* Unpin the memory */
+		host1x_job_unpin(job);
+
+		/* Pop push buffer slots */
+		if (job->num_slots) {
+			struct push_buffer *pb = &cdma->push_buffer;
+			host1x->cdma_pb_op.pop_from(pb, job->num_slots);
+			if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
+				signal = true;
+		}
+
+		list_del(&job->list);
+		host1x_job_put(job);
+	}
+
+	if (list_empty(&cdma->sync_queue) &&
+	    cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
+		signal = true;
+
+	/* Wake up host1x_cdma_wait_locked() if the requested event happened */
+	if (signal) {
+		cdma->event = CDMA_EVENT_NONE;
+		up(&cdma->sem);
+	}
+}
+
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev)
+{
+	u32 get_restart;
+	u32 syncpt_incrs;
+	struct host1x_job *job = NULL;
+	u32 syncpt_val;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	dev_dbg(&dev->dev,
+		"%s: starting cleanup (thresh %u)\n",
+		__func__, syncpt_val);
+
+	/*
+	 * Move the sync_queue read pointer to the first entry that hasn't
+	 * completed based on the current HW syncpt value. It's likely there
+	 * won't be any (i.e. we're still at the head), but this covers the case
+	 * where a syncpt incr happens just prior to/during the teardown.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: skip completed buffers still in sync_queue\n",
+		__func__);
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		if (syncpt_val < job->syncpt_end)
+			break;
+
+		host1x_job_dump(&dev->dev, job);
+	}
+
+	/*
+	 * Walk the sync_queue, first incrementing with the CPU syncpts that
+	 * are partially executed (the first buffer) or fully skipped while
+	 * still in the current context (slots are also NOP-ed).
+	 *
+	 * At the point contexts are interleaved, syncpt increments must be
+	 * done inline with the pushbuffer from a GATHER buffer to maintain
+	 * the order (slots are modified to be a GATHER of syncpt incrs).
+	 *
+	 * Note: save in get_restart the location where the timed out buffer
+	 * started in the PB, so we can start the refetch from there (with the
+	 * modified NOP-ed PB slots). This lets things appear to have completed
+	 * properly for this buffer and resources are freed.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: perform CPU incr on pending same ctx buffers\n",
+		__func__);
+
+	get_restart = cdma->last_put;
+	if (&job->list != &cdma->sync_queue)
+		get_restart = job->first_get;
+
+	/* do CPU increments as long as this context continues */
+	list_for_each_entry_from(job, &cdma->sync_queue, list) {
+		/* different context, gets us out of this loop */
+		if (job->clientid != cdma->timeout.clientid)
+			break;
+
+		/* won't need a timeout when replayed */
+		job->timeout = 0;
+
+		syncpt_incrs = job->syncpt_end - syncpt_val;
+		dev_dbg(&dev->dev,
+			"%s: CPU incr (%u)\n", __func__, syncpt_incrs);
+
+		host1x_job_dump(&dev->dev, job);
+
+		/* safe to use CPU to incr syncpts */
+		host1x->cdma_op.timeout_cpu_incr(cdma,
+				job->first_get,
+				syncpt_incrs,
+				job->syncpt_end,
+				job->num_slots);
+
+		syncpt_val += syncpt_incrs;
+	}
+
+	list_for_each_entry_from(job, &cdma->sync_queue, list)
+		if (job->clientid == cdma->timeout.clientid)
+			job->timeout = 500;
+
+	dev_dbg(&dev->dev,
+		"%s: finished sync_queue modification\n", __func__);
+
+	/* roll back DMAGET and start up channel again */
+	host1x->cdma_op.timeout_teardown_end(cdma, get_restart);
+}
+
+/*
+ * Create a cdma
+ */
+int host1x_cdma_init(struct host1x_cdma *cdma)
+{
+	int err;
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	mutex_init(&cdma->lock);
+	sema_init(&cdma->sem, 0);
+
+	INIT_LIST_HEAD(&cdma->sync_queue);
+
+	cdma->event = CDMA_EVENT_NONE;
+	cdma->running = false;
+	cdma->torndown = false;
+
+	err = host1x->cdma_pb_op.init(pb);
+	if (err)
+		return err;
+	return 0;
+}
+
+/*
+ * Destroy a cdma
+ */
+void host1x_cdma_deinit(struct host1x_cdma *cdma)
+{
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running) {
+		pr_warn("%s: CDMA still running\n",
+				__func__);
+	} else {
+		host1x->cdma_pb_op.destroy(pb);
+		host1x->cdma_op.timeout_destroy(cdma);
+	}
+}
+
+/*
+ * Begin a cdma submit
+ */
+int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	mutex_lock(&cdma->lock);
+
+	if (job->timeout) {
+		/* init state on first submit with timeout value */
+		if (!cdma->timeout.initialized) {
+			int err;
+			err = host1x->cdma_op.timeout_init(cdma,
+					job->syncpt_id);
+			if (err) {
+				mutex_unlock(&cdma->lock);
+				return err;
+			}
+		}
+	}
+	if (!cdma->running)
+		host1x->cdma_op.start(cdma);
+
+	cdma->slots_free = 0;
+	cdma->slots_used = 0;
+	cdma->first_get = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	trace_host1x_cdma_begin(job->ch->dev->name);
+	return 0;
+}
+
+/*
+ * Push two words into a push buffer slot
+ * Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
+{
+	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
+}
+
+/*
+ * Push two words into a push buffer slot and record the slot's mem handle.
+ * Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push_gather(struct host1x_cdma *cdma,
+		struct mem_handle *handle,
+		u32 offset, u32 op1, u32 op2)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	u32 slots_free = cdma->slots_free;
+	struct push_buffer *pb = &cdma->push_buffer;
+
+	if (slots_free == 0) {
+		host1x->cdma_op.kick(cdma);
+		slots_free = host1x_cdma_wait_locked(cdma,
+				CDMA_EVENT_PUSH_BUFFER_SPACE);
+	}
+	cdma->slots_free = slots_free - 1;
+	cdma->slots_used++;
+	host1x->cdma_pb_op.push_to(pb, handle, op1, op2);
+}
+
+/*
+ * End a cdma submit
+ * Kick off DMA, add job to the sync queue, and a number of slots to be freed
+ * from the pushbuffer. The handles for a submit must all be pinned at the same
+ * time, but they can be unpinned in smaller chunks.
+ */
+void host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	bool was_idle = list_empty(&cdma->sync_queue);
+
+	host1x->cdma_op.kick(cdma);
+
+	add_to_sync_queue(cdma,
+			job,
+			cdma->slots_used,
+			cdma->first_get);
+
+	/* start timer on idle -> active transitions */
+	if (job->timeout && was_idle)
+		cdma_start_timer_locked(cdma, job);
+
+	trace_host1x_cdma_end(job->ch->dev->name);
+	mutex_unlock(&cdma->lock);
+}
+
+/*
+ * Update cdma state according to current sync point values
+ */
+void host1x_cdma_update(struct host1x_cdma *cdma)
+{
+	mutex_lock(&cdma->lock);
+	update_cdma_locked(cdma);
+	mutex_unlock(&cdma->lock);
+}
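Reviewer note on update_cdma_locked() above: the consumer side retires jobs strictly from the head of the sync queue and stops at the first job whose sync point threshold has not been reached. The user-space sketch below models only that walk. All names (model_job, model_syncpt_expired, model_retire) are hypothetical, and the wrap-safe comparison is an assumption about how host1x_syncpt_is_expired() could behave, since that function is not part of this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct host1x_job: only the fields used here. */
struct model_job {
	uint32_t syncpt_end;	/* syncpt value at which the job is done */
	unsigned int num_slots;	/* push buffer slots owned by the job */
};

/*
 * Wrap-safe "has the sync point reached thresh?" test (assumed behaviour).
 * The signed difference stays correct when the 32-bit counter wraps.
 */
static bool model_syncpt_expired(uint32_t cur_val, uint32_t thresh)
{
	return (int32_t)(cur_val - thresh) >= 0;
}

/*
 * Retire jobs from the head of an in-order queue, stopping at the first
 * job that has not completed. Returns the number of retired jobs and adds
 * the push buffer slots they owned to *slots_freed.
 */
static size_t model_retire(const struct model_job *queue, size_t count,
			   uint32_t syncpt_now, unsigned int *slots_freed)
{
	size_t done = 0;

	assert(slots_freed != NULL);
	while (done < count &&
	       model_syncpt_expired(syncpt_now, queue[done].syncpt_end)) {
		*slots_freed += queue[done].num_slots;
		done++;
	}
	return done;
}
```

The sketch leaves out locking, the timeout timer and the semaphore wake-up; it is only meant to show why a single unexpired job at the head keeps every later job, and its push buffer slots, pinned.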
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
new file mode 100644
index 0000000..5fd7cdf
--- /dev/null
+++ b/drivers/gpu/host1x/cdma.h
@@ -0,0 +1,107 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CDMA_H
+#define __NVHOST_CDMA_H
+
+#include <linux/sched.h>
+#include <linux/semaphore.h>
+
+#include <linux/host1x.h>
+#include <linux/list.h>
+
+struct host1x_syncpt;
+struct host1x_userctx_timeout;
+struct host1x_job;
+struct mem_handle;
+
+/*
+ * cdma
+ *
+ * This is in charge of a host command DMA channel.
+ * Sends ops to a push buffer, and takes responsibility for unpinning
+ * (& possibly freeing) of memory after those ops have completed.
+ * Producer:
+ *	begin
+ *		push - send ops to the push buffer
+ *	end - start command DMA and enqueue handles to be unpinned
+ * Consumer:
+ *	update - call to update sync queue and push buffer, unpin memory
+ */
+
+struct push_buffer {
+	u32 *mapped;			/* mapped pushbuffer memory */
+	dma_addr_t phys;		/* physical address of pushbuffer */
+	u32 fence;			/* index we've written */
+	u32 cur;			/* index to write to */
+	struct mem_handle **handle;	/* handle for each opcode pair */
+};
+
+struct buffer_timeout {
+	struct delayed_work wq;		/* work queue */
+	bool initialized;		/* timer one-time setup flag */
+	struct host1x_syncpt *syncpt;	/* buffer completion syncpt */
+	u32 syncpt_val;			/* syncpt value when completed */
+	ktime_t start_ktime;		/* starting time */
+	/* context timeout information */
+	int clientid;
+};
+
+enum cdma_event {
+	CDMA_EVENT_NONE,		/* not waiting for any event */
+	CDMA_EVENT_SYNC_QUEUE_EMPTY,	/* wait for empty sync queue */
+	CDMA_EVENT_PUSH_BUFFER_SPACE	/* wait for space in push buffer */
+};
+
+struct host1x_cdma {
+	struct mutex lock;		/* controls access to shared state */
+	struct semaphore sem;		/* signalled when event occurs */
+	enum cdma_event event;		/* event that sem is waiting for */
+	unsigned int slots_used;	/* pb slots used in current submit */
+	unsigned int slots_free;	/* pb slots free in current submit */
+	unsigned int first_get;		/* DMAGET value, where submit begins */
+	unsigned int last_put;		/* last value written to DMAPUT */
+	struct push_buffer push_buffer;	/* channel's push buffer */
+	struct list_head sync_queue;	/* job queue */
+	struct buffer_timeout timeout;	/* channel's timeout state/wq */
+	bool running;
+	bool torndown;
+};
+
+#define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma)
+#define cdma_to_host1x(cdma) host1x_get_host(cdma_to_channel(cdma)->dev)
+#define cdma_to_memmgr(cdma) ((cdma_to_host1x(cdma))->memmgr)
+#define pb_to_cdma(pb) container_of(pb, struct host1x_cdma, push_buffer)
+
+int	host1x_cdma_init(struct host1x_cdma *cdma);
+void	host1x_cdma_deinit(struct host1x_cdma *cdma);
+void	host1x_cdma_stop(struct host1x_cdma *cdma);
+int	host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job);
+void	host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2);
+void	host1x_cdma_push_gather(struct host1x_cdma *cdma,
+		struct mem_handle *handle, u32 offset, u32 op1, u32 op2);
+void	host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job);
+void	host1x_cdma_update(struct host1x_cdma *cdma);
+void	host1x_cdma_peek(struct host1x_cdma *cdma,
+		u32 dmaget, int slot, u32 *out);
+unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
+		enum cdma_event event);
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev);
+#endif
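Reviewer note on struct push_buffer above: fence and cur are byte offsets into a power-of-two ring, and the hw/cdma_hw.c part of this patch treats fence == cur as "full", not "empty". The user-space sketch below mirrors that arithmetic. MODEL_PB_SIZE is an assumed value (the real PUSH_BUFFER_SIZE is defined elsewhere in the series), and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size in bytes; the real PUSH_BUFFER_SIZE is defined elsewhere. */
#define MODEL_PB_SIZE	(512u * 8u)	/* must be a power of two */
#define MODEL_SLOT_SIZE	8u		/* two 32-bit words per slot */

struct model_pb {
	uint32_t fence;	/* byte offset that bounds the writable region */
	uint32_t cur;	/* byte offset of the next slot to write */
};

/* Empty buffer: one slot is sacrificed so that fence == cur means "full". */
static void model_pb_reset(struct model_pb *pb)
{
	pb->fence = MODEL_PB_SIZE - MODEL_SLOT_SIZE;
	pb->cur = 0;
}

/* Number of free two-word slots between cur and fence. */
static uint32_t model_pb_space(const struct model_pb *pb)
{
	return ((pb->fence - pb->cur) & (MODEL_PB_SIZE - 1)) / MODEL_SLOT_SIZE;
}

/* Consume one slot; the caller must have checked for space first. */
static void model_pb_push(struct model_pb *pb)
{
	assert(pb->cur != pb->fence);
	pb->cur = (pb->cur + MODEL_SLOT_SIZE) & (MODEL_PB_SIZE - 1);
}

/* Release slots whose commands have completed by advancing the fence. */
static void model_pb_pop(struct model_pb *pb, unsigned int slots)
{
	pb->fence = (pb->fence + slots * MODEL_SLOT_SIZE) & (MODEL_PB_SIZE - 1);
}
```

With these assumed sizes a freshly reset buffer reports 511 free slots out of 512, which is the cost of telling "full" apart from "empty" without a separate counter.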
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
new file mode 100644
index 0000000..3705cae
--- /dev/null
+++ b/drivers/gpu/host1x/channel.c
@@ -0,0 +1,137 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "channel.h"
+#include "dev.h"
+
+#include <linux/slab.h>
+#include <linux/module.h>
+
+#define NVHOST_CHANNEL_LOW_PRIO_MAX_WAIT 50
+
+/* Constructor for the host1x channel list */
+void host1x_channel_list_init(struct host1x *host1x)
+{
+	INIT_LIST_HEAD(&host1x->chlist.list);
+}
+
+/*
+ * Iterator function for the host1x channel list.
+ * It takes a function pointer and calls that function for each channel
+ * in the list, stopping at the first non-zero return value.
+ */
+void host1x_channel_for_all(struct host1x *host1x, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata))
+{
+	struct host1x_channel *ch;
+	int ret;
+
+	list_for_each_entry(ch, &host1x->chlist.list, list) {
+		if (ch && fptr) {
+			ret = fptr(ch, data);
+			if (ret) {
+				pr_info("%s: iterator error\n", __func__);
+				break;
+			}
+		}
+	}
+}
+
+
+int host1x_channel_submit(struct host1x_job *job)
+{
+	return host1x_get_host(job->ch->dev)->channel_op.submit(job);
+}
+EXPORT_SYMBOL(host1x_channel_submit);
+
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
+{
+	int err = 0;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 0)
+		err = host1x_cdma_init(&ch->cdma);
+	if (!err)
+		ch->refcount++;
+
+	mutex_unlock(&ch->reflock);
+
+	return err ? NULL : ch;
+}
+EXPORT_SYMBOL(host1x_channel_get);
+
+void host1x_channel_put(struct host1x_channel *ch)
+{
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 1) {
+		host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
+		host1x_cdma_deinit(&ch->cdma);
+	}
+	ch->refcount--;
+	mutex_unlock(&ch->reflock);
+}
+EXPORT_SYMBOL(host1x_channel_put);
+
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
+{
+	struct host1x_channel *ch = NULL;
+	struct host1x *host1x = host1x_get_host(pdev);
+	int chindex = host1x->allocated_channels;
+	int max_channels = host1x->info.nb_channels;
+	int err;
+
+	if (chindex >= max_channels)
+		return NULL;
+
+	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+	if (ch == NULL)
+		return NULL;
+
+	/* Link platform_device to host1x_channel */
+	err = host1x->channel_op.init(ch, host1x, chindex);
+	if (err < 0) {
+		dev_err(&host1x->dev->dev, "failed to init channel %d\n",
+				chindex);
+		kfree(ch);
+		return NULL;
+	}
+	ch->dev = pdev;
+
+	/* Add to channel list */
+	list_add_tail(&ch->list, &host1x->chlist.list);
+
+	host1x->allocated_channels++;
+
+	return ch;
+}
+EXPORT_SYMBOL(host1x_channel_alloc);
+
+void host1x_free_channel(struct host1x_channel *ch)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct host1x_channel *chiter, *tmp;
+	list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
+		if (chiter == ch) {
+			list_del(&chiter->list);
+			kfree(ch);
+			host1x->allocated_channels--;
+
+			return;
+		}
+	}
+}
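Reviewer note on host1x_channel_get()/host1x_channel_put() above: the first get initialises the CDMA state, the last put tears it down, and a failed initialisation must leave the reference count untouched. The single-threaded sketch below models only that contract. The model_* names and the fail_init knob are hypothetical, and the reflock mutex is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_channel {
	int refcount;
	bool cdma_ready;	/* stands in for the initialised CDMA state */
	bool fail_init;		/* test knob: make the init step fail */
};

/* Stand-in for host1x_cdma_init(): returns 0 on success. */
static int model_cdma_init(struct model_channel *ch)
{
	if (ch->fail_init)
		return -1;
	ch->cdma_ready = true;
	return 0;
}

/* The first reference initialises the channel; NULL means init failed. */
static struct model_channel *model_channel_get(struct model_channel *ch)
{
	if (ch->refcount == 0 && model_cdma_init(ch) != 0)
		return NULL;
	ch->refcount++;
	return ch;
}

/* Dropping the last reference tears the channel state down again. */
static void model_channel_put(struct model_channel *ch)
{
	assert(ch->refcount > 0);
	if (ch->refcount == 1)
		ch->cdma_ready = false;
	ch->refcount--;
}
```

The assert in model_channel_put() marks one difference from the patch: host1x_channel_put() as posted decrements unconditionally, so an unbalanced put would drive refcount negative.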
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
new file mode 100644
index 0000000..67d9487
--- /dev/null
+++ b/drivers/gpu/host1x/channel.h
@@ -0,0 +1,64 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CHANNEL_H
+#define __NVHOST_CHANNEL_H
+
+#include <linux/cdev.h>
+#include <linux/io.h>
+#include "cdma.h"
+
+#define NVHOST_MAX_WAIT_CHECKS		256
+#define NVHOST_MAX_GATHERS		512
+#define NVHOST_MAX_HANDLES		1280
+#define NVHOST_MAX_POWERGATE_IDS	2
+
+struct host1x;
+struct platform_device;
+struct host1x_channel;
+
+/*
+ * host1x channel. Channels are kept on a list that is walked for the
+ * debugfs dump of host1x, client device and channel state.
+ */
+struct host1x_channel {
+	struct list_head list;
+
+	int refcount;
+	int chid;
+	struct mutex reflock;
+	struct mutex submitlock;
+	void __iomem *regs;
+	struct device *node;
+	struct platform_device *dev;
+	struct cdev cdev;
+	struct host1x_cdma cdma;
+};
+
+/* channel list operations */
+void host1x_channel_list_init(struct host1x *);
+void host1x_channel_for_all(struct host1x *, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata));
+
+struct host1x_channel *host1x_channel_alloc(struct platform_device *dev);
+void host1x_free_channel(struct host1x_channel *ch);
+
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch);
+void host1x_channel_put(struct host1x_channel *ch);
+
+#endif
diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
new file mode 100644
index 0000000..bef9d4d
--- /dev/null
+++ b/drivers/gpu/host1x/cma.c
@@ -0,0 +1,116 @@
+/*
+ * Tegra host1x GEM CMA memory manager
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <linux/host1x.h>
+#include <linux/mutex.h>
+
+#include "memmgr.h"
+
+static inline struct drm_gem_cma_object *to_cma_obj(struct mem_handle *h)
+{
+	return (struct drm_gem_cma_object *)(((u32)h) & MEMMGR_ID_MASK);
+}
+
+struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags)
+{
+	return NULL;
+}
+
+void host1x_cma_put(struct mem_handle *handle)
+{
+	struct drm_gem_cma_object *obj = to_cma_obj(handle);
+	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
+
+	mutex_lock(struct_mutex);
+	drm_gem_object_unreference(&obj->base);
+	mutex_unlock(struct_mutex);
+}
+
+struct sg_table *host1x_cma_pin(struct mem_handle *handle)
+{
+	return NULL;
+}
+
+void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+
+}
+
+
+void *host1x_cma_mmap(struct mem_handle *handle)
+{
+	return (to_cma_obj(handle))->vaddr;
+}
+
+void host1x_cma_munmap(struct mem_handle *handle, void *addr)
+{
+
+}
+
+void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+	return (to_cma_obj(handle))->vaddr + pagenum * PAGE_SIZE;
+}
+
+void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr)
+{
+
+}
+
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)
+{
+	struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
+	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
+
+	mutex_lock(struct_mutex);
+	drm_gem_object_reference(&obj->base);
+	mutex_unlock(struct_mutex);
+
+	return (struct mem_handle *) ((u32)id | mem_mgr_type_cma);
+}
+
+int host1x_cma_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		long unsigned id_type_mask,
+		long unsigned id_type,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data,
+		dma_addr_t *phys_addr)
+{
+	int i;
+	int pin_count = 0;
+
+	for (i = 0; i < count; i++) {
+		struct mem_handle *handle;
+
+		if ((ids[i] & id_type_mask) != id_type)
+			continue;
+
+		handle = host1x_cma_get(ids[i], dev);
+
+		phys_addr[i] = (to_cma_obj(handle)->paddr);
+		unpin_data[pin_count].h = handle;
+
+		pin_count++;
+	}
+	return pin_count;
+}
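Reviewer note on to_cma_obj() and host1x_cma_get() above: a mem_handle is the object pointer with a memory-manager type tag OR-ed into its low bits, and MEMMGR_ID_MASK strips the tag again. The sketch below shows the same scheme in user space with uintptr_t, which also works where pointers are wider than 32 bits (the patch casts through u32). The mask and tag values are assumptions, since memmgr.h is not part of this hunk, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real MEMMGR_ID_MASK and type tags live in memmgr.h. */
#define MODEL_TYPE_MASK	((uintptr_t)0x3)
#define MODEL_ID_MASK	(~MODEL_TYPE_MASK)
#define MODEL_TYPE_CMA	((uintptr_t)0x1)

struct model_obj {
	int payload;
};

/* Fold a type tag into the alignment bits of an object pointer. */
static uintptr_t model_tag(struct model_obj *obj, uintptr_t type)
{
	uintptr_t raw = (uintptr_t)obj;

	assert((raw & MODEL_TYPE_MASK) == 0);	/* needs 4-byte alignment */
	assert((type & MODEL_ID_MASK) == 0);	/* tag must fit in the mask */
	return raw | type;
}

/* Recover the object pointer by masking the tag off. */
static struct model_obj *model_untag(uintptr_t handle)
{
	return (struct model_obj *)(handle & MODEL_ID_MASK);
}

/* Extract the memory-manager type tag. */
static uintptr_t model_type(uintptr_t handle)
{
	return handle & MODEL_TYPE_MASK;
}
```

The two asserts in model_tag() spell out the preconditions that the patch relies on implicitly: the object must be aligned so the tag bits are free, and the tag must not spill into the pointer bits.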
diff --git a/drivers/gpu/host1x/cma.h b/drivers/gpu/host1x/cma.h
new file mode 100644
index 0000000..69e2540
--- /dev/null
+++ b/drivers/gpu/host1x/cma.h
@@ -0,0 +1,43 @@
+/*
+ * Tegra host1x cma memory manager
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CMA_H
+#define __NVHOST_CMA_H
+
+#include "memmgr.h"
+
+struct platform_device;
+
+struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags);
+void host1x_cma_put(struct mem_handle *handle);
+struct sg_table *host1x_cma_pin(struct mem_handle *handle);
+void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *host1x_cma_mmap(struct mem_handle *handle);
+void host1x_cma_munmap(struct mem_handle *handle, void *addr);
+void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum);
+void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr);
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev);
+int host1x_cma_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		long unsigned id_type_mask,
+		long unsigned id_type,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data,
+		dma_addr_t *phys_addr);
+#endif
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 9255a49..9209333 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -26,6 +26,7 @@
 #include <linux/io.h>
 #include "dev.h"
 #include "intr.h"
+#include "channel.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -76,6 +77,16 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r)
 	return readl(sync_regs + r);
 }
 
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r)
+{
+	writel(v, ch->regs + r);
+}
+
+u32 host1x_ch_readl(struct host1x_channel *ch, u32 r)
+{
+	return readl(ch->regs + r);
+}
+
 static int host1x_alloc_resources(struct host1x *host)
 {
 	host->intr.syncpt = devm_kzalloc(&host->dev->dev,
@@ -184,6 +195,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_syncpt_reset(host);
 
+	host1x_channel_list_init(host);
+
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
 	host1x = host;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index a1622bb..093ac85 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -19,13 +19,59 @@
 
 #include <linux/host1x.h>
 
+#include "channel.h"
 #include "syncpt.h"
 #include "intr.h"
 
 struct host1x;
+struct host1x_intr;
 struct host1x_syncpt;
+struct host1x_channel;
+struct host1x_cdma;
+struct host1x_job;
+struct push_buffer;
+struct dentry;
+struct mem_handle;
 struct platform_device;
 
+struct host1x_channel_ops {
+	const char *soc_name;
+	int (*init)(struct host1x_channel *,
+		    struct host1x *,
+		    int chid);
+	int (*submit)(struct host1x_job *job);
+};
+
+struct host1x_cdma_ops {
+	void (*start)(struct host1x_cdma *);
+	void (*stop)(struct host1x_cdma *);
+	void (*kick)(struct  host1x_cdma *);
+	int (*timeout_init)(struct host1x_cdma *,
+			    u32 syncpt_id);
+	void (*timeout_destroy)(struct host1x_cdma *);
+	void (*timeout_teardown_begin)(struct host1x_cdma *);
+	void (*timeout_teardown_end)(struct host1x_cdma *,
+				     u32 getptr);
+	void (*timeout_cpu_incr)(struct host1x_cdma *,
+				 u32 getptr,
+				 u32 syncpt_incrs,
+				 u32 syncval,
+				 u32 nr_slots);
+};
+
+struct host1x_pushbuffer_ops {
+	void (*reset)(struct push_buffer *);
+	int (*init)(struct push_buffer *);
+	void (*destroy)(struct push_buffer *);
+	void (*push_to)(struct push_buffer *,
+			struct mem_handle *,
+			u32 op1, u32 op2);
+	void (*pop_from)(struct push_buffer *,
+			 unsigned int slots);
+	u32 (*space)(struct push_buffer *);
+	u32 (*putptr)(struct push_buffer *);
+};
+
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -70,9 +116,15 @@ struct host1x {
 	struct host1x_syncpt *nop_sp;
 
 	const char *soc_name;
+	struct host1x_channel_ops channel_op;
+	struct host1x_cdma_ops cdma_op;
+	struct host1x_pushbuffer_ops cdma_pb_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
+	struct host1x_channel chlist;
+	int allocated_channels;
+
 	struct dentry *debugfs;
 };
 
@@ -90,5 +142,7 @@ struct host1x *host1x_get_host(struct platform_device *_dev)
 
 void host1x_sync_writel(struct host1x *host1x, u32 r, u32 v);
 u32 host1x_sync_readl(struct host1x *host1x, u32 r);
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r);
+u32 host1x_ch_readl(struct host1x_channel *ch, u32 r);
 
 #endif
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
new file mode 100644
index 0000000..55adaa6
--- /dev/null
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -0,0 +1,477 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-mapping.h>
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+
+#include "cdma_hw.h"
+
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
+{
+	return host1x_channel_dmactrl_dmastop_f(stop)
+		| host1x_channel_dmactrl_dmagetrst_f(get_rst)
+		| host1x_channel_dmactrl_dmainitget_f(init_get);
+}
+
+static void cdma_timeout_handler(struct work_struct *work);
+
+/*
+ * push_buffer
+ *
+ * The push buffer is a circular array of words to be fetched by command DMA.
+ * Note that it works slightly differently to the sync queue; fence == cur
+ * means that the push buffer is full, not empty.
+ */
+
+
+/*
+ * Reset to empty push buffer
+ */
+static void push_buffer_reset(struct push_buffer *pb)
+{
+	pb->fence = PUSH_BUFFER_SIZE - 8;
+	pb->cur = 0;
+}
+
+/*
+ * Init push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb);
+static int push_buffer_init(struct push_buffer *pb)
+{
+	struct host1x_cdma *cdma = pb_to_cdma(pb);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	pb->mapped = NULL;
+	pb->phys = 0;
+	pb->handle = NULL;
+
+	host1x->cdma_pb_op.reset(pb);
+
+	/* allocate and map pushbuffer memory */
+	pb->mapped = dma_alloc_writecombine(&host1x->dev->dev,
+			PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
+	if (IS_ERR_OR_NULL(pb->mapped)) {
+		pb->mapped = NULL;
+		goto fail;
+	}
+
+	/* memory for storing mem client and handles for each opcode pair */
+	pb->handle = kzalloc(HOST1X_GATHER_QUEUE_SIZE *
+				sizeof(struct mem_handle *),
+			GFP_KERNEL);
+	if (!pb->handle)
+		goto fail;
+
+	/* put the restart at the end of pushbuffer memory */
+	*(pb->mapped + (PUSH_BUFFER_SIZE >> 2)) =
+		host1x_opcode_restart(pb->phys);
+
+	return 0;
+
+fail:
+	push_buffer_destroy(pb);
+	return -ENOMEM;
+}
+
+/*
+ * Clean up push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb)
+{
+	struct host1x_cdma *cdma = pb_to_cdma(pb);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (pb->phys != 0)
+		dma_free_writecombine(&host1x->dev->dev,
+				PUSH_BUFFER_SIZE + 4,
+				pb->mapped, pb->phys);
+
+	kfree(pb->handle);
+
+	pb->mapped = NULL;
+	pb->phys = 0;
+	pb->handle = NULL;
+}
+
+/*
+ * Push two words to the push buffer
+ * Caller must ensure push buffer is not full
+ */
+static void push_buffer_push_to(struct push_buffer *pb,
+		struct mem_handle *handle,
+		u32 op1, u32 op2)
+{
+	u32 cur = pb->cur;
+	u32 *p = (u32 *)((u32)pb->mapped + cur);
+	u32 cur_mem = (cur/8) & (HOST1X_GATHER_QUEUE_SIZE - 1);
+	WARN_ON(cur == pb->fence);
+	*(p++) = op1;
+	*(p++) = op2;
+	pb->handle[cur_mem] = handle;
+	pb->cur = (cur + 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/*
+ * Pop a number of two word slots from the push buffer
+ * Caller must ensure push buffer is not empty
+ */
+static void push_buffer_pop_from(struct push_buffer *pb,
+		unsigned int slots)
+{
+	/* Clear the mem references for old items from pb */
+	unsigned int i;
+	u32 fence_mem = pb->fence/8;
+	for (i = 0; i < slots; i++) {
+		int cur_fence_mem = (fence_mem+i)
+				& (HOST1X_GATHER_QUEUE_SIZE - 1);
+		pb->handle[cur_fence_mem] = NULL;
+	}
+	/* Advance the fence, freeing the popped slots for reuse */
+	pb->fence = (pb->fence + slots * 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/*
+ * Return the number of two word slots free in the push buffer
+ */
+static u32 push_buffer_space(struct push_buffer *pb)
+{
+	return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
+
+static u32 push_buffer_putptr(struct push_buffer *pb)
+{
+	return pb->phys + pb->cur;
+}
+
+/*
+ * The syncpt incr buffer is filled with methods to increment syncpts, which
+ * is later GATHER-ed into the mainline PB. It's used when a timed out context
+ * is interleaved with other work, so needs to inline the syncpt increments
+ * to maintain the count (but otherwise does no work).
+ */
+
+/*
+ * Init timeout resources
+ */
+static int cdma_timeout_init(struct host1x_cdma *cdma,
+				 u32 syncpt_id)
+{
+	if (syncpt_id == NVSYNCPT_INVALID)
+		return -EINVAL;
+
+	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
+	cdma->timeout.initialized = true;
+
+	return 0;
+}
+
+/*
+ * Clean up timeout resources
+ */
+static void cdma_timeout_destroy(struct host1x_cdma *cdma)
+{
+	if (cdma->timeout.initialized)
+		cancel_delayed_work(&cdma->timeout.wq);
+	cdma->timeout.initialized = false;
+}
+
+/*
+ * Increment the timed out buffer's syncpt via CPU.
+ */
+static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
+				u32 syncpt_incrs, u32 syncval, u32 nr_slots)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 i, getidx;
+
+	for (i = 0; i < syncpt_incrs; i++)
+		host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
+
+	/* after CPU incr, ensure shadow is up to date */
+	host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	/* NOP all the PB slots */
+	getidx = getptr - pb->phys;
+	while (nr_slots--) {
+		u32 *p = (u32 *)((u32)pb->mapped + getidx);
+		*(p++) = HOST1X_OPCODE_NOOP;
+		*(p++) = HOST1X_OPCODE_NOOP;
+		dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
+			__func__, pb->phys + getidx);
+		getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
+	}
+	wmb();
+}
+
+/*
+ * Start channel DMA
+ */
+static void cdma_start(struct host1x_cdma *cdma)
+{
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running)
+		return;
+
+	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+
+	/* set base, put, end pointer (all of memory) */
+	host1x_ch_writel(ch, 0, host1x_channel_dmastart_r());
+	host1x_ch_writel(ch, cdma->last_put, host1x_channel_dmaput_r());
+	host1x_ch_writel(ch, 0xFFFFFFFF, host1x_channel_dmaend_r());
+
+	/* reset GET */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true),
+		host1x_channel_dmactrl_r());
+
+	/* start the command DMA */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false),
+		host1x_channel_dmactrl_r());
+
+	cdma->running = true;
+}
+
+/*
+ * Similar to cdma_start(), but rather than starting from an idle
+ * state (where DMA GET is set to DMA PUT), on a timeout we restore
+ * DMA GET from an explicit value (so DMA may again be pending).
+ */
+static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+
+	if (cdma->running)
+		return;
+
+	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+
+	/* set base, end pointer (all of memory) */
+	host1x_ch_writel(ch, 0, host1x_channel_dmastart_r());
+	host1x_ch_writel(ch, 0xFFFFFFFF, host1x_channel_dmaend_r());
+
+	/* set GET, by loading the value in PUT (then reset GET) */
+	host1x_ch_writel(ch, getptr, host1x_channel_dmaput_r());
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true),
+		host1x_channel_dmactrl_r());
+
+	dev_dbg(&host1x->dev->dev,
+		"%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+		__func__,
+		host1x_ch_readl(ch, host1x_channel_dmaget_r()),
+		host1x_ch_readl(ch, host1x_channel_dmaput_r()),
+		cdma->last_put);
+
+	/* deassert GET reset and set PUT */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+	host1x_ch_writel(ch, cdma->last_put, host1x_channel_dmaput_r());
+
+	/* start the command DMA */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false),
+		host1x_channel_dmactrl_r());
+
+	cdma->running = true;
+}
+
+/*
+ * Kick channel DMA into action by writing its PUT offset (if it has changed)
+ */
+static void cdma_kick(struct host1x_cdma *cdma)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 put;
+
+	put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	if (put != cdma->last_put) {
+		host1x_ch_writel(ch, put, host1x_channel_dmaput_r());
+		cdma->last_put = put;
+	}
+}
+
+static void cdma_stop(struct host1x_cdma *cdma)
+{
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+
+	mutex_lock(&cdma->lock);
+	if (cdma->running) {
+		host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
+		host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+			host1x_channel_dmactrl_r());
+		cdma->running = false;
+	}
+	mutex_unlock(&cdma->lock);
+}
+
+/*
+ * Stops both the channel's command processor and CDMA immediately.
+ * Also tears down the channel and resets the corresponding module.
+ */
+static void cdma_timeout_teardown_begin(struct host1x_cdma *cdma)
+{
+	struct host1x *dev = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 cmdproc_stop;
+
+	if (cdma->torndown && !cdma->running) {
+		dev_warn(&dev->dev->dev, "Already torn down\n");
+		return;
+	}
+
+	dev_dbg(&dev->dev->dev,
+		"begin channel teardown (channel id %d)\n", ch->chid);
+
+	cmdproc_stop = host1x_sync_readl(dev, host1x_sync_cmdproc_stop_r());
+	cmdproc_stop |= BIT(ch->chid);
+	host1x_sync_writel(dev, cmdproc_stop, host1x_sync_cmdproc_stop_r());
+
+	dev_dbg(&dev->dev->dev,
+		"%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+		__func__,
+		host1x_ch_readl(ch, host1x_channel_dmaget_r()),
+		host1x_ch_readl(ch, host1x_channel_dmaput_r()),
+		cdma->last_put);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		host1x_channel_dmactrl_r());
+
+	host1x_sync_writel(dev, BIT(ch->chid), host1x_sync_ch_teardown_r());
+
+	cdma->running = false;
+	cdma->torndown = true;
+}
+
+static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 cmdproc_stop;
+
+	dev_dbg(&host1x->dev->dev,
+		"end channel teardown (id %d, DMAGET restart = 0x%x)\n",
+		ch->chid, getptr);
+
+	cmdproc_stop = host1x_sync_readl(host1x, host1x_sync_cmdproc_stop_r());
+	cmdproc_stop &= ~(BIT(ch->chid));
+	host1x_sync_writel(host1x, cmdproc_stop, host1x_sync_cmdproc_stop_r());
+
+	cdma->torndown = false;
+	cdma_timeout_restart(cdma, getptr);
+}
+
+/*
+ * If this timeout fires, it indicates the current sync_queue entry has
+ * exceeded its TTL and the userctx should be timed out and remaining
+ * submits already issued cleaned up (future submits return an error).
+ */
+static void cdma_timeout_handler(struct work_struct *work)
+{
+	struct host1x_cdma *cdma;
+	struct host1x *host1x;
+	struct host1x_channel *ch;
+
+	u32 syncpt_val;
+
+	u32 prev_cmdproc, cmdproc_stop;
+
+	cdma = container_of(to_delayed_work(work), struct host1x_cdma,
+			    timeout.wq);
+	host1x = cdma_to_host1x(cdma);
+	ch = cdma_to_channel(cdma);
+
+	mutex_lock(&cdma->lock);
+
+	if (!cdma->timeout.clientid) {
+		dev_dbg(&host1x->dev->dev,
+			 "cdma_timeout: expired, but has no clientid\n");
+		mutex_unlock(&cdma->lock);
+		return;
+	}
+
+	/* stop processing to get a clean snapshot */
+	prev_cmdproc = host1x_sync_readl(host1x, host1x_sync_cmdproc_stop_r());
+	cmdproc_stop = prev_cmdproc | BIT(ch->chid);
+	host1x_sync_writel(host1x, cmdproc_stop, host1x_sync_cmdproc_stop_r());
+
+	dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
+		prev_cmdproc, cmdproc_stop);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	/* has buffer actually completed? */
+	if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
+		dev_dbg(&host1x->dev->dev,
+			 "cdma_timeout: expired, but buffer had completed\n");
+		/* restore */
+		cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
+		host1x_sync_writel(host1x, cmdproc_stop,
+			host1x_sync_cmdproc_stop_r());
+		mutex_unlock(&cdma->lock);
+		return;
+	}
+
+	dev_warn(&host1x->dev->dev,
+		"%s: timeout: %d (%s), HW thresh %d, done %d\n",
+		__func__,
+		cdma->timeout.syncpt->id, cdma->timeout.syncpt->name,
+		syncpt_val, cdma->timeout.syncpt_val);
+
+	/* stop HW, resetting channel/module */
+	host1x->cdma_op.timeout_teardown_begin(cdma);
+
+	host1x_cdma_update_sync_queue(cdma, ch->dev);
+	mutex_unlock(&cdma->lock);
+}
+
+static const struct host1x_cdma_ops host1x_cdma_ops = {
+	.start = cdma_start,
+	.stop = cdma_stop,
+	.kick = cdma_kick,
+
+	.timeout_init = cdma_timeout_init,
+	.timeout_destroy = cdma_timeout_destroy,
+	.timeout_teardown_begin = cdma_timeout_teardown_begin,
+	.timeout_teardown_end = cdma_timeout_teardown_end,
+	.timeout_cpu_incr = cdma_timeout_cpu_incr,
+};
+
+static const struct host1x_pushbuffer_ops host1x_pushbuffer_ops = {
+	.reset = push_buffer_reset,
+	.init = push_buffer_init,
+	.destroy = push_buffer_destroy,
+	.push_to = push_buffer_push_to,
+	.pop_from = push_buffer_pop_from,
+	.space = push_buffer_space,
+	.putptr = push_buffer_putptr,
+};
+
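Note on the completion check in cdma_timeout_handler(): the test
`(s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0` is a wraparound-safe
comparison of two free-running 32-bit counters. The following is a minimal
userspace sketch of that idiom, not driver code. The name syncpt_reached() is
hypothetical, and the sketch assumes the two values are never 2^31 or more
apart and that the compiler uses two's complement conversion, as the kernel
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once the counter 'current' has reached 'thresh'.
 * The unsigned subtraction wraps modulo 2^32; reinterpreting the result as
 * signed yields the distance between the two values, provided they are
 * less than 2^31 apart.
 */
static inline bool syncpt_reached(uint32_t current, uint32_t thresh)
{
	return (int32_t)(current - thresh) >= 0;
}
```

A plain `current >= thresh` would give the wrong answer whenever the counter
has wrapped past zero between the submit and the check, which is why the
handler subtracts first and compares the signed difference.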
diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
new file mode 100644
index 0000000..4ce2f43
--- /dev/null
+++ b/drivers/gpu/host1x/hw/cdma_hw.h
@@ -0,0 +1,37 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_CDMA_H
+#define __HOST1X_CDMA_H
+
+/*
+ * Size of the sync queue. If it is too small, we won't be able to queue up
+ * many command buffers. If it is too large, we waste memory.
+ */
+#define HOST1X_SYNC_QUEUE_SIZE 512
+
+/*
+ * Number of gathers we allow to be queued up per channel. Must be a
+ * power of two. Currently sized such that pushbuffer is 4KB (512*8B).
+ */
+#define HOST1X_GATHER_QUEUE_SIZE 512
+
+/* 8 bytes per slot. (This number does not include the final RESTART.) */
+#define PUSH_BUFFER_SIZE (HOST1X_GATHER_QUEUE_SIZE * 8)
+
+#endif
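Note on the power-of-two requirement above: push_buffer_space() and the fence
update in cdma_hw.c wrap offsets with `& (PUSH_BUFFER_SIZE - 1)`, which is
only equivalent to a modulo when the size is a power of two. A minimal
userspace sketch of that arithmetic follows. The names PB_SIZE, pb_advance()
and pb_space() are hypothetical stand-ins that mirror the driver's constants
and expressions, not the driver's own symbols.

```c
#include <assert.h>
#include <stdint.h>

#define GATHER_QUEUE_SIZE 512u                     /* mirrors HOST1X_GATHER_QUEUE_SIZE */
#define PB_SIZE           (GATHER_QUEUE_SIZE * 8u) /* mirrors PUSH_BUFFER_SIZE */

/* The mask trick below is only a valid modulo for power-of-two sizes. */
static_assert((PB_SIZE & (PB_SIZE - 1u)) == 0, "PB_SIZE must be a power of two");

/* Advance a byte offset by 'slots' two-word (8 byte) slots, wrapping. */
static inline uint32_t pb_advance(uint32_t offset, uint32_t slots)
{
	return (offset + slots * 8u) & (PB_SIZE - 1u);
}

/* Number of free two-word slots between the write position and the fence. */
static inline uint32_t pb_space(uint32_t fence, uint32_t cur)
{
	return ((fence - cur) & (PB_SIZE - 1u)) / 8u;
}
```

With these values, advancing the last slot offset (4088) by one slot wraps to
0, and a fence one slot behind the write position reports 511 free slots.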
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
new file mode 100644
index 0000000..3bdfef6
--- /dev/null
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -0,0 +1,147 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/host1x.h>
+#include "channel.h"
+#include "dev.h"
+#include <linux/slab.h>
+#include "intr.h"
+#include <trace/events/host1x.h>
+
+static void submit_gathers(struct host1x_job *job)
+{
+	/* push user gathers */
+	int i;
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		u32 op1 = host1x_opcode_gather(g->words);
+		u32 op2 = g->mem_base + g->offset;
+		host1x_cdma_push_gather(&job->ch->cdma,
+				g->ref,
+				g->offset,
+				op1, op2);
+	}
+}
+
+static int channel_submit(struct host1x_job *job)
+{
+	struct host1x_channel *ch = job->ch;
+	struct host1x_syncpt *sp;
+	u32 user_syncpt_incrs = job->syncpt_incrs;
+	u32 prev_max = 0;
+	u32 syncval;
+	int err;
+	void *completed_waiter = NULL;
+
+	sp = host1x_get_host(job->ch->dev)->syncpt + job->syncpt_id;
+	trace_host1x_channel_submit(ch->dev->name,
+			job->num_gathers, job->num_relocs, job->num_waitchk,
+			job->syncpt_id, job->syncpt_incrs);
+
+	/* before error checks, return current max */
+	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
+
+	/* get submit lock */
+	err = mutex_lock_interruptible(&ch->submitlock);
+	if (err)
+		goto error;
+
+	completed_waiter = host1x_intr_alloc_waiter();
+	if (!completed_waiter) {
+		mutex_unlock(&ch->submitlock);
+		err = -ENOMEM;
+		goto error;
+	}
+
+	/* begin a CDMA submit */
+	err = host1x_cdma_begin(&ch->cdma, job);
+	if (err) {
+		mutex_unlock(&ch->submitlock);
+		goto error;
+	}
+
+	if (job->serialize) {
+		/*
+		 * Force serialization by inserting a host wait for the
+		 * previous job to finish before this one can commence.
+		 */
+		host1x_cdma_push(&ch->cdma,
+				host1x_opcode_setclass(NV_HOST1X_CLASS_ID,
+					host1x_uclass_wait_syncpt_r(),
+					1),
+				host1x_class_host_wait_syncpt(job->syncpt_id,
+					host1x_syncpt_read_max(sp)));
+	}
+
+	syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
+
+	job->syncpt_end = syncval;
+
+	/* add a setclass for modules that require it */
+	if (job->class)
+		host1x_cdma_push(&ch->cdma,
+			host1x_opcode_setclass(job->class, 0, 0),
+			HOST1X_OPCODE_NOOP);
+
+	submit_gathers(job);
+
+	/* end CDMA submit & stash pinned hMems into sync queue */
+	host1x_cdma_end(&ch->cdma, job);
+
+	trace_host1x_channel_submitted(ch->dev->name,
+			prev_max, syncval);
+
+	/* schedule a submit complete interrupt */
+	err = host1x_intr_add_action(&host1x_get_host(ch->dev)->intr,
+			job->syncpt_id, syncval,
+			HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
+			completed_waiter,
+			NULL);
+	completed_waiter = NULL;
+	WARN(err, "Failed to set submit complete interrupt\n");
+
+	mutex_unlock(&ch->submitlock);
+
+	return 0;
+
+error:
+	kfree(completed_waiter);
+	return err;
+}
+
+static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx)
+{
+	p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
+	return p;
+}
+
+static int host1x_channel_init(struct host1x_channel *ch,
+	struct host1x *dev, int index)
+{
+	ch->chid = index;
+	mutex_init(&ch->reflock);
+	mutex_init(&ch->submitlock);
+
+	ch->regs = host1x_channel_regs(dev->regs, index);
+	return 0;
+}
+
+static const struct host1x_channel_ops host1x_channel_ops = {
+	.init = host1x_channel_init,
+	.submit = channel_submit,
+};
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index c5c55a3..3f41619 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -24,13 +24,19 @@
 
 #include "hw/host1x01.h"
 #include "dev.h"
+#include "channel.h"
 #include "hw/host1x01_hardware.h"
 
+#include "hw/channel_hw.c"
+#include "hw/cdma_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"
 
 int host1x01_init(struct host1x *host)
 {
+	host->channel_op = host1x_channel_ops;
+	host->cdma_op = host1x_cdma_ops;
+	host->cdma_pb_op = host1x_pushbuffer_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
 
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
index 4e57f21..020798f 100644
--- a/drivers/gpu/host1x/hw/host1x01_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -21,6 +21,130 @@
 
 #include <linux/types.h>
 #include <linux/bitops.h>
+#include "hw_host1x01_channel.h"
 #include "hw_host1x01_sync.h"
+#include "hw_host1x01_uclass.h"
+
+/* channel registers */
+#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
+
+static inline u32 host1x_class_host_wait_syncpt(
+	unsigned indx, unsigned threshold)
+{
+	return host1x_uclass_wait_syncpt_indx_f(indx)
+		| host1x_uclass_wait_syncpt_thresh_f(threshold);
+}
+
+static inline u32 host1x_class_host_load_syncpt_base(
+	unsigned indx, unsigned threshold)
+{
+	return host1x_uclass_load_syncpt_base_base_indx_f(indx)
+		| host1x_uclass_load_syncpt_base_value_f(threshold);
+}
+
+static inline u32 host1x_class_host_wait_syncpt_base(
+	unsigned indx, unsigned base_indx, unsigned offset)
+{
+	return host1x_uclass_wait_syncpt_base_indx_f(indx)
+		| host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
+		| host1x_uclass_wait_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt_base(
+	unsigned base_indx, unsigned offset)
+{
+	return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
+		| host1x_uclass_incr_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt(
+	unsigned cond, unsigned indx)
+{
+	return host1x_uclass_incr_syncpt_cond_f(cond)
+		| host1x_uclass_incr_syncpt_indx_f(indx);
+}
+
+static inline u32 host1x_class_host_indoff_reg_write(
+	unsigned mod_id, unsigned offset, bool auto_inc)
+{
+	u32 v = host1x_uclass_indoff_indbe_f(0xf)
+		| host1x_uclass_indoff_indmodid_f(mod_id)
+		| host1x_uclass_indoff_indroffset_f(offset);
+	if (auto_inc)
+		v |= host1x_uclass_indoff_autoinc_f(1);
+	return v;
+}
+
+static inline u32 host1x_class_host_indoff_reg_read(
+	unsigned mod_id, unsigned offset, bool auto_inc)
+{
+	u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
+		| host1x_uclass_indoff_indroffset_f(offset)
+		| host1x_uclass_indoff_rwn_read_v();
+	if (auto_inc)
+		v |= host1x_uclass_indoff_autoinc_f(1);
+	return v;
+}
+
+
+/* cdma opcodes */
+static inline u32 host1x_opcode_setclass(
+	unsigned class_id, unsigned offset, unsigned mask)
+{
+	return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
+}
+
+static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
+{
+	return (1 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
+{
+	return (2 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
+{
+	return (3 << 28) | (offset << 16) | mask;
+}
+
+static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
+{
+	return (4 << 28) | (offset << 16) | value;
+}
+
+static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
+{
+	return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
+		host1x_class_host_incr_syncpt(cond, indx));
+}
+
+static inline u32 host1x_opcode_restart(unsigned address)
+{
+	return (5 << 28) | (address >> 4);
+}
+
+static inline u32 host1x_opcode_gather(unsigned count)
+{
+	return (6 << 28) | count;
+}
+
+static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
+{
+	return (6 << 28) | (offset << 16) | BIT(15) | count;
+}
+
+static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
+{
+	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
+}
+
+#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)
+
+static inline u32 host1x_mask2(unsigned x, unsigned y)
+{
+	return 1 | (1 << (y - x));
+}
 
 #endif
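Note on the cdma opcode helpers above: each opcode is a 32-bit word with the
opcode number in bits 31:28 and operands packed below it. A minimal userspace
sketch that restates four of the encodings so the resulting words can be
inspected follows. The function names are shortened, hypothetical copies of
the helpers in this header, and the sample arguments are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode 0: select a class; offset in bits 27:16, class id from bit 6 up. */
static inline uint32_t opcode_setclass(uint32_t class_id, uint32_t offset,
				       uint32_t mask)
{
	return (0u << 28) | (offset << 16) | (class_id << 6) | mask;
}

/* Opcode 2: write 'count' words to one register offset (no increment). */
static inline uint32_t opcode_nonincr(uint32_t offset, uint32_t count)
{
	return (2u << 28) | (offset << 16) | count;
}

/* Opcode 5: restart command fetch; the address is stored shifted right by 4. */
static inline uint32_t opcode_restart(uint32_t address)
{
	return (5u << 28) | (address >> 4);
}

/* Opcode 6: gather 'count' words from the address in the following word. */
static inline uint32_t opcode_gather(uint32_t count)
{
	return (6u << 28) | count;
}
```

Under this encoding HOST1X_OPCODE_NOOP, defined as a non-incrementing write
of zero words to offset 0, is the word 0x20000000, which is the value
cdma_timeout_cpu_incr() writes over timed-out push buffer slots.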
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
new file mode 100644
index 0000000..3a23d57
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_channel_host1x_h__
+#define __hw_host1x_channel_host1x_h__
+
+static inline u32 host1x_channel_dmastart_r(void)
+{
+	return 0x14;
+}
+static inline u32 host1x_channel_dmaput_r(void)
+{
+	return 0x18;
+}
+static inline u32 host1x_channel_dmaget_r(void)
+{
+	return 0x1c;
+}
+static inline u32 host1x_channel_dmaend_r(void)
+{
+	return 0x20;
+}
+static inline u32 host1x_channel_dmactrl_r(void)
+{
+	return 0x24;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
+{
+	return (v & 0x1) << 0;
+}
+static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
+{
+	return (v & 0x1) << 1;
+}
+static inline u32 host1x_channel_dmactrl_dmainitget_f(u32 v)
+{
+	return (v & 0x1) << 2;
+}
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index b06a2c5..c9342da 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -63,6 +63,14 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
 {
 	return 0x68;
 }
+static inline u32 host1x_sync_cmdproc_stop_r(void)
+{
+	return 0xac;
+}
+static inline u32 host1x_sync_ch_teardown_r(void)
+{
+	return 0xb0;
+}
 static inline u32 host1x_sync_usec_clk_r(void)
 {
 	return 0x1a4;
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
new file mode 100644
index 0000000..948cfe3
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_uclass_host1x_h__
+#define __hw_host1x_uclass_host1x_h__
+
+static inline u32 host1x_uclass_incr_syncpt_r(void)
+{
+	return 0x0;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_f(u32 v)
+{
+	return (v & 0xff) << 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_indx_f(u32 v)
+{
+	return (v & 0xff) << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_r(void)
+{
+	return 0x8;
+}
+static inline u32 host1x_uclass_wait_syncpt_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_thresh_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 16;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_offset_f(u32 v)
+{
+	return (v & 0xffff) << 0;
+}
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_load_syncpt_base_value_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_offset_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_indoff_r(void)
+{
+	return 0x2d;
+}
+static inline u32 host1x_uclass_indoff_indbe_f(u32 v)
+{
+	return (v & 0xf) << 28;
+}
+static inline u32 host1x_uclass_indoff_autoinc_f(u32 v)
+{
+	return (v & 0x1) << 27;
+}
+static inline u32 host1x_uclass_indoff_indmodid_f(u32 v)
+{
+	return (v & 0xff) << 18;
+}
+static inline u32 host1x_uclass_indoff_indroffset_f(u32 v)
+{
+	return (v & 0xffff) << 2;
+}
+static inline u32 host1x_uclass_indoff_rwn_read_v(void)
+{
+	return 1;
+}
+#endif
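Note on the `_f` field helpers above: each one masks its argument to the
field width and shifts it into position, so oversized values are silently
truncated rather than rejected. A minimal userspace sketch of how
host1x_class_host_wait_syncpt() combines the two WAIT_SYNCPT fields follows.
The shortened names are hypothetical copies of the helpers in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Syncpt index: 8 bits, placed at bits 31:24. */
static inline uint32_t wait_syncpt_indx_f(uint32_t v)
{
	return (v & 0xff) << 24;
}

/* Threshold: 24 bits, placed at bits 23:0. */
static inline uint32_t wait_syncpt_thresh_f(uint32_t v)
{
	return (v & 0xffffff) << 0;
}

/* Full WAIT_SYNCPT method argument: index and threshold OR'd together. */
static inline uint32_t class_host_wait_syncpt(uint32_t indx, uint32_t thresh)
{
	return wait_syncpt_indx_f(indx) | wait_syncpt_thresh_f(thresh);
}
```

One consequence worth keeping in mind when reading channel_submit(): the
32-bit value returned by host1x_syncpt_read_max() only contributes its low
24 bits to the wait inserted for serialized jobs.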
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index 44a10b0..a070473 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 	wmb();
 }
 
+/* remove a wait pointed to by patch_addr */
+static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	u32 override = host1x_class_host_wait_syncpt(
+			NVSYNCPT_GRAPHICS_HOST, 0);
+	__raw_writel(override, patch_addr);
+	return 0;
+}
+
 static const char *syncpt_name(struct host1x_syncpt *sp)
 {
 	struct host1x_device_info *info = &sp->dev->info;
@@ -141,6 +150,7 @@ static const struct host1x_syncpt_ops host1x_syncpt_ops = {
 	.read_wait_base = syncpt_read_wait_base,
 	.load_min = syncpt_load_min,
 	.cpu_incr = syncpt_cpu_incr,
+	.patch_wait = syncpt_patch_wait,
 	.debug = syncpt_debug,
 	.name = syncpt_name,
 };
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index f166224..a524826 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -20,6 +20,8 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/irq.h>
+#include <trace/events/host1x.h>
+#include "channel.h"
 #include "dev.h"
 
 /* Wait list management */
@@ -74,7 +76,7 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
 {
 	struct list_head *dest;
-	struct host1x_waitlist *waiter, *next;
+	struct host1x_waitlist *waiter, *next, *prev;
 
 	list_for_each_entry_safe(waiter, next, head, list) {
 		if ((s32)(waiter->thresh - sync) > 0)
@@ -82,6 +84,17 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 
 		dest = completed + waiter->action;
 
+		/* consolidate submit cleanups */
+		if (waiter->action == HOST1X_INTR_ACTION_SUBMIT_COMPLETE
+			&& !list_empty(dest)) {
+			prev = list_entry(dest->prev,
+					struct host1x_waitlist, list);
+			if (prev->data == waiter->data) {
+				prev->count++;
+				dest = NULL;
+			}
+		}
+
 		/* PENDING->REMOVED or CANCELLED->HANDLED */
 		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
 			list_del(&waiter->list);
@@ -104,6 +117,19 @@ void reset_threshold_interrupt(struct host1x_intr *intr,
 	host1x->intr_op.enable_syncpt_intr(intr, id);
 }
 
+static void action_submit_complete(struct host1x_waitlist *waiter)
+{
+	struct host1x_channel *channel = waiter->data;
+	int nr_completed = waiter->count;
+
+	host1x_cdma_update(&channel->cdma);
+
+	/* Add nr_completed to trace */
+	trace_host1x_channel_submit_complete(channel->dev->name,
+			nr_completed, waiter->thresh);
+
+}
+
 static void action_wakeup(struct host1x_waitlist *waiter)
 {
 	wait_queue_head_t *wq = waiter->data;
@@ -121,6 +147,7 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 typedef void (*action_handler)(struct host1x_waitlist *waiter);
 
 static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
 };
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index 3625bf3..fa4b2c4 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -28,6 +28,12 @@ struct host1x_channel;
 
 enum host1x_intr_action {
 	/*
+	 * Perform cleanup after a submit has completed.
+	 * 'data' points to a channel
+	 */
+	HOST1X_INTR_ACTION_SUBMIT_COMPLETE = 0,
+
+	/*
 	 * Wake up a  task.
 	 * 'data' points to a wait_queue_head_t
 	 */
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
new file mode 100644
index 0000000..cc8ca2f
--- /dev/null
+++ b/drivers/gpu/host1x/job.c
@@ -0,0 +1,618 @@
+/*
+ * Tegra host1x Job
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/kref.h>
+#include <linux/err.h>
+#include <linux/vmalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/host1x.h>
+#include <trace/events/host1x.h>
+#include <linux/dma-mapping.h>
+#include "channel.h"
+#include "syncpt.h"
+#include "dev.h"
+#include "memmgr.h"
+
+#ifdef CONFIG_TEGRA_HOST1X_FIREWALL
+static int host1x_firewall = 1;
+#else
+static int host1x_firewall;
+#endif
+
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
+		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
+{
+	struct host1x_job *job = NULL;
+	int num_unpins = num_cmdbufs + num_relocs;
+	s64 total;
+	void *mem;
+
+	/* Check that we're not going to overflow */
+	total = sizeof(struct host1x_job)
+			+ num_relocs * sizeof(struct host1x_reloc)
+			+ num_unpins * sizeof(struct host1x_job_unpin_data)
+			+ num_waitchks * sizeof(struct host1x_waitchk)
+			+ num_cmdbufs * sizeof(struct host1x_job_gather)
+			+ num_unpins * sizeof(dma_addr_t)
+			+ num_unpins * sizeof(u32 *);
+	if (total > ULONG_MAX)
+		return NULL;
+
+	mem = job = kzalloc(total, GFP_KERNEL);
+	if (!job)
+		return NULL;
+
+	kref_init(&job->ref);
+	job->ch = ch;
+
+	/* First init state to zero */
+
+	/*
+	 * Redistribute memory to the structs.
+	 * Overflows and negative conditions have
+	 * already been checked above.
+	 */
+	mem += sizeof(struct host1x_job);
+	job->relocarray = num_relocs ? mem : NULL;
+	mem += num_relocs * sizeof(struct host1x_reloc);
+	job->unpins = num_unpins ? mem : NULL;
+	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
+	job->waitchk = num_waitchks ? mem : NULL;
+	mem += num_waitchks * sizeof(struct host1x_waitchk);
+	job->gathers = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->addr_phys = num_unpins ? mem : NULL;
+	mem += num_unpins * sizeof(dma_addr_t);
+	job->pin_ids = num_unpins ? mem : NULL;
+
+	job->reloc_addr_phys = job->addr_phys;
+	job->gather_addr_phys = &job->addr_phys[num_relocs];
+
+	return job;
+}
+EXPORT_SYMBOL(host1x_job_alloc);
+
+void host1x_job_get(struct host1x_job *job)
+{
+	kref_get(&job->ref);
+}
+EXPORT_SYMBOL(host1x_job_get);
+
+static void job_free(struct kref *ref)
+{
+	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
+
+	kfree(job);
+}
+
+void host1x_job_put(struct host1x_job *job)
+{
+	kref_put(&job->ref, job_free);
+}
+EXPORT_SYMBOL(host1x_job_put);
+
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset)
+{
+	struct host1x_job_gather *cur_gather =
+			&job->gathers[job->num_gathers];
+
+	cur_gather->words = words;
+	cur_gather->mem_id = mem_id;
+	cur_gather->offset = offset;
+	job->num_gathers++;
+}
+EXPORT_SYMBOL(host1x_job_add_gather);
+
+/*
+ * Check driver supplied waitchk structs for syncpt thresholds
+ * that have already been satisfied and NULL the comparison (to
+ * avoid a wrap condition in the HW).
+ */
+static int do_waitchks(struct host1x_job *job, struct host1x *host,
+		u32 patch_mem, struct mem_handle *h)
+{
+	int i;
+
+	/* compare syncpt vs wait threshold */
+	for (i = 0; i < job->num_waitchk; i++) {
+		struct host1x_waitchk *wait = &job->waitchk[i];
+		struct host1x_syncpt *sp =
+			host1x_syncpt_get(host, wait->syncpt_id);
+
+		/* validate syncpt id */
+		if (wait->syncpt_id >= host1x_syncpt_nb_pts(host))
+			continue;
+
+		/* skip all other gathers */
+		if (patch_mem != wait->mem)
+			continue;
+
+		trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
+				wait->syncpt_id, wait->thresh,
+				host1x_syncpt_read_min(sp));
+		if (host1x_syncpt_is_expired(
+			host1x_syncpt_get(host, wait->syncpt_id),
+			wait->thresh)) {
+			struct host1x_syncpt *sp =
+				host1x_syncpt_get(host, wait->syncpt_id);
+
+			void *patch_addr = NULL;
+
+			/*
+			 * NULL an already satisfied WAIT_SYNCPT host method,
+			 * by patching its args in the command stream. The
+			 * method data is changed to reference a reserved
+			 * (never given out or incr) NVSYNCPT_GRAPHICS_HOST
+			 * syncpt with a matching threshold value of 0, so
+			 * is guaranteed to be popped by the host HW.
+			 */
+			dev_dbg(&host->dev->dev,
+			    "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
+			    wait->syncpt_id, sp->name, wait->thresh,
+			    host1x_syncpt_read_min(sp));
+
+			/* patch the wait */
+			patch_addr = host1x_memmgr_kmap(h,
+					wait->offset >> PAGE_SHIFT);
+			if (patch_addr) {
+				host1x_syncpt_patch_wait(sp,
+					(patch_addr +
+						(wait->offset & ~PAGE_MASK)));
+				host1x_memmgr_kunmap(h,
+						wait->offset >> PAGE_SHIFT,
+						patch_addr);
+			} else {
+				pr_err("Couldn't map cmdbuf for wait check\n");
+			}
+		}
+
+		wait->mem = 0;
+	}
+	return 0;
+}
+
+
+static int pin_job_mem(struct host1x_job *job)
+{
+	int i;
+	int count = 0;
+	int result;
+
+	for (i = 0; i < job->num_relocs; i++) {
+		struct host1x_reloc *reloc = &job->relocarray[i];
+		job->pin_ids[count] = reloc->target;
+		count++;
+	}
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		job->pin_ids[count] = g->mem_id;
+		count++;
+	}
+
+	/* validate array and pin unique ids, get refs for unpinning */
+	result = host1x_memmgr_pin_array_ids(job->ch->dev,
+		job->pin_ids, job->addr_phys,
+		count,
+		job->unpins);
+
+	if (result > 0)
+		job->num_unpins = result;
+
+	return result;
+}
+
+static int do_relocs(struct host1x_job *job,
+		u32 cmdbuf_mem, struct mem_handle *h)
+{
+	int i = 0;
+	int last_page = -1;
+	void *cmdbuf_page_addr = NULL;
+
+	/* pin & patch the relocs for one gather */
+	while (i < job->num_relocs) {
+		struct host1x_reloc *reloc = &job->relocarray[i];
+
+		/* skip all other gathers */
+		if (cmdbuf_mem != reloc->cmdbuf_mem) {
+			i++;
+			continue;
+		}
+
+		if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
+			if (cmdbuf_page_addr)
+				host1x_memmgr_kunmap(h,
+						last_page, cmdbuf_page_addr);
+
+			cmdbuf_page_addr = host1x_memmgr_kmap(h,
+					reloc->cmdbuf_offset >> PAGE_SHIFT);
+			last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
+
+			if (unlikely(!cmdbuf_page_addr)) {
+				pr_err("Couldn't map cmdbuf for relocation\n");
+				return -ENOMEM;
+			}
+		}
+
+		__raw_writel(
+			(job->reloc_addr_phys[i] +
+				reloc->target_offset) >> reloc->shift,
+			(cmdbuf_page_addr +
+				(reloc->cmdbuf_offset & ~PAGE_MASK)));
+
+		/* remove completed reloc from the job */
+		if (i != job->num_relocs - 1) {
+			struct host1x_reloc *reloc_last =
+				&job->relocarray[job->num_relocs - 1];
+			reloc->cmdbuf_mem	= reloc_last->cmdbuf_mem;
+			reloc->cmdbuf_offset	= reloc_last->cmdbuf_offset;
+			reloc->target		= reloc_last->target;
+			reloc->target_offset	= reloc_last->target_offset;
+			reloc->shift		= reloc_last->shift;
+			job->reloc_addr_phys[i] =
+				job->reloc_addr_phys[job->num_relocs - 1];
+			job->num_relocs--;
+		} else {
+			break;
+		}
+	}
+
+	if (cmdbuf_page_addr)
+		host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
+
+	return 0;
+}
+
+static int check_reloc(struct host1x_reloc *reloc,
+		u32 cmdbuf_id, int offset)
+{
+	int err = 0;
+	if (reloc->cmdbuf_mem != cmdbuf_id
+			|| reloc->cmdbuf_offset != offset * sizeof(u32))
+		err = -EINVAL;
+
+	return err;
+}
+
+static int check_mask(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 mask)
+{
+	while (mask) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (mask & 1) {
+			if (job->is_addr_reg(pdev, class, reg)) {
+				if (!*num_relocs ||
+					check_reloc(*reloc, cmdbuf_id, *offset))
+					return -EINVAL;
+				(*reloc)++;
+				(*num_relocs)--;
+			}
+			(*words)--;
+			(*offset)++;
+		}
+		mask >>= 1;
+		reg += 1;
+	}
+
+	return 0;
+}
+
+static int check_incr(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 count)
+{
+	while (count) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (job->is_addr_reg(pdev, class, reg)) {
+			if (!*num_relocs ||
+				check_reloc(*reloc, cmdbuf_id, *offset))
+				return -EINVAL;
+			(*reloc)++;
+			(*num_relocs)--;
+		}
+		reg += 1;
+		(*words)--;
+		(*offset)++;
+		count--;
+	}
+
+	return 0;
+}
+
+static int check_nonincr(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 count)
+{
+	int is_addr_reg = job->is_addr_reg(pdev, class, reg);
+
+	while (count) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (is_addr_reg) {
+			if (!*num_relocs ||
+				check_reloc(*reloc, cmdbuf_id, *offset))
+				return -EINVAL;
+			(*reloc)++;
+			(*num_relocs)--;
+		}
+		(*words)--;
+		(*offset)++;
+		count--;
+	}
+
+	return 0;
+}
+
+static int validate(struct host1x_job *job, struct platform_device *pdev,
+		struct host1x_job_gather *g)
+{
+	struct host1x_reloc *reloc = job->relocarray;
+	int num_relocs = job->num_relocs;
+	u32 *cmdbuf_base;
+	int offset = 0;
+	unsigned int words;
+	int err = 0;
+	int class = 0;
+
+	if (!job->is_addr_reg)
+		return 0;
+
+	cmdbuf_base = host1x_memmgr_mmap(g->ref);
+	if (IS_ERR_OR_NULL(cmdbuf_base))
+		return cmdbuf_base ? PTR_ERR(cmdbuf_base) : -ENOMEM;
+
+	words = g->words;
+	while (words && !err) {
+		u32 word = cmdbuf_base[offset];
+		u32 opcode = (word & 0xf0000000) >> 28;
+		u32 mask = 0;
+		u32 reg = 0;
+		u32 count = 0;
+
+		words--;
+		offset++;
+
+		switch (opcode) {
+		case 0:
+			class = word >> 6 & 0x3ff;
+			mask = word & 0x3f;
+			reg = word >> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 1:
+			reg = word >> 16 & 0xfff;
+			count = word & 0xffff;
+			err = check_incr(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, count);
+			if (err)
+				goto out;
+			break;
+
+		case 2:
+			reg = word >> 16 & 0xfff;
+			count = word & 0xffff;
+			err = check_nonincr(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, count);
+			if (err)
+				goto out;
+			break;
+
+		case 3:
+			mask = word & 0xffff;
+			reg = word >> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 4:
+		case 5:
+		case 14:
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+	}
+
+	/* No relocs should remain at this point */
+	if (num_relocs)
+		err = -EINVAL;
+
+out:
+	host1x_memmgr_munmap(g->ref, cmdbuf_base);
+
+	return err;
+}
+
+static inline int copy_gathers(struct host1x_job *job,
+		struct platform_device *pdev)
+{
+	size_t size = 0;
+	size_t offset = 0;
+	int i;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		size += g->words * sizeof(u32);
+	}
+
+	/*
+	 * dma_alloc_writecombine() returns NULL, not an ERR_PTR(), on failure.
+	 */
+	job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
+			size, &job->gather_copy, GFP_KERNEL);
+	if (!job->gather_copy_mapped)
+		return -ENOMEM;
+
+	job->gather_copy_size = size;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		void *gather = host1x_memmgr_mmap(g->ref);
+		memcpy(job->gather_copy_mapped + offset,
+				gather + g->offset,
+				g->words * sizeof(u32));
+
+		host1x_memmgr_munmap(g->ref, gather);
+
+		g->mem_base = job->gather_copy;
+		g->offset = offset;
+		g->mem_id = 0;
+		g->ref = 0;
+		offset += g->words * sizeof(u32);
+	}
+
+	return 0;
+}
+
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
+{
+	int err = 0, i = 0, j = 0;
+	struct host1x *host = host1x_get_host(pdev);
+	DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
+
+	bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
+	for (i = 0; i < job->num_waitchk; i++) {
+		u32 syncpt_id = job->waitchk[i].syncpt_id;
+		if (syncpt_id < host1x_syncpt_nb_pts(host))
+			set_bit(syncpt_id, waitchk_mask);
+	}
+
+	/* get current syncpt values for waitchk */
+	for_each_set_bit(i, waitchk_mask, host1x_syncpt_nb_pts(host))
+		host1x_syncpt_load_min(host->syncpt + i);
+
+	/* pin memory */
+	err = pin_job_mem(job);
+	if (err <= 0)
+		goto fail;
+
+	/* patch gathers */
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+
+		/* process each gather mem only once */
+		if (!g->ref) {
+			g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
+			if (IS_ERR(g->ref)) {
+				err = PTR_ERR(g->ref);
+				g->ref = NULL;
+				break;
+			}
+
+			g->mem_base = job->gather_addr_phys[i];
+
+			for (j = 0; j < job->num_gathers; j++) {
+				struct host1x_job_gather *tmp =
+					&job->gathers[j];
+				if (!tmp->ref && tmp->mem_id == g->mem_id) {
+					tmp->ref = g->ref;
+					tmp->mem_base = g->mem_base;
+				}
+			}
+			err = 0;
+			if (host1x_firewall)
+				err = validate(job, pdev, g);
+			if (err)
+				dev_err(&pdev->dev,
+					"Job validate returned %d\n", err);
+			if (!err)
+				err = do_relocs(job, g->mem_id,  g->ref);
+			if (!err)
+				err = do_waitchks(job, host,
+						g->mem_id, g->ref);
+			host1x_memmgr_put(g->ref);
+			if (err)
+				break;
+		}
+	}
+
+	if (!err && host1x_firewall) {
+		err = copy_gathers(job, pdev);
+		if (err) {
+			host1x_job_unpin(job);
+			return err;
+		}
+	}
+
+fail:
+	wmb();
+
+	return err;
+}
+EXPORT_SYMBOL(host1x_job_pin);
+
+void host1x_job_unpin(struct host1x_job *job)
+{
+	int i;
+
+	for (i = 0; i < job->num_unpins; i++) {
+		struct host1x_job_unpin_data *unpin = &job->unpins[i];
+		host1x_memmgr_unpin(unpin->h, unpin->mem);
+		host1x_memmgr_put(unpin->h);
+	}
+	job->num_unpins = 0;
+
+	if (job->gather_copy_size)
+		dma_free_writecombine(&job->ch->dev->dev,
+			job->gather_copy_size,
+			job->gather_copy_mapped, job->gather_copy);
+}
+EXPORT_SYMBOL(host1x_job_unpin);
+
+/*
+ * Debug routine used to dump job entries
+ */
+void host1x_job_dump(struct device *dev, struct host1x_job *job)
+{
+	dev_dbg(dev, "    SYNCPT_ID   %d\n",
+		job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_VAL  %d\n",
+		job->syncpt_end);
+	dev_dbg(dev, "    FIRST_GET   0x%x\n",
+		job->first_get);
+	dev_dbg(dev, "    TIMEOUT     %d\n",
+		job->timeout);
+	dev_dbg(dev, "    NUM_SLOTS   %d\n",
+		job->num_slots);
+	dev_dbg(dev, "    NUM_HANDLES %d\n",
+		job->num_unpins);
+}
diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
new file mode 100644
index 0000000..9cf604f
--- /dev/null
+++ b/drivers/gpu/host1x/memmgr.c
@@ -0,0 +1,174 @@
+/*
+ * Tegra host1x Memory Management Abstraction
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/host1x.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+
+#include "memmgr.h"
+#include "cma.h"
+
+struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align, int flags)
+{
+	return NULL;
+}
+
+struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev)
+{
+	struct mem_handle *h = NULL;
+
+	switch (host1x_memmgr_type(id)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		h = (struct mem_handle *) host1x_cma_get(id, dev);
+		break;
+#endif
+	default:
+		break;
+	}
+
+	return h;
+}
+
+void host1x_memmgr_put(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_put(handle);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+struct sg_table *host1x_memmgr_pin(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_pin(handle);
+		break;
+#endif
+	default:
+		return 0;
+		break;
+	}
+}
+
+void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_unpin(handle, sgt);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+void *host1x_memmgr_mmap(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_mmap(handle);
+		break;
+#endif
+	default:
+		return 0;
+		break;
+	}
+}
+
+void host1x_memmgr_munmap(struct mem_handle *handle, void *addr)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_munmap(handle, addr);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_kmap(handle, pagenum);
+		break;
+#endif
+	default:
+		return 0;
+		break;
+	}
+}
+
+void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_kunmap(handle, pagenum, addr);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+int host1x_memmgr_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		dma_addr_t *phys_addr,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data)
+{
+	int pin_count = 0;
+
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	{
+		int cma_count = host1x_cma_pin_array_ids(dev,
+			ids, MEMMGR_TYPE_MASK,
+			mem_mgr_type_cma,
+			count, &unpin_data[pin_count],
+			phys_addr);
+
+		if (cma_count < 0) {
+			/* clean up previous handles */
+			while (pin_count) {
+				pin_count--;
+				/* unpin, put */
+				host1x_memmgr_unpin(unpin_data[pin_count].h,
+						unpin_data[pin_count].mem);
+				host1x_memmgr_put(unpin_data[pin_count].h);
+			}
+			return cma_count;
+		}
+		pin_count += cma_count;
+	}
+#endif
+	return pin_count;
+}
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
new file mode 100644
index 0000000..52881ea
--- /dev/null
+++ b/drivers/gpu/host1x/memmgr.h
@@ -0,0 +1,53 @@
+/*
+ * Tegra host1x Memory Management Abstraction header
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _NVHOST_MEM_MGR_H_
+#define _NVHOST_MEM_MGR_H_
+
+struct mem_handle;
+struct platform_device;
+
+struct host1x_job_unpin_data {
+	struct mem_handle *h;
+	struct sg_table *mem;
+};
+
+enum mem_mgr_flag {
+	mem_mgr_flag_uncacheable = 0,
+	mem_mgr_flag_write_combine = 1,
+};
+
+struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align,
+		int flags);
+struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev);
+void host1x_memmgr_put(struct mem_handle *handle);
+struct sg_table *host1x_memmgr_pin(struct mem_handle *handle);
+void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *host1x_memmgr_mmap(struct mem_handle *handle);
+void host1x_memmgr_munmap(struct mem_handle *handle, void *addr);
+void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum);
+void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr);
+
+int host1x_memmgr_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		dma_addr_t *phys_addr,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data);
+
+#endif
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 9987548..5de67d2 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -295,6 +295,12 @@ void host1x_syncpt_debug(struct host1x_syncpt *sp)
 	sp->dev->syncpt_op.debug(sp);
 }
 
+/* remove a wait pointed to by patch_addr */
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	return sp->dev->syncpt_op.patch_wait(sp, patch_addr);
+}
+
 struct host1x_syncpt *host1x_syncpt_init(struct host1x *host)
 {
 	struct host1x_syncpt *syncpt, *sp;
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index d4d1f3f..24d4846 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -121,6 +121,8 @@ u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
 int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
 			long timeout, u32 *value);
 
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr);
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp);
 
 static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 00060ee..a58c798 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -25,6 +25,8 @@
 #include <linux/types.h>
 #include <linux/platform_device.h>
 
+struct host1x_job;
+struct host1x_job_unpin_data;
 struct host1x_syncpt;
 
 #define NVSYNCPT_INVALID			(-1)
@@ -35,6 +37,177 @@ void host1x_syncpt_incr_byid(u32 id);
 u32 host1x_syncpt_read_byid(u32 id);
 int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value);
 
+/* channels */
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch);
+void host1x_channel_put(struct host1x_channel *ch);
+int host1x_channel_submit(struct host1x_job *job);
+
+/* Buffer encapsulation */
+enum mem_mgr_type {
+	mem_mgr_type_cma = 2,
+};
+
+#define MEMMGR_TYPE_MASK	0x3
+#define MEMMGR_ID_MASK		(~0x3)
+
+static inline int host1x_memmgr_type(u32 id) { return id & MEMMGR_TYPE_MASK; }
+static inline int host1x_memmgr_id(u32 id) { return id & MEMMGR_ID_MASK; }
+static inline unsigned int host1x_memmgr_host1x_id(u32 type, u32 handle)
+{
+	if (host1x_memmgr_type(type) != type ||
+		host1x_memmgr_id(handle) != handle)
+		return 0;
+
+	return handle | type;
+}
+
+
+enum host1x_class {
+	NV_HOST1X_CLASS_ID		= 0x1,
+	NV_GRAPHICS_2D_CLASS_ID		= 0x51,
+};
+
+struct host1x_job_gather {
+	u32 words;
+	dma_addr_t mem_base;
+	u32 mem_id;
+	int offset;
+	struct mem_handle *ref;
+};
+
+struct host1x_cmdbuf {
+	__u32 mem;
+	__u32 offset;
+	__u32 words;
+	__u32 pad;
+};
+
+struct host1x_reloc {
+	__u32 cmdbuf_mem;
+	__u32 cmdbuf_offset;
+	__u32 target;
+	__u32 target_offset;
+	__u32 shift;
+	__u32 pad;
+};
+
+struct host1x_waitchk {
+	__u32 mem;
+	__u32 offset;
+	__u32 syncpt_id;
+	__u32 thresh;
+};
+
+/*
+ * Each submit is tracked as a host1x_job.
+ */
+struct host1x_job {
+	/* When refcount goes to zero, job can be freed */
+	struct kref ref;
+
+	/* List entry */
+	struct list_head list;
+
+	/* Channel where job is submitted to */
+	struct host1x_channel *ch;
+
+	int clientid;
+
+	/* Gathers and their memory */
+	struct host1x_job_gather *gathers;
+	int num_gathers;
+
+	/* Wait checks to be processed at submit time */
+	struct host1x_waitchk *waitchk;
+	int num_waitchk;
+	u32 waitchk_mask;
+
+	/* Array of handles to be pinned & unpinned */
+	struct host1x_reloc *relocarray;
+	int num_relocs;
+	struct host1x_job_unpin_data *unpins;
+	int num_unpins;
+
+	dma_addr_t *addr_phys;
+	dma_addr_t *gather_addr_phys;
+	dma_addr_t *reloc_addr_phys;
+
+	/* Sync point id, number of increments and end related to the submit */
+	u32 syncpt_id;
+	u32 syncpt_incrs;
+	u32 syncpt_end;
+
+	/* Maximum time to wait for this job */
+	int timeout;
+
+	/* Null kickoff prevents submit from being sent to hardware */
+	bool null_kickoff;
+
+	/* Index and number of slots used in the push buffer */
+	int first_get;
+	int num_slots;
+
+	/* Copy of gathers */
+	size_t gather_copy_size;
+	dma_addr_t gather_copy;
+	u8 *gather_copy_mapped;
+
+	/* Temporary space for the memory ids to be pinned */
+	long unsigned int *pin_ids;
+
+	/* Check if register is marked as an address reg */
+	int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
+
+	/* Request a SETCLASS to this class */
+	u32 class;
+
+	/* Add a channel wait for previous ops to complete */
+	u32 serialize;
+};
+/*
+ * Allocate memory for a job. Just enough memory will be allocated to
+ * accommodate the submit.
+ */
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
+		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks);
+
+/*
+ * Add a gather to a job.
+ */
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset);
+
+/*
+ * Take a reference to a host1x_job.
+ */
+void host1x_job_get(struct host1x_job *job);
+
+/*
+ * Drop a reference to a job; free the job when the count reaches zero.
+ */
+void host1x_job_put(struct host1x_job *job);
+
+/*
+ * Pin memory related to job. This handles relocation of addresses to the
+ * host1x address space. Handles both the gather memory and any other memory
+ * referred to from the gather buffers.
+ *
+ * Also patches out host waits that would wait for an already expired
+ * sync point value.
+ */
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev);
+
+/*
+ * Unpin memory related to job.
+ */
+void host1x_job_unpin(struct host1x_job *job);
+
+/*
+ * Dump contents of job to debug output.
+ */
+void host1x_job_dump(struct device *dev, struct host1x_job *job);
+
 struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
 		int client_managed);
 void host1x_syncpt_free(struct host1x_syncpt *sp);
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
index d98d74c..e087910 100644
--- a/include/trace/events/host1x.h
+++ b/include/trace/events/host1x.h
@@ -37,6 +37,214 @@ DECLARE_EVENT_CLASS(host1x,
 	TP_printk("name=%s", __entry->name)
 );
 
+DEFINE_EVENT(host1x, host1x_channel_open,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+DEFINE_EVENT(host1x, host1x_channel_release,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+TRACE_EVENT(host1x_cdma_begin,
+	TP_PROTO(const char *name),
+
+	TP_ARGS(name),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+	),
+
+	TP_printk("name=%s",
+		__entry->name)
+);
+
+TRACE_EVENT(host1x_cdma_end,
+	TP_PROTO(const char *name),
+
+	TP_ARGS(name),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+	),
+
+	TP_printk("name=%s",
+		__entry->name)
+);
+
+TRACE_EVENT(host1x_cdma_flush,
+	TP_PROTO(const char *name, int timeout),
+
+	TP_ARGS(name, timeout),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(int, timeout)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->timeout = timeout;
+	),
+
+	TP_printk("name=%s, timeout=%d",
+		__entry->name, __entry->timeout)
+);
+
+TRACE_EVENT(host1x_cdma_push,
+	TP_PROTO(const char *name, u32 op1, u32 op2),
+
+	TP_ARGS(name, op1, op2),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, op1)
+		__field(u32, op2)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->op1 = op1;
+		__entry->op2 = op2;
+	),
+
+	TP_printk("name=%s, op1=%08x, op2=%08x",
+		__entry->name, __entry->op1, __entry->op2)
+);
+
+TRACE_EVENT(host1x_cdma_push_gather,
+	TP_PROTO(const char *name, u32 mem_id,
+			u32 words, u32 offset, void *cmdbuf),
+
+	TP_ARGS(name, mem_id, words, offset, cmdbuf),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, mem_id)
+		__field(u32, words)
+		__field(u32, offset)
+		__field(bool, cmdbuf)
+		__dynamic_array(u32, cmdbuf, words)
+	),
+
+	TP_fast_assign(
+		if (cmdbuf) {
+			memcpy(__get_dynamic_array(cmdbuf), cmdbuf+offset,
+					words * sizeof(u32));
+		}
+		__entry->cmdbuf = cmdbuf;
+		__entry->name = name;
+		__entry->mem_id = mem_id;
+		__entry->words = words;
+		__entry->offset = offset;
+	),
+
+	TP_printk("name=%s, mem_id=%08x, words=%u, offset=%d, contents=[%s]",
+	  __entry->name, __entry->mem_id,
+	  __entry->words, __entry->offset,
+	  __print_hex(__get_dynamic_array(cmdbuf),
+		  __entry->cmdbuf ? __entry->words * 4 : 0))
+);
+
+TRACE_EVENT(host1x_channel_submit,
+	TP_PROTO(const char *name, u32 cmdbufs, u32 relocs, u32 waitchks,
+			u32 syncpt_id, u32 syncpt_incrs),
+
+	TP_ARGS(name, cmdbufs, relocs, waitchks, syncpt_id, syncpt_incrs),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, cmdbufs)
+		__field(u32, relocs)
+		__field(u32, waitchks)
+		__field(u32, syncpt_id)
+		__field(u32, syncpt_incrs)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->cmdbufs = cmdbufs;
+		__entry->relocs = relocs;
+		__entry->waitchks = waitchks;
+		__entry->syncpt_id = syncpt_id;
+		__entry->syncpt_incrs = syncpt_incrs;
+	),
+
+	TP_printk("name=%s, cmdbufs=%u, relocs=%u, waitchks=%u, "
+		"syncpt_id=%u, syncpt_incrs=%u",
+	  __entry->name, __entry->cmdbufs, __entry->relocs, __entry->waitchks,
+	  __entry->syncpt_id, __entry->syncpt_incrs)
+);
+
+TRACE_EVENT(host1x_channel_submitted,
+	TP_PROTO(const char *name, u32 syncpt_base, u32 syncpt_max),
+
+	TP_ARGS(name, syncpt_base, syncpt_max),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, syncpt_base)
+		__field(u32, syncpt_max)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->syncpt_base = syncpt_base;
+		__entry->syncpt_max = syncpt_max;
+	),
+
+	TP_printk("name=%s, syncpt_base=%d, syncpt_max=%d",
+		__entry->name, __entry->syncpt_base, __entry->syncpt_max)
+);
+
+TRACE_EVENT(host1x_channel_submit_complete,
+	TP_PROTO(const char *name, int count, u32 thresh),
+
+	TP_ARGS(name, count, thresh),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(int, count)
+		__field(u32, thresh)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->count = count;
+		__entry->thresh = thresh;
+	),
+
+	TP_printk("name=%s, count=%d, thresh=%d",
+		__entry->name, __entry->count, __entry->thresh)
+);
+
+TRACE_EVENT(host1x_wait_cdma,
+	TP_PROTO(const char *name, u32 eventid),
+
+	TP_ARGS(name, eventid),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, eventid)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->eventid = eventid;
+	),
+
+	TP_printk("name=%s, event=%d", __entry->name, __entry->eventid)
+);
+
 TRACE_EVENT(host1x_syncpt_load_min,
 	TP_PROTO(u32 id, u32 val),
 
@@ -55,6 +263,33 @@ TRACE_EVENT(host1x_syncpt_load_min,
 	TP_printk("id=%d, val=%d", __entry->id, __entry->val)
 );
 
+TRACE_EVENT(host1x_syncpt_wait_check,
+	TP_PROTO(u32 mem_id, u32 offset, u32 syncpt_id, u32 thresh, u32 min),
+
+	TP_ARGS(mem_id, offset, syncpt_id, thresh, min),
+
+	TP_STRUCT__entry(
+		__field(u32, mem_id)
+		__field(u32, offset)
+		__field(u32, syncpt_id)
+		__field(u32, thresh)
+		__field(u32, min)
+	),
+
+	TP_fast_assign(
+		__entry->mem_id = mem_id;
+		__entry->offset = offset;
+		__entry->syncpt_id = syncpt_id;
+		__entry->thresh = thresh;
+		__entry->min = min;
+	),
+
+	TP_printk("mem_id=%08x, offset=%05x, id=%d, thresh=%d, current=%d",
+		__entry->mem_id, __entry->offset,
+		__entry->syncpt_id, __entry->thresh,
+		__entry->min)
+);
+
 #endif /*  _TRACE_HOST1X_H */
 
 /* This part must be outside protection */
-- 
1.7.9.5
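
A note for readers of the firewall in job.c: validate() walks each gather
word by word and decodes the header words with bare shifts and masks. The
following is a minimal user-space sketch of that decoding, assuming the bit
layout implied by the shifts in validate(). The opcode labels, struct
decoded_word, decode_word() and payload_words() are hypothetical names used
for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels; the patch itself uses the bare numbers 0..3. */
enum gather_opcode {
	OP_SETCLASS = 0,
	OP_INCR = 1,
	OP_NONINCR = 2,
	OP_MASK = 3,
};

struct decoded_word {
	uint32_t opcode;	/* bits 31:28 */
	uint32_t reg;		/* bits 27:16, first register offset */
	uint32_t class_id;	/* opcode 0 only: bits 15:6 */
	uint32_t mask;		/* opcode 0: bits 5:0, opcode 3: bits 15:0 */
	uint32_t count;		/* opcodes 1 and 2: bits 15:0 */
};

/* Split one header word into the fields that validate() extracts. */
static struct decoded_word decode_word(uint32_t word)
{
	struct decoded_word d = { 0, 0, 0, 0, 0 };

	d.opcode = word >> 28;
	d.reg = (word >> 16) & 0xfff;

	switch (d.opcode) {
	case OP_SETCLASS:
		d.class_id = (word >> 6) & 0x3ff;
		d.mask = word & 0x3f;
		break;
	case OP_INCR:
	case OP_NONINCR:
		d.count = word & 0xffff;
		break;
	case OP_MASK:
		d.mask = word & 0xffff;
		break;
	default:
		/* Other opcodes carry no fields this sketch cares about. */
		break;
	}

	return d;
}

/* Number of data words that follow the header word. */
static uint32_t payload_words(const struct decoded_word *d)
{
	uint32_t bits = 0;
	uint32_t m;

	switch (d->opcode) {
	case OP_SETCLASS:
	case OP_MASK:
		/* One data word per bit set in the mask. */
		for (m = d->mask; m; m >>= 1)
			bits += m & 1;
		return bits;
	case OP_INCR:
	case OP_NONINCR:
		return d->count;
	default:
		return 0;
	}
}
```

For instance, decode_word(0x102b0003) gives opcode 1, register 0x2b and a
count of 3, so three data words follow that header. This matches how
check_incr() consumes one word per register while advancing reg.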



* [PATCHv3 4/7] gpu: host1x: Add debug support
  2012-12-13 14:04 ` Terje Bergstrom
@ 2012-12-13 14:04   ` Terje Bergstrom
From: Terje Bergstrom @ 2012-12-13 14:04 UTC
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

Add support for host1x debugging. This adds debugfs entries and dumps
the channel state to the UART when a job is stuck.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
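
A sketch of the batching that trace_write_gather() performs in cdma.c,
assuming a limit of 128 words per trace event as the comment there suggests
(the actual value of TRACE_MAX_LENGTH is not shown in this part of the
series). struct batch and split_into_batches() are hypothetical user-space
stand-ins, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One trace record: where it starts in the buffer and how long it is. */
struct batch {
	uint32_t byte_offset;
	uint32_t words;
};

/*
 * Split a gather of `words` 32-bit words starting at byte `offset` into
 * batches of at most `max_words` words. Returns the number of batches
 * written to `out`, never more than `out_len`.
 */
static size_t split_into_batches(uint32_t offset, uint32_t words,
				 uint32_t max_words,
				 struct batch *out, size_t out_len)
{
	size_t n = 0;
	uint32_t done = 0;

	assert(max_words > 0);

	while (done < words && n < out_len) {
		uint32_t chunk = words - done;

		if (chunk > max_words)
			chunk = max_words;

		/* The offset is in bytes, the progress counter in words. */
		out[n].byte_offset = offset + done * (uint32_t)sizeof(uint32_t);
		out[n].words = chunk;
		n++;
		done += chunk;
	}

	return n;
}
```

With offset 16, 300 words and a limit of 128, this yields three batches of
128, 128 and 44 words at byte offsets 16, 528 and 1040. Advancing by the
chunk size rather than by a fixed step keeps the progress counter from
running past the word count.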
 drivers/gpu/host1x/Makefile                 |    1 +
 drivers/gpu/host1x/cdma.c                   |   37 +++
 drivers/gpu/host1x/debug.c                  |  207 ++++++++++++++
 drivers/gpu/host1x/debug.h                  |   49 ++++
 drivers/gpu/host1x/dev.c                    |    3 +
 drivers/gpu/host1x/dev.h                    |   17 ++
 drivers/gpu/host1x/hw/cdma_hw.c             |    3 +
 drivers/gpu/host1x/hw/debug_hw.c            |  399 +++++++++++++++++++++++++++
 drivers/gpu/host1x/hw/host1x01.c            |    2 +
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   12 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |   77 ++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |    1 +
 drivers/gpu/host1x/syncpt.c                 |    3 +
 13 files changed, 811 insertions(+)
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index f6c1924..541f334 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -8,6 +8,7 @@ host1x-objs = \
 	intr.o \
 	channel.o \
 	job.o \
+	debug.o \
 	memmgr.o
 
 obj-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 1193fea..b924f23 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 #include <asm/cacheflush.h>
 
@@ -369,12 +370,45 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 	return 0;
 }
 
+static void trace_write_gather(struct host1x_cdma *cdma,
+		struct mem_handle *ref,
+		u32 offset, u32 words)
+{
+	void *mem = NULL;
+
+	if (host1x_debug_trace_cmdbuf) {
+		mem = host1x_memmgr_mmap(ref);
+		if (IS_ERR_OR_NULL(mem))
+			mem = NULL;
+	}
+
+	if (mem) {
+		u32 i;
+		/*
+		 * Write in batches of 128 as there seems to be a limit
+		 * of how much you can output to ftrace at once.
+		 */
+		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
+			trace_host1x_cdma_push_gather(
+				cdma_to_channel(cdma)->dev->name,
+				(u32)ref,
+				min(words - i, TRACE_MAX_LENGTH),
+				offset + i * sizeof(u32),
+				mem);
+		}
+		host1x_memmgr_munmap(ref, mem);
+	}
+}
+
 /*
  * Push two words into a push buffer slot
  * Blocks as necessary if the push buffer is full.
  */
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 {
+	if (host1x_debug_trace_cmdbuf)
+		trace_host1x_cdma_push(cdma_to_channel(cdma)->dev->name,
+				op1, op2);
 	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
 }
 
@@ -390,6 +424,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma,
 	u32 slots_free = cdma->slots_free;
 	struct push_buffer *pb = &cdma->push_buffer;
 
+	if (handle)
+		trace_write_gather(cdma, handle, offset, op1 & 0xffff);
+
 	if (slots_free == 0) {
 		host1x->cdma_op.kick(cdma);
 		slots_free = host1x_cdma_wait_locked(cdma,
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
new file mode 100644
index 0000000..8bce9f1
--- /dev/null
+++ b/drivers/gpu/host1x/debug.c
@@ -0,0 +1,207 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers@android.com>
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "channel.h"
+
+pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+pid_t host1x_debug_force_timeout_pid;
+u32 host1x_debug_force_timeout_val;
+u32 host1x_debug_force_timeout_channel;
+
+void host1x_debug_output(struct output *o, const char *fmt, ...)
+{
+	va_list args;
+	int len;
+
+	va_start(args, fmt);
+	len = vscnprintf(o->buf, sizeof(o->buf), fmt, args);
+	va_end(args);
+	o->fn(o->ctx, o->buf, len);
+}
+
+static int show_channels(struct host1x_channel *ch, void *data)
+{
+	struct host1x *m = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		m->debug_op.show_channel_fifo(m, ch, o, ch->chid);
+		m->debug_op.show_channel_cdma(m, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_syncpts(struct host1x *m, struct output *o)
+{
+	int i;
+	host1x_debug_output(o, "---- syncpts ----\n");
+	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
+		u32 max = host1x_syncpt_read_max(m->syncpt + i);
+		u32 min = host1x_syncpt_load_min(m->syncpt + i);
+		if (!min && !max)
+			continue;
+		host1x_debug_output(o, "id %d (%s) min %d max %d\n",
+			i, m->syncpt[i].name,
+			min, max);
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
+		u32 base_val;
+		base_val = host1x_syncpt_read_wait_base(m->syncpt + i);
+		if (base_val)
+			host1x_debug_output(o, "waitbase id %d val %d\n",
+					i, base_val);
+	}
+
+	host1x_debug_output(o, "\n");
+}
+
+static void show_all(struct host1x *m, struct output *o)
+{
+	m->debug_op.show_mlocks(m, o);
+	show_syncpts(m, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(m, o, show_channels);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static int show_channels_no_fifo(struct host1x_channel *ch, void *data)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		host1x->debug_op.show_channel_cdma(host1x, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_all_no_fifo(struct host1x *host1x, struct output *o)
+{
+	host1x->debug_op.show_mlocks(host1x, o);
+	show_syncpts(host1x, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(host1x, o, show_channels_no_fifo);
+}
+
+static int host1x_debug_show_all(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_show(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all_no_fifo(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_open_all(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show_all, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_all_fops = {
+	.open		= host1x_debug_open_all,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int host1x_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_fops = {
+	.open		= host1x_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+void host1x_debug_init(struct host1x *host1x)
+{
+	struct dentry *de = debugfs_create_dir("tegra_host", NULL);
+
+	if (!de)
+		return;
+
+	/* Store the created entry */
+	host1x->debugfs = de;
+
+	debugfs_create_file("status", S_IRUGO, de,
+			host1x, &host1x_debug_fops);
+	debugfs_create_file("status_all", S_IRUGO, de,
+			host1x, &host1x_debug_all_fops);
+
+	debugfs_create_u32("null_kickoff_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_null_kickoff_pid);
+	debugfs_create_u32("trace_cmdbuf", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_trace_cmdbuf);
+
+	if (host1x->debug_op.debug_init)
+		host1x->debug_op.debug_init(de);
+
+	debugfs_create_u32("force_timeout_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_pid);
+	debugfs_create_u32("force_timeout_val", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_val);
+	debugfs_create_u32("force_timeout_channel", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_channel);
+}
+#else
+void host1x_debug_init(struct host1x *host1x)
+{
+}
+#endif
+
+void host1x_debug_dump(struct host1x *host1x)
+{
+	struct output o = {
+		.fn = write_to_printk
+	};
+	show_all(host1x, &o);
+}
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
new file mode 100644
index 0000000..c36b0d5
--- /dev/null
+++ b/drivers/gpu/host1x/debug.h
@@ -0,0 +1,49 @@
+/*
+ * Tegra host1x Debug
+ *
+ * Copyright (c) 2011-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __NVHOST_DEBUG_H
+#define __NVHOST_DEBUG_H
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+struct host1x;
+
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
+
+static inline void write_to_seqfile(void *ctx, const char *str, size_t len)
+{
+	seq_write((struct seq_file *)ctx, str, len);
+}
+
+static inline void write_to_printk(void *ctx, const char *str, size_t len)
+{
+	pr_info("%s", str);
+}
+
+void host1x_debug_output(struct output *o, const char *fmt, ...);
+
+extern unsigned int host1x_debug_trace_cmdbuf;
+
+void host1x_debug_init(struct host1x *master);
+void host1x_debug_dump(struct host1x *master);
+
+#endif /*__NVHOST_DEBUG_H */
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 9209333..19e8b59 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -27,6 +27,7 @@
 #include "dev.h"
 #include "intr.h"
 #include "channel.h"
+#include "debug.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -199,6 +200,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
+	host1x_debug_init(host);
+
 	host1x = host;
 
 	dev_info(&dev->dev, "initialized\n");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 093ac85..aa5182e 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -33,6 +33,7 @@ struct push_buffer;
 struct dentry;
 struct mem_handle;
 struct platform_device;
+struct output;
 
 struct host1x_channel_ops {
 	const char *soc_name;
@@ -72,6 +73,21 @@ struct host1x_pushbuffer_ops {
 	u32 (*putptr)(struct push_buffer *);
 };
 
+struct host1x_debug_ops {
+	void (*debug_init)(struct dentry *de);
+	void (*show_channel_cdma)(struct host1x *,
+				  struct host1x_channel *,
+				  struct output *,
+				  int chid);
+	void (*show_channel_fifo)(struct host1x *,
+				  struct host1x_channel *,
+				  struct output *,
+				  int chid);
+	void (*show_mlocks)(struct host1x *m,
+			    struct output *o);
+
+};
+
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -119,6 +135,7 @@ struct host1x {
 	struct host1x_channel_ops channel_op;
 	struct host1x_cdma_ops cdma_op;
 	struct host1x_pushbuffer_ops cdma_pb_op;
+	struct host1x_debug_ops debug_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 55adaa6..f09a215 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -22,6 +22,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 
 #include "cdma_hw.h"
@@ -409,6 +410,8 @@ static void cdma_timeout_handler(struct work_struct *work)
 	host1x = cdma_to_host1x(cdma);
 	ch = cdma_to_channel(cdma);
 
+	host1x_debug_dump(cdma_to_host1x(cdma));
+
 	mutex_lock(&cdma->lock);
 
 	if (!cdma->timeout.clientid) {
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
new file mode 100644
index 0000000..f1a63b5
--- /dev/null
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -0,0 +1,399 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers@android.com>
+ *
+ * Copyright (C) 2011 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "cdma.h"
+#include "channel.h"
+#include "memmgr.h"
+
+#define NVHOST_DEBUG_MAX_PAGE_OFFSET 102400
+
+enum {
+	NVHOST_DBG_STATE_CMD = 0,
+	NVHOST_DBG_STATE_DATA = 1,
+	NVHOST_DBG_STATE_GATHER = 2
+};
+
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
+		mask = val & 0x3f;
+		if (mask) {
+			host1x_debug_output(o,
+				"SETCL(class=%03x, offset=%03x, mask=%02x, [",
+				val >> 6 & 0x3ff, val >> 16 & 0xfff, mask);
+			*count = hweight8(mask);
+			return NVHOST_DBG_STATE_DATA;
+		} else {
+			host1x_debug_output(o, "SETCL(class=%03x)\n",
+				val >> 6 & 0x3ff);
+			return NVHOST_DBG_STATE_CMD;
+		}
+
+	case 0x1:
+		host1x_debug_output(o, "INCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x2:
+		host1x_debug_output(o, "NONINCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x3:
+		mask = val & 0xffff;
+		host1x_debug_output(o, "MASK(offset=%03x, mask=%03x, [",
+			   val >> 16 & 0xfff, mask);
+		*count = hweight16(mask);
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x4:
+		host1x_debug_output(o, "IMM(offset=%03x, data=%03x)\n",
+			   val >> 16 & 0xfff, val & 0xffff);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x5:
+		host1x_debug_output(o, "RESTART(offset=%08x)\n", val << 4);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x6:
+		host1x_debug_output(o,
+			"GATHER(offset=%03x, insert=%d, type=%d, count=%04x, addr=[",
+			val >> 16 & 0xfff, val >> 15 & 0x1, val >> 14 & 0x1,
+			val & 0x3fff);
+		*count = val & 0x3fff; /* TODO: insert */
+		return NVHOST_DBG_STATE_GATHER;
+
+	case 0xe:
+		subop = val >> 24 & 0xf;
+		if (subop == 0)
+			host1x_debug_output(o, "ACQUIRE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else if (subop == 1)
+			host1x_debug_output(o, "RELEASE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else
+			host1x_debug_output(o, "EXTEND_UNKNOWN(%08x)\n", val);
+		return NVHOST_DBG_STATE_CMD;
+
+	default:
+		return NVHOST_DBG_STATE_CMD;
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr, u32 words, struct host1x_cdma *cdma);
+
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
+
+	switch (*state) {
+	case NVHOST_DBG_STATE_CMD:
+		if (addr)
+			host1x_debug_output(o, "%08x: %08x:", addr, val);
+		else
+			host1x_debug_output(o, "%08x:", val);
+
+		*state = show_channel_command(o, addr, val, count);
+		dont_print = 0;
+		start_count = *count;
+		if (*state == NVHOST_DBG_STATE_DATA && *count == 0) {
+			*state = NVHOST_DBG_STATE_CMD;
+			host1x_debug_output(o, "])\n");
+		}
+		break;
+
+	case NVHOST_DBG_STATE_DATA:
+		(*count)--;
+		if (start_count - *count < 64)
+			host1x_debug_output(o, "%08x%s",
+				val, *count > 0 ? ", " : "])\n");
+		else if (!dont_print && (*count > 0)) {
+			host1x_debug_output(o, "[truncated; %d more words]\n",
+				*count);
+			dont_print = 1;
+		}
+		if (*count == 0)
+			*state = NVHOST_DBG_STATE_CMD;
+		break;
+
+	case NVHOST_DBG_STATE_GATHER:
+		*state = NVHOST_DBG_STATE_CMD;
+		host1x_debug_output(o, "%08x]):\n", val);
+		if (cdma) {
+			show_channel_gather(o, addr, val,
+					*count, cdma);
+		}
+		break;
+	}
+}
+
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we're given a different hardware address for the same
+	 * page. In these cases the offset will be an invalid number and
+	 * we just have to bail out.
+	 */
+	if (offset > NVHOST_DEBUG_MAX_PAGE_OFFSET) {
+		host1x_debug_output(o, "[address mismatch]\n");
+	} else {
+		/* A GATHER buffer always starts with commands */
+		state = NVHOST_DBG_STATE_CMD;
+		for (i = 0; i < words; i++)
+			show_channel_word(o, &state, &count,
+					phys_addr + i * 4,
+					*(map_addr + offset/4 + i),
+					cdma);
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 cur = addr - pb->phys;
+	struct mem_handle *mem = pb->handle[cur/8];
+	u32 *map_addr, offset;
+	struct sg_table *sgt;
+
+	if (!mem) {
+		host1x_debug_output(o, "[already deallocated]\n");
+		return;
+	}
+
+	map_addr = host1x_memmgr_mmap(mem);
+	if (!map_addr) {
+		host1x_debug_output(o, "[could not mmap]\n");
+		return;
+	}
+
+	/* Get base address from mem */
+	sgt = host1x_memmgr_pin(mem);
+	if (IS_ERR(sgt)) {
+		host1x_debug_output(o, "[couldn't pin]\n");
+		host1x_memmgr_munmap(mem, map_addr);
+		return;
+	}
+
+	offset = phys_addr - sg_dma_address(sgt->sgl);
+	do_show_channel_gather(o, phys_addr, words, cdma,
+			sg_dma_address(sgt->sgl), map_addr);
+	host1x_memmgr_unpin(mem, sgt);
+	host1x_memmgr_munmap(mem, map_addr);
+}
+
+static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
+{
+	struct host1x_job *job;
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		int i;
+		host1x_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+				" first_get=%08x, timeout=%d,"
+				" num_slots=%d, num_handles=%d\n",
+				job,
+				job->syncpt_id,
+				job->syncpt_end,
+				job->first_get,
+				job->timeout,
+				job->num_slots,
+				job->num_unpins);
+
+		for (i = 0; i < job->num_gathers; i++) {
+			struct host1x_job_gather *g = &job->gathers[i];
+			u32 *mapped = host1x_memmgr_mmap(g->ref);
+			if (!mapped) {
+				host1x_debug_output(o, "[could not mmap]\n");
+				continue;
+			}
+
+			host1x_debug_output(o,
+				"    GATHER at %08x+%04x, %d words\n",
+				g->mem_base, g->offset, g->words);
+
+			do_show_channel_gather(o, g->mem_base + g->offset,
+					g->words, cdma, g->mem_base, mapped);
+			host1x_memmgr_munmap(g->ref, mapped);
+		}
+	}
+}
+
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
+	struct host1x_channel *channel = ch;
+	struct host1x_cdma *cdma = &channel->cdma;
+	u32 dmaput, dmaget, dmactrl;
+	u32 cbstat, cbread;
+	u32 val, base, baseval;
+
+	dmaput = host1x_ch_readl(channel, host1x_channel_dmaput_r());
+	dmaget = host1x_ch_readl(channel, host1x_channel_dmaget_r());
+	dmactrl = host1x_ch_readl(channel, host1x_channel_dmactrl_r());
+	cbread = host1x_sync_readl(m, host1x_sync_cbread0_r() + 4 * chid);
+	cbstat = host1x_sync_readl(m, host1x_sync_cbstat_0_r() + 4 * chid);
+
+	host1x_debug_output(o, "%d-%s: ", chid,
+			    channel->dev->name);
+
+	if (host1x_channel_dmactrl_dmastop_v(dmactrl)
+		|| !channel->cdma.push_buffer.mapped) {
+		host1x_debug_output(o, "inactive\n\n");
+		return;
+	}
+
+	switch (cbstat) {
+	case 0x00010008:
+		host1x_debug_output(o, "waiting on syncpt %d val %d\n",
+			cbread >> 24, cbread & 0xffffff);
+		break;
+
+	case 0x00010009:
+		base = (cbread >> 16) & 0xff;
+		baseval = host1x_sync_readl(m,
+				host1x_sync_syncpt_base_0_r() + 4 * base);
+		val = cbread & 0xffff;
+		host1x_debug_output(o, "waiting on syncpt %d val %d "
+			  "(base %d = %d; offset = %d)\n",
+			cbread >> 24, baseval + val,
+			base, baseval, val);
+		break;
+
+	default:
+		host1x_debug_output(o,
+				"active class %02x, offset %04x, val %08x\n",
+				host1x_sync_cbstat_0_cbclass0_v(cbstat),
+				host1x_sync_cbstat_0_cboffset0_v(cbstat),
+				cbread);
+		break;
+	}
+
+	host1x_debug_output(o, "DMAPUT %08x, DMAGET %08x, DMACTL %08x\n",
+		dmaput, dmaget, dmactrl);
+	host1x_debug_output(o, "CBREAD %08x, CBSTAT %08x\n", cbread, cbstat);
+
+	show_channel_gathers(o, cdma);
+	host1x_debug_output(o, "\n");
+}
+
+static void host1x_debug_show_channel_fifo(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
+	u32 val, rd_ptr, wr_ptr, start, end;
+	struct host1x_channel *channel = ch;
+	int state, count;
+
+	host1x_debug_output(o, "%d: fifo:\n", chid);
+
+	val = host1x_ch_readl(channel, host1x_channel_fifostat_r());
+	host1x_debug_output(o, "FIFOSTAT %08x\n", val);
+	if (host1x_channel_fifostat_cfempty_v(val)) {
+		host1x_debug_output(o, "[empty]\n");
+		return;
+	}
+
+	host1x_sync_writel(m, 0x0, host1x_sync_cfpeek_ctrl_r());
+	host1x_sync_writel(m, host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
+			| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid),
+		host1x_sync_cfpeek_ctrl_r());
+
+	val = host1x_sync_readl(m, host1x_sync_cfpeek_ptrs_r());
+	rd_ptr = host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(val);
+	wr_ptr = host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(val);
+
+	val = host1x_sync_readl(m, host1x_sync_cf0_setup_r() + 4 * chid);
+	start = host1x_sync_cf0_setup_cf0_base_v(val);
+	end = host1x_sync_cf0_setup_cf0_limit_v(val);
+
+	state = NVHOST_DBG_STATE_CMD;
+
+	do {
+		host1x_sync_writel(m, 0x0, host1x_sync_cfpeek_ctrl_r());
+		host1x_sync_writel(m, host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
+				| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
+				| host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
+			host1x_sync_cfpeek_ctrl_r());
+		val = host1x_sync_readl(m, host1x_sync_cfpeek_read_r());
+
+		show_channel_word(o, &state, &count, 0, val, NULL);
+
+		if (rd_ptr == end)
+			rd_ptr = start;
+		else
+			rd_ptr++;
+	} while (rd_ptr != wr_ptr);
+
+	if (state == NVHOST_DBG_STATE_DATA)
+		host1x_debug_output(o, ", ...])\n");
+	host1x_debug_output(o, "\n");
+
+	host1x_sync_writel(m, 0x0, host1x_sync_cfpeek_ctrl_r());
+}
+
+static void host1x_debug_show_mlocks(struct host1x *m, struct output *o)
+{
+	int i;
+
+	host1x_debug_output(o, "---- mlocks ----\n");
+	for (i = 0; i < host1x_syncpt_nb_mlocks(m); i++) {
+		u32 owner = host1x_sync_readl(m,
+				host1x_sync_mlock_owner_0_r() + 4 * i);
+		if (host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(owner))
+			host1x_debug_output(o, "%d: locked by channel %d\n",
+				i,
+				/* chid is bits 11:8; the _f helper packs, not extracts */
+				(owner >> 8) & 0xf);
+		else if (host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(owner))
+			host1x_debug_output(o, "%d: locked by cpu\n", i);
+		else
+			host1x_debug_output(o, "%d: unlocked\n", i);
+	}
+	host1x_debug_output(o, "\n");
+}
+
+static const struct host1x_debug_ops host1x_debug_ops = {
+	.show_channel_cdma = host1x_debug_show_channel_cdma,
+	.show_channel_fifo = host1x_debug_show_channel_fifo,
+	.show_mlocks = host1x_debug_show_mlocks,
+};
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index 3f41619..7a26e96 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -29,6 +29,7 @@
 
 #include "hw/channel_hw.c"
 #include "hw/cdma_hw.c"
+#include "hw/debug_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"
 
@@ -37,6 +38,7 @@ int host1x01_init(struct host1x *host)
 	host->channel_op = host1x_channel_ops;
 	host->cdma_op = host1x_cdma_ops;
 	host->cdma_pb_op = host1x_pushbuffer_ops;
+	host->debug_op = host1x_debug_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
 
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
index 3a23d57..29f0ddc0 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -51,6 +51,14 @@
 #ifndef __hw_host1x_channel_host1x_h__
 #define __hw_host1x_channel_host1x_h__
 
+static inline u32 host1x_channel_fifostat_r(void)
+{
+	return 0x0;
+}
+static inline u32 host1x_channel_fifostat_cfempty_v(u32 r)
+{
+	return (r >> 10) & 0x1;
+}
 static inline u32 host1x_channel_dmastart_r(void)
 {
 	return 0x14;
@@ -75,6 +83,10 @@ static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
 {
 	return (v & 0x1) << 0;
 }
+static inline u32 host1x_channel_dmactrl_dmastop_v(u32 r)
+{
+	return (r >> 0) & 0x1;
+}
 static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
 {
 	return (v & 0x1) << 1;
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index c9342da..c4f6533 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -63,6 +63,18 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
 {
 	return 0x68;
 }
+static inline u32 host1x_sync_cf0_setup_r(void)
+{
+	return 0x80;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_base_v(u32 r)
+{
+	return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_limit_v(u32 r)
+{
+	return (r >> 16) & 0x1ff;
+}
 static inline u32 host1x_sync_cmdproc_stop_r(void)
 {
 	return 0xac;
@@ -83,6 +95,22 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 {
 	return 0x1bc;
 }
+static inline u32 host1x_sync_mlock_owner_0_r(void)
+{
+	return 0x340;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(u32 v)
+{
+	return (v & 0xf) << 8;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(u32 r)
+{
+	return (r >> 1) & 0x1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(u32 r)
+{
+	return (r >> 0) & 0x1;
+}
 static inline u32 host1x_sync_syncpt_0_r(void)
 {
 	return 0x400;
@@ -99,4 +127,53 @@ static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
 {
 	return 0x700;
 }
+static inline u32 host1x_sync_cbread0_r(void)
+{
+	return 0x720;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_r(void)
+{
+	return 0x74c;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
+{
+	return (v & 0x1ff) << 0;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_f(u32 v)
+{
+	return (v & 0x7) << 16;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
+{
+	return (v & 0x1) << 31;
+}
+static inline u32 host1x_sync_cfpeek_read_r(void)
+{
+	return 0x750;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_r(void)
+{
+	return 0x754;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(u32 r)
+{
+	return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(u32 r)
+{
+	return (r >> 16) & 0x1ff;
+}
+static inline u32 host1x_sync_cbstat_0_r(void)
+{
+	return 0x758;
+}
+static inline u32 host1x_sync_cbstat_0_cboffset0_v(u32 r)
+{
+	return (r >> 0) & 0xffff;
+}
+static inline u32 host1x_sync_cbstat_0_cbclass0_v(u32 r)
+{
+	return (r >> 16) & 0x3ff;
+}
+
 #endif /* __hw_host1x_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index a070473..09a21d2 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -90,6 +90,7 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 		dev_err(&dev->dev->dev,
 			"Trying to increment syncpoint id %d beyond max\n",
 			sp->id);
+		host1x_debug_dump(sp->dev);
 		return;
 	}
 	host1x_sync_writel(dev, BIT_MASK(sp->id),
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5de67d2..e819092 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -23,6 +23,7 @@
 #include "syncpt.h"
 #include "dev.h"
 #include "intr.h"
+#include "debug.h"
 #include <trace/events/host1x.h>
 
 #define MAX_SYNCPT_LENGTH	5
@@ -219,6 +220,8 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp,
 				 current->comm, sp->id, sp->name,
 				 thresh, timeout);
 			sp->dev->syncpt_op.debug(sp);
+			if (check_count == MAX_STUCK_CHECK_COUNT)
+				host1x_debug_dump(sp->dev);
 			check_count++;
 		}
 	}
-- 
1.7.9.5


* [PATCHv3 4/7] gpu: host1x: Add debug support
@ 2012-12-13 14:04   ` Terje Bergstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel
  Cc: amerilainen, linux-kernel

Add support for host1x debugging. This adds debugfs entries and dumps
the channel state to the kernel log (UART) when a job is stuck.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
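Illustrative note, not part of the patch: the FIFO dump in
host1x_debug_show_channel_fifo() walks a ring from the read pointer to
the write pointer, wrapping from the channel's limit back to its base.
A userspace sketch of that walk, with a plain array standing in for the
CFPEEK register reads, might look like this. walk_fifo and its
parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * fifo[] models the command FIFO RAM, indexed by pointer value.
 * start/end are the channel's base and limit (inclusive).
 * Returns the number of words copied to out[].
 */
static size_t walk_fifo(const uint32_t *fifo, unsigned int start,
			unsigned int end, unsigned int rd_ptr,
			unsigned int wr_ptr, uint32_t *out, size_t out_len)
{
	size_t n = 0;

	do {
		if (n == out_len)
			break;
		out[n++] = fifo[rd_ptr];

		/* Wrap from the limit back to the base. */
		if (rd_ptr == end)
			rd_ptr = start;
		else
			rd_ptr++;
	} while (rd_ptr != wr_ptr);

	return n;
}
```

As in the patch, the do/while form reads at least one word, so the
caller is expected to have ruled out an empty FIFO first (the patch
checks FIFOSTAT before entering the loop).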
 drivers/gpu/host1x/Makefile                 |    1 +
 drivers/gpu/host1x/cdma.c                   |   37 +++
 drivers/gpu/host1x/debug.c                  |  207 ++++++++++++++
 drivers/gpu/host1x/debug.h                  |   49 ++++
 drivers/gpu/host1x/dev.c                    |    3 +
 drivers/gpu/host1x/dev.h                    |   17 ++
 drivers/gpu/host1x/hw/cdma_hw.c             |    3 +
 drivers/gpu/host1x/hw/debug_hw.c            |  399 +++++++++++++++++++++++++++
 drivers/gpu/host1x/hw/host1x01.c            |    2 +
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   12 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |   77 ++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |    1 +
 drivers/gpu/host1x/syncpt.c                 |    3 +
 13 files changed, 811 insertions(+)
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index f6c1924..541f334 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -8,6 +8,7 @@ host1x-objs = \
 	intr.o \
 	channel.o \
 	job.o \
+	debug.o \
 	memmgr.o
 
 obj-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 1193fea..b924f23 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 #include <asm/cacheflush.h>
 
@@ -369,12 +370,45 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 	return 0;
 }
 
+static void trace_write_gather(struct host1x_cdma *cdma,
+		struct mem_handle *ref,
+		u32 offset, u32 words)
+{
+	void *mem = NULL;
+
+	if (host1x_debug_trace_cmdbuf) {
+		mem = host1x_memmgr_mmap(ref);
+		if (IS_ERR_OR_NULL(mem))
+			mem = NULL;
+	}
+
+	if (mem) {
+		u32 i;
+		/*
+		 * Write in batches of 128 as there seems to be a limit
+		 * of how much you can output to ftrace at once.
+		 */
+		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
+			trace_host1x_cdma_push_gather(
+				cdma_to_channel(cdma)->dev->name,
+				(u32)ref,
+				min(words - i, TRACE_MAX_LENGTH),
+				offset + i * sizeof(u32),
+				mem);
+		}
+		host1x_memmgr_munmap(ref, mem);
+	}
+}
+
 /*
  * Push two words into a push buffer slot
  * Blocks as necessary if the push buffer is full.
  */
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 {
+	if (host1x_debug_trace_cmdbuf)
+		trace_host1x_cdma_push(cdma_to_channel(cdma)->dev->name,
+				op1, op2);
 	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
 }
 
@@ -390,6 +424,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma,
 	u32 slots_free = cdma->slots_free;
 	struct push_buffer *pb = &cdma->push_buffer;
 
+	if (handle)
+		trace_write_gather(cdma, handle, offset, op1 & 0xffff);
+
 	if (slots_free == 0) {
 		host1x->cdma_op.kick(cdma);
 		slots_free = host1x_cdma_wait_locked(cdma,
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
new file mode 100644
index 0000000..8bce9f1
--- /dev/null
+++ b/drivers/gpu/host1x/debug.c
@@ -0,0 +1,207 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers@android.com>
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "channel.h"
+
+pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+pid_t host1x_debug_force_timeout_pid;
+u32 host1x_debug_force_timeout_val;
+u32 host1x_debug_force_timeout_channel;
+
+void host1x_debug_output(struct output *o, const char *fmt, ...)
+{
+	va_list args;
+	int len;
+
+	va_start(args, fmt);
+	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	va_end(args);
+	o->fn(o->ctx, o->buf, len);
+}
+
+static int show_channels(struct host1x_channel *ch, void *data)
+{
+	struct host1x *m = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		m->debug_op.show_channel_fifo(m, ch, o, ch->chid);
+		m->debug_op.show_channel_cdma(m, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_syncpts(struct host1x *m, struct output *o)
+{
+	int i;
+	host1x_debug_output(o, "---- syncpts ----\n");
+	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
+		u32 max = host1x_syncpt_read_max(m->syncpt + i);
+		u32 min = host1x_syncpt_load_min(m->syncpt + i);
+		if (!min && !max)
+			continue;
+		host1x_debug_output(o, "id %d (%s) min %d max %d\n",
+			i, m->syncpt[i].name,
+			min, max);
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
+		u32 base_val;
+		base_val = host1x_syncpt_read_wait_base(m->syncpt + i);
+		if (base_val)
+			host1x_debug_output(o, "waitbase id %d val %d\n",
+					i, base_val);
+	}
+
+	host1x_debug_output(o, "\n");
+}
+
+static void show_all(struct host1x *m, struct output *o)
+{
+	m->debug_op.show_mlocks(m, o);
+	show_syncpts(m, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(m, o, show_channels);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static int show_channels_no_fifo(struct host1x_channel *ch, void *data)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		host1x->debug_op.show_channel_cdma(host1x, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_all_no_fifo(struct host1x *host1x, struct output *o)
+{
+	host1x->debug_op.show_mlocks(host1x, o);
+	show_syncpts(host1x, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(host1x, o, show_channels_no_fifo);
+}
+
+static int host1x_debug_show_all(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_show(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all_no_fifo(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_open_all(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show_all, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_all_fops = {
+	.open		= host1x_debug_open_all,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int host1x_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_fops = {
+	.open		= host1x_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+void host1x_debug_init(struct host1x *host1x)
+{
+	struct dentry *de = debugfs_create_dir("tegra_host", NULL);
+
+	if (!de)
+		return;
+
+	/* Store the created entry */
+	host1x->debugfs = de;
+
+	debugfs_create_file("status", S_IRUGO, de,
+			host1x, &host1x_debug_fops);
+	debugfs_create_file("status_all", S_IRUGO, de,
+			host1x, &host1x_debug_all_fops);
+
+	debugfs_create_u32("null_kickoff_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_null_kickoff_pid);
+	debugfs_create_u32("trace_cmdbuf", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_trace_cmdbuf);
+
+	if (host1x->debug_op.debug_init)
+		host1x->debug_op.debug_init(de);
+
+	debugfs_create_u32("force_timeout_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_pid);
+	debugfs_create_u32("force_timeout_val", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_val);
+	debugfs_create_u32("force_timeout_channel", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_channel);
+}
+#else
+void host1x_debug_init(struct host1x *host1x)
+{
+}
+#endif
+
+void host1x_debug_dump(struct host1x *host1x)
+{
+	struct output o = {
+		.fn = write_to_printk
+	};
+	show_all(host1x, &o);
+}
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
new file mode 100644
index 0000000..c36b0d5
--- /dev/null
+++ b/drivers/gpu/host1x/debug.h
@@ -0,0 +1,49 @@
+/*
+ * Tegra host1x Debug
+ *
+ * Copyright (c) 2011-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __NVHOST_DEBUG_H
+#define __NVHOST_DEBUG_H
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+struct host1x;
+
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
+
+static inline void write_to_seqfile(void *ctx, const char *str, size_t len)
+{
+	seq_write((struct seq_file *)ctx, str, len);
+}
+
+static inline void write_to_printk(void *ctx, const char *str, size_t len)
+{
+	pr_info("%s", str);
+}
+
+void host1x_debug_output(struct output *o, const char *fmt, ...);
+
+extern unsigned int host1x_debug_trace_cmdbuf;
+
+void host1x_debug_init(struct host1x *master);
+void host1x_debug_dump(struct host1x *master);
+
+#endif /* __NVHOST_DEBUG_H */
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 9209333..19e8b59 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -27,6 +27,7 @@
 #include "dev.h"
 #include "intr.h"
 #include "channel.h"
+#include "debug.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -199,6 +200,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
+	host1x_debug_init(host);
+
 	host1x = host;
 
 	dev_info(&dev->dev, "initialized\n");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 093ac85..aa5182e 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -33,6 +33,7 @@ struct push_buffer;
 struct dentry;
 struct mem_handle;
 struct platform_device;
+struct output;
 
 struct host1x_channel_ops {
 	const char *soc_name;
@@ -72,6 +73,21 @@ struct host1x_pushbuffer_ops {
 	u32 (*putptr)(struct push_buffer *);
 };
 
+struct host1x_debug_ops {
+	void (*debug_init)(struct dentry *de);
+	void (*show_channel_cdma)(struct host1x *,
+				  struct host1x_channel *,
+				  struct output *,
+				  int chid);
+	void (*show_channel_fifo)(struct host1x *,
+				  struct host1x_channel *,
+				  struct output *,
+				  int chid);
+	void (*show_mlocks)(struct host1x *m,
+			    struct output *o);
+
+};
+
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -119,6 +135,7 @@ struct host1x {
 	struct host1x_channel_ops channel_op;
 	struct host1x_cdma_ops cdma_op;
 	struct host1x_pushbuffer_ops cdma_pb_op;
+	struct host1x_debug_ops debug_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 55adaa6..f09a215 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -22,6 +22,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 
 #include "cdma_hw.h"
@@ -409,6 +410,8 @@ static void cdma_timeout_handler(struct work_struct *work)
 	host1x = cdma_to_host1x(cdma);
 	ch = cdma_to_channel(cdma);
 
+	host1x_debug_dump(cdma_to_host1x(cdma));
+
 	mutex_lock(&cdma->lock);
 
 	if (!cdma->timeout.clientid) {
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
new file mode 100644
index 0000000..f1a63b5
--- /dev/null
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -0,0 +1,399 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers@android.com>
+ *
+ * Copyright (C) 2011 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "cdma.h"
+#include "channel.h"
+#include "memmgr.h"
+
+#define NVHOST_DEBUG_MAX_PAGE_OFFSET 102400
+
+enum {
+	NVHOST_DBG_STATE_CMD = 0,
+	NVHOST_DBG_STATE_DATA = 1,
+	NVHOST_DBG_STATE_GATHER = 2
+};
+
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
+		mask = val & 0x3f;
+		if (mask) {
+			host1x_debug_output(o,
+				"SETCL(class=%03x, offset=%03x, mask=%02x, [",
+				val >> 6 & 0x3ff, val >> 16 & 0xfff, mask);
+			*count = hweight8(mask);
+			return NVHOST_DBG_STATE_DATA;
+		} else {
+			host1x_debug_output(o, "SETCL(class=%03x)\n",
+				val >> 6 & 0x3ff);
+			return NVHOST_DBG_STATE_CMD;
+		}
+
+	case 0x1:
+		host1x_debug_output(o, "INCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x2:
+		host1x_debug_output(o, "NONINCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x3:
+		mask = val & 0xffff;
+		host1x_debug_output(o, "MASK(offset=%03x, mask=%03x, [",
+			   val >> 16 & 0xfff, mask);
+		*count = hweight16(mask);
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x4:
+		host1x_debug_output(o, "IMM(offset=%03x, data=%03x)\n",
+			   val >> 16 & 0xfff, val & 0xffff);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x5:
+		host1x_debug_output(o, "RESTART(offset=%08x)\n", val << 4);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x6:
+		host1x_debug_output(o,
+			"GATHER(offset=%03x, insert=%d, type=%d, count=%04x, addr=[",
+			val >> 16 & 0xfff, val >> 15 & 0x1, val >> 14 & 0x1,
+			val & 0x3fff);
+		*count = val & 0x3fff; /* TODO: insert */
+		return NVHOST_DBG_STATE_GATHER;
+
+	case 0xe:
+		subop = val >> 24 & 0xf;
+		if (subop == 0)
+			host1x_debug_output(o, "ACQUIRE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else if (subop == 1)
+			host1x_debug_output(o, "RELEASE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else
+			host1x_debug_output(o, "EXTEND_UNKNOWN(%08x)\n", val);
+		return NVHOST_DBG_STATE_CMD;
+
+	default:
+		return NVHOST_DBG_STATE_CMD;
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr, u32 words, struct host1x_cdma *cdma);
+
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
+
+	switch (*state) {
+	case NVHOST_DBG_STATE_CMD:
+		if (addr)
+			host1x_debug_output(o, "%08x: %08x:", addr, val);
+		else
+			host1x_debug_output(o, "%08x:", val);
+
+		*state = show_channel_command(o, addr, val, count);
+		dont_print = 0;
+		start_count = *count;
+		if (*state == NVHOST_DBG_STATE_DATA && *count == 0) {
+			*state = NVHOST_DBG_STATE_CMD;
+			host1x_debug_output(o, "])\n");
+		}
+		break;
+
+	case NVHOST_DBG_STATE_DATA:
+		(*count)--;
+		if (start_count - *count < 64)
+			host1x_debug_output(o, "%08x%s",
+				val, *count > 0 ? ", " : "])\n");
+		else if (!dont_print && (*count > 0)) {
+			host1x_debug_output(o, "[truncated; %d more words]\n",
+				*count);
+			dont_print = 1;
+		}
+		if (*count == 0)
+			*state = NVHOST_DBG_STATE_CMD;
+		break;
+
+	case NVHOST_DBG_STATE_GATHER:
+		*state = NVHOST_DBG_STATE_CMD;
+		host1x_debug_output(o, "%08x]):\n", val);
+		if (cdma) {
+			show_channel_gather(o, addr, val,
+					*count, cdma);
+		}
+		break;
+	}
+}
+
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we're given a different hardware address for the same
+	 * page. In that case the offset is invalid and we just have to
+	 * bail out.
+	 */
+	if (offset > NVHOST_DEBUG_MAX_PAGE_OFFSET) {
+		host1x_debug_output(o, "[address mismatch]\n");
+	} else {
+		/* A GATHER buffer always starts with a command */
+		state = NVHOST_DBG_STATE_CMD;
+		for (i = 0; i < words; i++)
+			show_channel_word(o, &state, &count,
+					phys_addr + i * 4,
+					*(map_addr + offset/4 + i),
+					cdma);
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 cur = addr - pb->phys;
+	struct mem_handle *mem = pb->handle[cur/8];
+	u32 *map_addr, offset;
+	struct sg_table *sgt;
+
+	if (!mem) {
+		host1x_debug_output(o, "[already deallocated]\n");
+		return;
+	}
+
+	map_addr = host1x_memmgr_mmap(mem);
+	if (!map_addr) {
+		host1x_debug_output(o, "[could not mmap]\n");
+		return;
+	}
+
+	/* Get base address from mem */
+	sgt = host1x_memmgr_pin(mem);
+	if (IS_ERR(sgt)) {
+		host1x_debug_output(o, "[couldn't pin]\n");
+		host1x_memmgr_munmap(mem, map_addr);
+		return;
+	}
+
+	offset = phys_addr - sg_dma_address(sgt->sgl);
+	do_show_channel_gather(o, phys_addr, words, cdma,
+			sg_dma_address(sgt->sgl), map_addr);
+	host1x_memmgr_unpin(mem, sgt);
+	host1x_memmgr_munmap(mem, map_addr);
+}
+
+static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
+{
+	struct host1x_job *job;
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		int i;
+		host1x_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+				" first_get=%08x, timeout=%d,"
+				" num_slots=%d, num_handles=%d\n",
+				job,
+				job->syncpt_id,
+				job->syncpt_end,
+				job->first_get,
+				job->timeout,
+				job->num_slots,
+				job->num_unpins);
+
+		for (i = 0; i < job->num_gathers; i++) {
+			struct host1x_job_gather *g = &job->gathers[i];
+			u32 *mapped = host1x_memmgr_mmap(g->ref);
+			if (!mapped) {
+				host1x_debug_output(o, "[could not mmap]\n");
+				continue;
+			}
+
+			host1x_debug_output(o,
+				"    GATHER at %08x+%04x, %d words\n",
+				g->mem_base, g->offset, g->words);
+
+			do_show_channel_gather(o, g->mem_base + g->offset,
+					g->words, cdma, g->mem_base, mapped);
+			host1x_memmgr_munmap(g->ref, mapped);
+		}
+	}
+}
+
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
+	struct host1x_channel *channel = ch;
+	struct host1x_cdma *cdma = &channel->cdma;
+	u32 dmaput, dmaget, dmactrl;
+	u32 cbstat, cbread;
+	u32 val, base, baseval;
+
+	dmaput = host1x_ch_readl(channel, host1x_channel_dmaput_r());
+	dmaget = host1x_ch_readl(channel, host1x_channel_dmaget_r());
+	dmactrl = host1x_ch_readl(channel, host1x_channel_dmactrl_r());
+	cbread = host1x_sync_readl(m, host1x_sync_cbread0_r() + 4 * chid);
+	cbstat = host1x_sync_readl(m, host1x_sync_cbstat_0_r() + 4 * chid);
+
+	host1x_debug_output(o, "%d-%s: ", chid,
+			    channel->dev->name);
+
+	if (host1x_channel_dmactrl_dmastop_v(dmactrl)
+		|| !channel->cdma.push_buffer.mapped) {
+		host1x_debug_output(o, "inactive\n\n");
+		return;
+	}
+
+	switch (cbstat) {
+	case 0x00010008:
+		host1x_debug_output(o, "waiting on syncpt %d val %d\n",
+			cbread >> 24, cbread & 0xffffff);
+		break;
+
+	case 0x00010009:
+		base = (cbread >> 16) & 0xff;
+		baseval = host1x_sync_readl(m,
+				host1x_sync_syncpt_base_0_r() + 4 * base);
+		val = cbread & 0xffff;
+		host1x_debug_output(o, "waiting on syncpt %d val %d "
+			  "(base %d = %d; offset = %d)\n",
+			cbread >> 24, baseval + val,
+			base, baseval, val);
+		break;
+
+	default:
+		host1x_debug_output(o,
+				"active class %02x, offset %04x, val %08x\n",
+				host1x_sync_cbstat_0_cbclass0_v(cbstat),
+				host1x_sync_cbstat_0_cboffset0_v(cbstat),
+				cbread);
+		break;
+	}
+
+	host1x_debug_output(o, "DMAPUT %08x, DMAGET %08x, DMACTL %08x\n",
+		dmaput, dmaget, dmactrl);
+	host1x_debug_output(o, "CBREAD %08x, CBSTAT %08x\n", cbread, cbstat);
+
+	show_channel_gathers(o, cdma);
+	host1x_debug_output(o, "\n");
+}
+
+static void host1x_debug_show_channel_fifo(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
+	u32 val, rd_ptr, wr_ptr, start, end;
+	struct host1x_channel *channel = ch;
+	int state, count;
+
+	host1x_debug_output(o, "%d: fifo:\n", chid);
+
+	val = host1x_ch_readl(channel, host1x_channel_fifostat_r());
+	host1x_debug_output(o, "FIFOSTAT %08x\n", val);
+	if (host1x_channel_fifostat_cfempty_v(val)) {
+		host1x_debug_output(o, "[empty]\n");
+		return;
+	}
+
+	host1x_sync_writel(m, 0x0, host1x_sync_cfpeek_ctrl_r());
+	host1x_sync_writel(m, host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
+			| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid),
+		host1x_sync_cfpeek_ctrl_r());
+
+	val = host1x_sync_readl(m, host1x_sync_cfpeek_ptrs_r());
+	rd_ptr = host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(val);
+	wr_ptr = host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(val);
+
+	val = host1x_sync_readl(m, host1x_sync_cf0_setup_r() + 4 * chid);
+	start = host1x_sync_cf0_setup_cf0_base_v(val);
+	end = host1x_sync_cf0_setup_cf0_limit_v(val);
+
+	state = NVHOST_DBG_STATE_CMD;
+
+	do {
+		host1x_sync_writel(m, 0x0, host1x_sync_cfpeek_ctrl_r());
+		host1x_sync_writel(m, host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
+				| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
+				| host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
+			host1x_sync_cfpeek_ctrl_r());
+		val = host1x_sync_readl(m, host1x_sync_cfpeek_read_r());
+
+		show_channel_word(o, &state, &count, 0, val, NULL);
+
+		if (rd_ptr == end)
+			rd_ptr = start;
+		else
+			rd_ptr++;
+	} while (rd_ptr != wr_ptr);
+
+	if (state == NVHOST_DBG_STATE_DATA)
+		host1x_debug_output(o, ", ...])\n");
+	host1x_debug_output(o, "\n");
+
+	host1x_sync_writel(m, 0x0, host1x_sync_cfpeek_ctrl_r());
+}
+
+static void host1x_debug_show_mlocks(struct host1x *m, struct output *o)
+{
+	int i;
+
+	host1x_debug_output(o, "---- mlocks ----\n");
+	for (i = 0; i < host1x_syncpt_nb_mlocks(m); i++) {
+		u32 owner = host1x_sync_readl(m,
+				host1x_sync_mlock_owner_0_r() + 4 * i);
+		if (host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(owner))
+			host1x_debug_output(o, "%d: locked by channel %d\n",
+				i,
+				/* owner channel id is in bits 11:8 */
+				(owner >> 8) & 0xf);
+		else if (host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(owner))
+			host1x_debug_output(o, "%d: locked by cpu\n", i);
+		else
+			host1x_debug_output(o, "%d: unlocked\n", i);
+	}
+	host1x_debug_output(o, "\n");
+}
+
+static const struct host1x_debug_ops host1x_debug_ops = {
+	.show_channel_cdma = host1x_debug_show_channel_cdma,
+	.show_channel_fifo = host1x_debug_show_channel_fifo,
+	.show_mlocks = host1x_debug_show_mlocks,
+};
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index 3f41619..7a26e96 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -29,6 +29,7 @@
 
 #include "hw/channel_hw.c"
 #include "hw/cdma_hw.c"
+#include "hw/debug_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"
 
@@ -37,6 +38,7 @@ int host1x01_init(struct host1x *host)
 	host->channel_op = host1x_channel_ops;
 	host->cdma_op = host1x_cdma_ops;
 	host->cdma_pb_op = host1x_pushbuffer_ops;
+	host->debug_op = host1x_debug_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
 
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
index 3a23d57..29f0ddc0 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -51,6 +51,14 @@
 #ifndef __hw_host1x_channel_host1x_h__
 #define __hw_host1x_channel_host1x_h__
 
+static inline u32 host1x_channel_fifostat_r(void)
+{
+	return 0x0;
+}
+static inline u32 host1x_channel_fifostat_cfempty_v(u32 r)
+{
+	return (r >> 10) & 0x1;
+}
 static inline u32 host1x_channel_dmastart_r(void)
 {
 	return 0x14;
@@ -75,6 +83,10 @@ static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
 {
 	return (v & 0x1) << 0;
 }
+static inline u32 host1x_channel_dmactrl_dmastop_v(u32 r)
+{
+	return (r >> 0) & 0x1;
+}
 static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
 {
 	return (v & 0x1) << 1;
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index c9342da..c4f6533 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -63,6 +63,18 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
 {
 	return 0x68;
 }
+static inline u32 host1x_sync_cf0_setup_r(void)
+{
+	return 0x80;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_base_v(u32 r)
+{
+	return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_limit_v(u32 r)
+{
+	return (r >> 16) & 0x1ff;
+}
 static inline u32 host1x_sync_cmdproc_stop_r(void)
 {
 	return 0xac;
@@ -83,6 +95,22 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 {
 	return 0x1bc;
 }
+static inline u32 host1x_sync_mlock_owner_0_r(void)
+{
+	return 0x340;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(u32 v)
+{
+	return (v & 0xf) << 8;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(u32 r)
+{
+	return (r >> 1) & 0x1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(u32 r)
+{
+	return (r >> 0) & 0x1;
+}
 static inline u32 host1x_sync_syncpt_0_r(void)
 {
 	return 0x400;
@@ -99,4 +127,53 @@ static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
 {
 	return 0x700;
 }
+static inline u32 host1x_sync_cbread0_r(void)
+{
+	return 0x720;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_r(void)
+{
+	return 0x74c;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
+{
+	return (v & 0x1ff) << 0;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_f(u32 v)
+{
+	return (v & 0x7) << 16;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
+{
+	return (v & 0x1) << 31;
+}
+static inline u32 host1x_sync_cfpeek_read_r(void)
+{
+	return 0x750;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_r(void)
+{
+	return 0x754;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(u32 r)
+{
+	return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(u32 r)
+{
+	return (r >> 16) & 0x1ff;
+}
+static inline u32 host1x_sync_cbstat_0_r(void)
+{
+	return 0x758;
+}
+static inline u32 host1x_sync_cbstat_0_cboffset0_v(u32 r)
+{
+	return (r >> 0) & 0xffff;
+}
+static inline u32 host1x_sync_cbstat_0_cbclass0_v(u32 r)
+{
+	return (r >> 16) & 0x3ff;
+}
+
 #endif /* __hw_host1x_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index a070473..09a21d2 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -90,6 +90,7 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 		dev_err(&dev->dev->dev,
 			"Trying to increment syncpoint id %d beyond max\n",
 			sp->id);
+		host1x_debug_dump(sp->dev);
 		return;
 	}
 	host1x_sync_writel(dev, BIT_MASK(sp->id),
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5de67d2..e819092 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -23,6 +23,7 @@
 #include "syncpt.h"
 #include "dev.h"
 #include "intr.h"
+#include "debug.h"
 #include <trace/events/host1x.h>
 
 #define MAX_SYNCPT_LENGTH	5
@@ -219,6 +220,8 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp,
 				 current->comm, sp->id, sp->name,
 				 thresh, timeout);
 			sp->dev->syncpt_op.debug(sp);
+			if (check_count == MAX_STUCK_CHECK_COUNT)
+				host1x_debug_dump(sp->dev);
 			check_count++;
 		}
 	}
-- 
1.7.9.5
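
Note on the opcode decoding in show_channel_command() (debug_hw.c
above): the following is a minimal userspace sketch of the same bit
layout, not part of the patch. decode_opcode() and popcount16() are
hypothetical names; popcount16() stands in for the kernel's
hweight16(). Only SETCL, INCR, NONINCR, MASK and IMM are handled, and
the field positions are taken from the shifts and masks in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count set bits in the low 16 bits (stand-in for hweight16()). */
static unsigned popcount16(uint32_t v)
{
	unsigned n = 0;

	for (v &= 0xffff; v; v >>= 1)
		n += v & 1;
	return n;
}

/*
 * Format one opcode word into buf and return the number of data words
 * that follow it in the stream (0 if the opcode stands alone).
 */
static unsigned decode_opcode(uint32_t val, char *buf, size_t len)
{
	unsigned offset = (val >> 16) & 0xfff;
	unsigned low16 = val & 0xffff;

	switch (val >> 28) {
	case 0x0: /* SETCL: class in bits 15:6, mask in bits 5:0 */
		snprintf(buf, len, "SETCL(class=%03x, offset=%03x, mask=%02x)",
			 (unsigned)((val >> 6) & 0x3ff), offset,
			 (unsigned)(val & 0x3f));
		return popcount16(val & 0x3f);
	case 0x1: /* INCR: low 16 bits are the word count */
		snprintf(buf, len, "INCR(offset=%03x)", offset);
		return low16;
	case 0x2: /* NONINCR: low 16 bits are the word count */
		snprintf(buf, len, "NONINCR(offset=%03x)", offset);
		return low16;
	case 0x3: /* MASK: one data word per set mask bit */
		snprintf(buf, len, "MASK(offset=%03x, mask=%04x)",
			 offset, low16);
		return popcount16(low16);
	case 0x4: /* IMM: data is carried in the opcode itself */
		snprintf(buf, len, "IMM(offset=%03x, data=%04x)",
			 offset, low16);
		return 0;
	default: /* opcodes not covered by this sketch */
		snprintf(buf, len, "OTHER(%08x)", (unsigned)val);
		return 0;
	}
}
```

For example, decode_opcode(0x10010002, buf, sizeof(buf)) writes
"INCR(offset=001)" and returns 2, which is the count the debug code
uses to switch into its data state for the next two words.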



* [PATCHv3 5/7] drm: tegra: Remove redundant host1x
  2012-12-13 14:04 ` Terje Bergstrom
@ 2012-12-13 14:04   ` Terje Bergstrom
  -1 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

From: Arto Merilainen <amerilainen@nvidia.com>

This patch removes the redundant host1x driver from tegradrm and
adds the necessary bindings to the separate host1x driver.

The infrastructure for drm client lists is merged into drm.c.

The patch simplifies driver initialization. The original driver had
two lists for registered devices (clients and drm_active). The
clients list held references to all registered devices, whereas
the drm_active list held only the devices that the tegradrm
driver itself supported. host1x is now a separate driver, so there
should be no need to support registration of external
drivers. Therefore, only the drm_active list is kept. Removing
the other list also simplifies driver unregistration.
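
As an illustration only (not code from this patch), the registration
flow can be sketched in userspace C: devices found in the device tree
go into a 'required' set, each registering client is removed from it,
and the DRM device is brought up once the set is empty. All names
below (required_set, required_add, client_register) are hypothetical,
and the drm_registered guard against a second bring-up is an
assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQUIRED 8

/* Devices that must register before the DRM device is created. */
struct required_set {
	const char *names[MAX_REQUIRED];
	size_t count;
	bool drm_registered;
};

/* Record a device that was found and marked available. */
static bool required_add(struct required_set *s, const char *name)
{
	if (s->count == MAX_REQUIRED)
		return false;
	s->names[s->count++] = name;
	return true;
}

/*
 * Called when a client driver probes. Returns true only for the
 * registration that empties the set, i.e. the point at which the
 * DRM device would be registered.
 */
static bool client_register(struct required_set *s, const char *name)
{
	size_t i;

	for (i = 0; i < s->count; i++) {
		if (strcmp(s->names[i], name) == 0) {
			/* Swap-remove: order of the set does not matter. */
			s->names[i] = s->names[--s->count];
			break;
		}
	}

	if (s->count == 0 && !s->drm_registered) {
		s->drm_registered = true;
		return true;
	}
	return false;
}
```

With "dc" and "hdmi" added as required, registering "dc" returns false
and the later registration of "hdmi" returns true.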

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/drm/tegra/Kconfig  |    2 +-
 drivers/gpu/drm/tegra/Makefile |    2 +-
 drivers/gpu/drm/tegra/dc.c     |   20 ++-
 drivers/gpu/drm/tegra/drm.c    |  217 +++++++++++++++++++++++++--
 drivers/gpu/drm/tegra/drm.h    |   38 ++---
 drivers/gpu/drm/tegra/fb.c     |   17 ++-
 drivers/gpu/drm/tegra/hdmi.c   |   24 ++-
 drivers/gpu/drm/tegra/host1x.c |  325 ----------------------------------------
 include/drm/tegra_drm.h        |   20 +++
 9 files changed, 275 insertions(+), 390 deletions(-)
 delete mode 100644 drivers/gpu/drm/tegra/host1x.c
 create mode 100644 include/drm/tegra_drm.h

diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
index be1daf7..4a0290e 100644
--- a/drivers/gpu/drm/tegra/Kconfig
+++ b/drivers/gpu/drm/tegra/Kconfig
@@ -1,6 +1,6 @@
 config DRM_TEGRA
 	tristate "NVIDIA Tegra DRM"
-	depends on DRM && OF && ARCH_TEGRA
+	depends on DRM && OF && ARCH_TEGRA && TEGRA_HOST1X
 	select DRM_KMS_HELPER
 	select DRM_GEM_CMA_HELPER
 	select DRM_KMS_CMA_HELPER
diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 80f73d1..f4c05bb 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -1,7 +1,7 @@
 ccflags-y := -Iinclude/drm
 ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
-tegra-drm-y := drm.o fb.o dc.o host1x.o
+tegra-drm-y := drm.o fb.o dc.o
 tegra-drm-y += output.o rgb.o hdmi.o
 
 obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 0744103..aae29e8 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -673,10 +673,10 @@ static int tegra_dc_debugfs_exit(struct tegra_dc *dc)
 	return 0;
 }
 
-static int tegra_dc_drm_init(struct host1x_client *client,
+static int tegra_dc_drm_init(struct tegra_drm_client *client,
 			     struct drm_device *drm)
 {
-	struct tegra_dc *dc = host1x_client_to_dc(client);
+	struct tegra_dc *dc = tegra_drm_client_to_dc(client);
 	int err;
 
 	dc->pipe = drm->mode_config.num_crtc;
@@ -708,9 +708,9 @@ static int tegra_dc_drm_init(struct host1x_client *client,
 	return 0;
 }
 
-static int tegra_dc_drm_exit(struct host1x_client *client)
+static int tegra_dc_drm_exit(struct tegra_drm_client *client)
 {
-	struct tegra_dc *dc = host1x_client_to_dc(client);
+	struct tegra_dc *dc = tegra_drm_client_to_dc(client);
 	int err;
 
 	devm_free_irq(dc->dev, dc->irq, dc);
@@ -730,14 +730,13 @@ static int tegra_dc_drm_exit(struct host1x_client *client)
 	return 0;
 }
 
-static const struct host1x_client_ops dc_client_ops = {
+static const struct tegra_drm_client_ops dc_client_ops = {
 	.drm_init = tegra_dc_drm_init,
 	.drm_exit = tegra_dc_drm_exit,
 };
 
 static int tegra_dc_probe(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct resource *regs;
 	struct tegra_dc *dc;
 	int err;
@@ -787,9 +786,9 @@ static int tegra_dc_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	err = host1x_register_client(host1x, &dc->client);
+	err = tegra_drm_register_client(&dc->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
 			err);
 		return err;
 	}
@@ -801,13 +800,12 @@ static int tegra_dc_probe(struct platform_device *pdev)
 
 static int tegra_dc_remove(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct tegra_dc *dc = platform_get_drvdata(pdev);
 	int err;
 
-	err = host1x_unregister_client(host1x, &dc->client);
+	err = tegra_drm_unregister_client(&dc->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
 			err);
 		return err;
 	}
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 3a503c9..530bed4 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
+#include <linux/host1x.h>
 
 #include <mach/clk.h>
 #include <linux/dma-mapping.h>
@@ -24,21 +25,177 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+struct tegra_drm_client_entry {
+	struct tegra_drm_client *client;
+	struct device_node *np;
+	struct list_head list;
+};
+
+static struct tegradrm *tegradrm;
+
+static struct tegradrm *tegra_drm_get(void)
+{
+	return tegradrm;
+}
+
+static int tegra_drm_add_client(struct tegradrm *tegradrm,
+		struct device_node *np)
+{
+	struct tegra_drm_client_entry *client;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&client->list);
+	client->np = of_node_get(np);
+
+	list_add_tail(&client->list, &tegradrm->drm_clients);
+
+	return 0;
+}
+
+static int tegra_drm_parse_dt(struct tegradrm *tegradrm)
+{
+	static const char * const compat[] = {
+		"nvidia,tegra20-dc",
+		"nvidia,tegra20-hdmi",
+		"nvidia,tegra30-dc",
+		"nvidia,tegra30-hdmi",
+	};
+	unsigned int i;
+	int err;
+	struct device *dev;
+
+	/* host1x is parent of all devices */
+	dev = bus_find_device_by_name(&platform_bus_type, NULL, "host1x");
+	if (!dev)
+		return -ENODEV;
+
+	/* find devices that are available and add them into the 'required'
+	 * list */
+	for (i = 0; i < ARRAY_SIZE(compat); i++) {
+		struct device_node *np;
+
+		for_each_child_of_node(dev->of_node, np) {
+			if (of_device_is_compatible(np, compat[i]) &&
+			    of_device_is_available(np)) {
+				err = tegra_drm_add_client(tegradrm, np);
+				if (err < 0)
+					return err;
+			}
+		}
+	}
+
+	return 0;
+}
+
+int tegra_drm_register_client(struct tegra_drm_client *client)
+{
+	struct tegradrm *tegradrm = tegra_drm_get();
+	struct tegra_drm_client_entry *drm, *tmp;
+	int err;
+
+	mutex_lock(&tegradrm->clients_lock);
+	list_add_tail(&client->list, &tegradrm->clients);
+	mutex_unlock(&tegradrm->clients_lock);
+
+	/* remove this device from 'required' list */
+	list_for_each_entry_safe(drm, tmp, &tegradrm->drm_clients, list)
+		if (drm->np == client->dev->of_node)
+			list_del(&drm->list);
+
+	/* if all required devices are found, register drm device */
+	if (list_empty(&tegradrm->drm_clients)) {
+		struct platform_device *pdev = to_platform_device(client->dev);
+
+		err = drm_platform_init(&tegra_drm_driver, pdev);
+		if (err < 0) {
+			dev_err(client->dev, "drm_platform_init(): %d\n", err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+int tegra_drm_unregister_client(struct tegra_drm_client *client)
+{
+	struct tegradrm *tegradrm = tegra_drm_get();
+
+	list_for_each_entry(client, &tegradrm->drm_active, list) {
+
+		struct platform_device *pdev = to_platform_device(client->dev);
+
+		if (client->ops && client->ops->drm_exit) {
+			int err = client->ops->drm_exit(client);
+			if (err < 0) {
+				dev_err(client->dev,
+					"DRM cleanup failed for %s: %d\n",
+					dev_name(client->dev), err);
+				return err;
+			}
+		}
+
+		/* if this is the last device, unregister the drm driver */
+		if (client->list.next == &tegradrm->drm_active)
+			drm_platform_exit(&tegra_drm_driver, pdev);
+
+		list_del_init(&client->list);
+	}
+
+	return 0;
+}
+
+static int tegra_drm_alloc(void)
+{
+	int err;
+
+	tegradrm = kzalloc(sizeof(*tegradrm), GFP_KERNEL);
+	if (!tegradrm)
+		return -ENOMEM;
+
+	mutex_init(&tegradrm->drm_clients_lock);
+	INIT_LIST_HEAD(&tegradrm->drm_clients);
+	INIT_LIST_HEAD(&tegradrm->drm_active);
+	mutex_init(&tegradrm->clients_lock);
+	INIT_LIST_HEAD(&tegradrm->clients);
+
+	err = tegra_drm_parse_dt(tegradrm);
+	if (err < 0) {
+		pr_err("failed to parse DT: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
 static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 {
-	struct device *dev = drm->dev;
-	struct host1x *host1x;
+	struct tegra_drm_client *client;
 	int err;
+	struct tegradrm *tegradrm = tegra_drm_get();
 
-	host1x = dev_get_drvdata(dev);
-	drm->dev_private = host1x;
-	host1x->drm = drm;
+	drm->dev_private = tegradrm;
+	tegradrm->drm = drm;
 
 	drm_mode_config_init(drm);
 
-	err = host1x_drm_init(host1x, drm);
-	if (err < 0)
-		return err;
+	mutex_lock(&tegradrm->clients_lock);
+
+	list_for_each_entry(client, &tegradrm->clients, list) {
+		if (client->ops && client->ops->drm_init) {
+			err = client->ops->drm_init(client, drm);
+			if (err < 0) {
+				dev_dbg(drm->dev, "drm_init() failed for %s: %d\n",
+					dev_name(client->dev), err);
+				mutex_unlock(&tegradrm->clients_lock);
+				return err;
+			}
+		}
+	}
+
+	mutex_unlock(&tegradrm->clients_lock);
 
 	err = tegra_drm_fb_init(drm);
 	if (err < 0)
@@ -64,12 +221,47 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	return 0;
 }
 
+static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
+{
+
+}
+
 static void tegra_drm_lastclose(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	tegra_drm_fb_restore(drm);
+}
+
+static int __init tegra_drm_init(void)
+{
+	int err;
+
+	if (tegra_drm_alloc())
+		return -ENOMEM;
 
-	drm_fbdev_cma_restore_mode(host1x->fbdev);
+	err = platform_driver_register(&tegra_dc_driver);
+	if (err < 0)
+		goto free_tegradrm;
+
+	err = platform_driver_register(&tegra_hdmi_driver);
+	if (err < 0)
+		goto unregister_dc;
+	return 0;
+
+unregister_dc:
+	platform_driver_unregister(&tegra_dc_driver);
+free_tegradrm:
+	kfree(tegradrm);
+	return err;
 }
+module_init(tegra_drm_init);
+
+static void __exit tegra_drm_exit(void)
+{
+	platform_driver_unregister(&tegra_hdmi_driver);
+	platform_driver_unregister(&tegra_dc_driver);
+	kfree(tegradrm);
+}
+module_exit(tegra_drm_exit);
 
 static struct drm_ioctl_desc tegra_drm_ioctls[] = {
 };
@@ -94,6 +286,7 @@ struct drm_driver tegra_drm_driver = {
 	.load = tegra_drm_load,
 	.unload = tegra_drm_unload,
 	.open = tegra_drm_open,
+	.preclose = tegra_drm_close,
 	.lastclose = tegra_drm_lastclose,
 
 	.gem_free_object = drm_gem_cma_free_object,
@@ -113,3 +306,7 @@ struct drm_driver tegra_drm_driver = {
 	.minor = DRIVER_MINOR,
 	.patchlevel = DRIVER_PATCHLEVEL,
 };
+
+MODULE_AUTHOR("Thierry Reding <thierry.reding@avionic-design.de>");
+MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 3a843a7..3e800fb 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -17,6 +17,7 @@
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fixed.h>
+#include <drm/tegra_drm.h>
 
 struct tegra_framebuffer {
 	struct drm_framebuffer base;
@@ -28,13 +29,9 @@ static inline struct tegra_framebuffer *to_tegra_fb(struct drm_framebuffer *fb)
 	return container_of(fb, struct tegra_framebuffer, base);
 }
 
-struct host1x {
+struct tegradrm {
 	struct drm_device *drm;
 	struct device *dev;
-	void __iomem *regs;
-	struct clk *clk;
-	int syncpt;
-	int irq;
 
 	struct mutex drm_clients_lock;
 	struct list_head drm_clients;
@@ -47,36 +44,30 @@ struct host1x {
 	struct tegra_framebuffer fb;
 };
 
-struct host1x_client;
+struct tegra_drm_client;
 
-struct host1x_client_ops {
-	int (*drm_init)(struct host1x_client *client, struct drm_device *drm);
-	int (*drm_exit)(struct host1x_client *client);
+struct tegra_drm_client_ops {
+	int (*drm_init)(struct tegra_drm_client *, struct drm_device *);
+	int (*drm_exit)(struct tegra_drm_client *);
 };
 
-struct host1x_client {
-	struct host1x *host1x;
+struct tegra_drm_client {
 	struct device *dev;
 
-	const struct host1x_client_ops *ops;
+	const struct tegra_drm_client_ops *ops;
 
 	struct list_head list;
-};
 
-extern int host1x_drm_init(struct host1x *host1x, struct drm_device *drm);
-extern int host1x_drm_exit(struct host1x *host1x);
+};
 
-extern int host1x_register_client(struct host1x *host1x,
-				  struct host1x_client *client);
-extern int host1x_unregister_client(struct host1x *host1x,
-				    struct host1x_client *client);
+extern int tegra_drm_register_client(struct tegra_drm_client *client);
+extern int tegra_drm_unregister_client(struct tegra_drm_client *client);
 
 struct tegra_output;
 
 struct tegra_dc {
-	struct host1x_client client;
+	struct tegra_drm_client client;
 
-	struct host1x *host1x;
 	struct device *dev;
 
 	struct drm_crtc base;
@@ -96,7 +87,8 @@ struct tegra_dc {
 	struct dentry *debugfs;
 };
 
-static inline struct tegra_dc *host1x_client_to_dc(struct host1x_client *client)
+static inline struct tegra_dc *tegra_drm_client_to_dc(
+				struct tegra_drm_client *client)
 {
 	return container_of(client, struct tegra_dc, client);
 }
@@ -225,8 +217,8 @@ extern struct vm_operations_struct tegra_gem_vm_ops;
 /* from fb.c */
 extern int tegra_drm_fb_init(struct drm_device *drm);
 extern void tegra_drm_fb_exit(struct drm_device *drm);
+extern void tegra_drm_fb_restore(struct drm_device *drm);
 
-extern struct platform_driver tegra_host1x_driver;
 extern struct platform_driver tegra_hdmi_driver;
 extern struct platform_driver tegra_dc_driver;
 extern struct drm_driver tegra_drm_driver;
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index 97993c6..7c686d8 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -11,9 +11,9 @@
 
 static void tegra_drm_fb_output_poll_changed(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	struct tegradrm *tegradrm = drm->dev_private;
 
-	drm_fbdev_cma_hotplug_event(host1x->fbdev);
+	drm_fbdev_cma_hotplug_event(tegradrm->fbdev);
 }
 
 static const struct drm_mode_config_funcs tegra_drm_mode_funcs = {
@@ -23,7 +23,7 @@ static const struct drm_mode_config_funcs tegra_drm_mode_funcs = {
 
 int tegra_drm_fb_init(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	struct tegradrm *tegradrm = drm->dev_private;
 	struct drm_fbdev_cma *fbdev;
 
 	drm->mode_config.min_width = 0;
@@ -43,14 +43,19 @@ int tegra_drm_fb_init(struct drm_device *drm)
 	drm_fbdev_cma_restore_mode(fbdev);
 #endif
 
-	host1x->fbdev = fbdev;
+	tegradrm->fbdev = fbdev;
 
 	return 0;
 }
 
 void tegra_drm_fb_exit(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	struct tegradrm *tegradrm = drm->dev_private;
+	drm_fbdev_cma_fini(tegradrm->fbdev);
+}
 
-	drm_fbdev_cma_fini(host1x->fbdev);
+void tegra_drm_fb_restore(struct drm_device *drm)
+{
+	struct tegradrm *tegradrm = drm->dev_private;
+	drm_fbdev_cma_restore_mode(tegradrm->fbdev);
 }
diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c
index ab40164..774baf3 100644
--- a/drivers/gpu/drm/tegra/hdmi.c
+++ b/drivers/gpu/drm/tegra/hdmi.c
@@ -22,7 +22,7 @@
 #include "dc.h"
 
 struct tegra_hdmi {
-	struct host1x_client client;
+	struct tegra_drm_client client;
 	struct tegra_output output;
 	struct device *dev;
 
@@ -46,7 +46,7 @@ struct tegra_hdmi {
 };
 
 static inline struct tegra_hdmi *
-host1x_client_to_hdmi(struct host1x_client *client)
+tegra_drm_client_to_hdmi(struct tegra_drm_client *client)
 {
 	return container_of(client, struct tegra_hdmi, client);
 }
@@ -1152,10 +1152,10 @@ static int tegra_hdmi_debugfs_exit(struct tegra_hdmi *hdmi)
 	return 0;
 }
 
-static int tegra_hdmi_drm_init(struct host1x_client *client,
+static int tegra_hdmi_drm_init(struct tegra_drm_client *client,
 			       struct drm_device *drm)
 {
-	struct tegra_hdmi *hdmi = host1x_client_to_hdmi(client);
+	struct tegra_hdmi *hdmi = tegra_drm_client_to_hdmi(client);
 	int err;
 
 	hdmi->output.type = TEGRA_OUTPUT_HDMI;
@@ -1177,9 +1177,9 @@ static int tegra_hdmi_drm_init(struct host1x_client *client,
 	return 0;
 }
 
-static int tegra_hdmi_drm_exit(struct host1x_client *client)
+static int tegra_hdmi_drm_exit(struct tegra_drm_client *client)
 {
-	struct tegra_hdmi *hdmi = host1x_client_to_hdmi(client);
+	struct tegra_hdmi *hdmi = tegra_drm_client_to_hdmi(client);
 	int err;
 
 	if (IS_ENABLED(CONFIG_DEBUG_FS)) {
@@ -1204,14 +1204,13 @@ static int tegra_hdmi_drm_exit(struct host1x_client *client)
 	return 0;
 }
 
-static const struct host1x_client_ops hdmi_client_ops = {
+static const struct tegra_drm_client_ops hdmi_client_ops = {
 	.drm_init = tegra_hdmi_drm_init,
 	.drm_exit = tegra_hdmi_drm_exit,
 };
 
 static int tegra_hdmi_probe(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct tegra_hdmi *hdmi;
 	struct resource *regs;
 	int err;
@@ -1286,9 +1285,9 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(&hdmi->client.list);
 	hdmi->client.dev = &pdev->dev;
 
-	err = host1x_register_client(host1x, &hdmi->client);
+	err = tegra_drm_register_client(&hdmi->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
 			err);
 		return err;
 	}
@@ -1300,13 +1299,12 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
 
 static int tegra_hdmi_remove(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct tegra_hdmi *hdmi = platform_get_drvdata(pdev);
 	int err;
 
-	err = host1x_unregister_client(host1x, &hdmi->client);
+	err = tegra_drm_unregister_client(&hdmi->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
 			err);
 		return err;
 	}
diff --git a/drivers/gpu/drm/tegra/host1x.c b/drivers/gpu/drm/tegra/host1x.c
deleted file mode 100644
index bdb97a5..0000000
--- a/drivers/gpu/drm/tegra/host1x.c
+++ /dev/null
@@ -1,325 +0,0 @@
-/*
- * Copyright (C) 2012 Avionic Design GmbH
- * Copyright (C) 2012 NVIDIA CORPORATION.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/clk.h>
-#include <linux/err.h>
-#include <linux/module.h>
-#include <linux/of.h>
-#include <linux/platform_device.h>
-
-#include "drm.h"
-
-struct host1x_drm_client {
-	struct host1x_client *client;
-	struct device_node *np;
-	struct list_head list;
-};
-
-static int host1x_add_drm_client(struct host1x *host1x, struct device_node *np)
-{
-	struct host1x_drm_client *client;
-
-	client = kzalloc(sizeof(*client), GFP_KERNEL);
-	if (!client)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&client->list);
-	client->np = of_node_get(np);
-
-	list_add_tail(&client->list, &host1x->drm_clients);
-
-	return 0;
-}
-
-static int host1x_activate_drm_client(struct host1x *host1x,
-				      struct host1x_drm_client *drm,
-				      struct host1x_client *client)
-{
-	mutex_lock(&host1x->drm_clients_lock);
-	list_del_init(&drm->list);
-	list_add_tail(&drm->list, &host1x->drm_active);
-	drm->client = client;
-	mutex_unlock(&host1x->drm_clients_lock);
-
-	return 0;
-}
-
-static int host1x_remove_drm_client(struct host1x *host1x,
-				    struct host1x_drm_client *client)
-{
-	mutex_lock(&host1x->drm_clients_lock);
-	list_del_init(&client->list);
-	mutex_unlock(&host1x->drm_clients_lock);
-
-	of_node_put(client->np);
-	kfree(client);
-
-	return 0;
-}
-
-static int host1x_parse_dt(struct host1x *host1x)
-{
-	static const char * const compat[] = {
-		"nvidia,tegra20-dc",
-		"nvidia,tegra20-hdmi",
-		"nvidia,tegra30-dc",
-		"nvidia,tegra30-hdmi",
-	};
-	unsigned int i;
-	int err;
-
-	for (i = 0; i < ARRAY_SIZE(compat); i++) {
-		struct device_node *np;
-
-		for_each_child_of_node(host1x->dev->of_node, np) {
-			if (of_device_is_compatible(np, compat[i]) &&
-			    of_device_is_available(np)) {
-				err = host1x_add_drm_client(host1x, np);
-				if (err < 0)
-					return err;
-			}
-		}
-	}
-
-	return 0;
-}
-
-static int tegra_host1x_probe(struct platform_device *pdev)
-{
-	struct host1x *host1x;
-	struct resource *regs;
-	int err;
-
-	host1x = devm_kzalloc(&pdev->dev, sizeof(*host1x), GFP_KERNEL);
-	if (!host1x)
-		return -ENOMEM;
-
-	mutex_init(&host1x->drm_clients_lock);
-	INIT_LIST_HEAD(&host1x->drm_clients);
-	INIT_LIST_HEAD(&host1x->drm_active);
-	mutex_init(&host1x->clients_lock);
-	INIT_LIST_HEAD(&host1x->clients);
-	host1x->dev = &pdev->dev;
-
-	err = host1x_parse_dt(host1x);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to parse DT: %d\n", err);
-		return err;
-	}
-
-	host1x->clk = devm_clk_get(&pdev->dev, NULL);
-	if (IS_ERR(host1x->clk))
-		return PTR_ERR(host1x->clk);
-
-	err = clk_prepare_enable(host1x->clk);
-	if (err < 0)
-		return err;
-
-	regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!regs) {
-		err = -ENXIO;
-		goto err;
-	}
-
-	err = platform_get_irq(pdev, 0);
-	if (err < 0)
-		goto err;
-
-	host1x->syncpt = err;
-
-	err = platform_get_irq(pdev, 1);
-	if (err < 0)
-		goto err;
-
-	host1x->irq = err;
-
-	host1x->regs = devm_request_and_ioremap(&pdev->dev, regs);
-	if (!host1x->regs) {
-		err = -EADDRNOTAVAIL;
-		goto err;
-	}
-
-	platform_set_drvdata(pdev, host1x);
-
-	return 0;
-
-err:
-	clk_disable_unprepare(host1x->clk);
-	return err;
-}
-
-static int tegra_host1x_remove(struct platform_device *pdev)
-{
-	struct host1x *host1x = platform_get_drvdata(pdev);
-
-	clk_disable_unprepare(host1x->clk);
-
-	return 0;
-}
-
-int host1x_drm_init(struct host1x *host1x, struct drm_device *drm)
-{
-	struct host1x_client *client;
-
-	mutex_lock(&host1x->clients_lock);
-
-	list_for_each_entry(client, &host1x->clients, list) {
-		if (client->ops && client->ops->drm_init) {
-			int err = client->ops->drm_init(client, drm);
-			if (err < 0) {
-				dev_err(host1x->dev,
-					"DRM setup failed for %s: %d\n",
-					dev_name(client->dev), err);
-				return err;
-			}
-		}
-	}
-
-	mutex_unlock(&host1x->clients_lock);
-
-	return 0;
-}
-
-int host1x_drm_exit(struct host1x *host1x)
-{
-	struct platform_device *pdev = to_platform_device(host1x->dev);
-	struct host1x_client *client;
-
-	if (!host1x->drm)
-		return 0;
-
-	mutex_lock(&host1x->clients_lock);
-
-	list_for_each_entry_reverse(client, &host1x->clients, list) {
-		if (client->ops && client->ops->drm_exit) {
-			int err = client->ops->drm_exit(client);
-			if (err < 0) {
-				dev_err(host1x->dev,
-					"DRM cleanup failed for %s: %d\n",
-					dev_name(client->dev), err);
-				return err;
-			}
-		}
-	}
-
-	mutex_unlock(&host1x->clients_lock);
-
-	drm_platform_exit(&tegra_drm_driver, pdev);
-	host1x->drm = NULL;
-
-	return 0;
-}
-
-int host1x_register_client(struct host1x *host1x, struct host1x_client *client)
-{
-	struct host1x_drm_client *drm, *tmp;
-	int err;
-
-	mutex_lock(&host1x->clients_lock);
-	list_add_tail(&client->list, &host1x->clients);
-	mutex_unlock(&host1x->clients_lock);
-
-	list_for_each_entry_safe(drm, tmp, &host1x->drm_clients, list)
-		if (drm->np == client->dev->of_node)
-			host1x_activate_drm_client(host1x, drm, client);
-
-	if (list_empty(&host1x->drm_clients)) {
-		struct platform_device *pdev = to_platform_device(host1x->dev);
-
-		err = drm_platform_init(&tegra_drm_driver, pdev);
-		if (err < 0) {
-			dev_err(host1x->dev, "drm_platform_init(): %d\n", err);
-			return err;
-		}
-	}
-
-	return 0;
-}
-
-int host1x_unregister_client(struct host1x *host1x,
-			     struct host1x_client *client)
-{
-	struct host1x_drm_client *drm, *tmp;
-	int err;
-
-	list_for_each_entry_safe(drm, tmp, &host1x->drm_active, list) {
-		if (drm->client == client) {
-			err = host1x_drm_exit(host1x);
-			if (err < 0) {
-				dev_err(host1x->dev, "host1x_drm_exit(): %d\n",
-					err);
-				return err;
-			}
-
-			host1x_remove_drm_client(host1x, drm);
-			break;
-		}
-	}
-
-	mutex_lock(&host1x->clients_lock);
-	list_del_init(&client->list);
-	mutex_unlock(&host1x->clients_lock);
-
-	return 0;
-}
-
-static struct of_device_id tegra_host1x_of_match[] = {
-	{ .compatible = "nvidia,tegra30-host1x", },
-	{ .compatible = "nvidia,tegra20-host1x", },
-	{ },
-};
-MODULE_DEVICE_TABLE(of, tegra_host1x_of_match);
-
-struct platform_driver tegra_host1x_driver = {
-	.driver = {
-		.name = "tegra-host1x",
-		.owner = THIS_MODULE,
-		.of_match_table = tegra_host1x_of_match,
-	},
-	.probe = tegra_host1x_probe,
-	.remove = tegra_host1x_remove,
-};
-
-static int __init tegra_host1x_init(void)
-{
-	int err;
-
-	err = platform_driver_register(&tegra_host1x_driver);
-	if (err < 0)
-		return err;
-
-	err = platform_driver_register(&tegra_dc_driver);
-	if (err < 0)
-		goto unregister_host1x;
-
-	err = platform_driver_register(&tegra_hdmi_driver);
-	if (err < 0)
-		goto unregister_dc;
-
-	return 0;
-
-unregister_dc:
-	platform_driver_unregister(&tegra_dc_driver);
-unregister_host1x:
-	platform_driver_unregister(&tegra_host1x_driver);
-	return err;
-}
-module_init(tegra_host1x_init);
-
-static void __exit tegra_host1x_exit(void)
-{
-	platform_driver_unregister(&tegra_hdmi_driver);
-	platform_driver_unregister(&tegra_dc_driver);
-	platform_driver_unregister(&tegra_host1x_driver);
-}
-module_exit(tegra_host1x_exit);
-
-MODULE_AUTHOR("Thierry Reding <thierry.reding@avionic-design.de>");
-MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
-MODULE_LICENSE("GPL");
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
new file mode 100644
index 0000000..8632f49
--- /dev/null
+++ b/include/drm/tegra_drm.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (c) 2012, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEGRA_DRM_H_
+#define _TEGRA_DRM_H_
+
+#endif
-- 
1.7.9.5


* [PATCHv3 5/7] drm: tegra: Remove redundant host1x
@ 2012-12-13 14:04   ` Terje Bergstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel
  Cc: amerilainen, linux-kernel

From: Arto Merilainen <amerilainen@nvidia.com>

This patch removes the redundant host1x driver from tegradrm and adds
the necessary bindings to the separate host1x driver.

The infrastructure for DRM client lists is merged into drm.c.

The patch also simplifies driver initialization. The original driver
had two lists for registered devices (clients and drm_active). The
clients list held references to all registered devices, whereas the
drm_active list held only the devices that the tegradrm driver itself
supported. Because host1x is now a separate driver, there should be no
need to support registration of external drivers. Therefore, only the
drm_active list is retained. Removal of the other list also simplifies
driver unregistration.

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
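Note: the registration flow that moves into drm.c can be hard to follow
from the diff alone, so here is a rough userspace sketch of the idea. It
is not the driver code. The names required_entry, drm_state,
require_client() and register_client() are hypothetical, a plain singly
linked list stands in for list_head, and strings stand in for
device_node pointers. Unlike the patch, the sketch also frees an entry
when it is unlinked. Device tree parsing queues one entry per expected
device, each probe removes its own entry, and the DRM device is brought
up once nothing is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct tegra_drm_client_entry. */
struct required_entry {
	const char *node;		/* stands in for the device_node pointer */
	struct required_entry *next;
};

/* Hypothetical stand-in for the bookkeeping part of struct tegradrm. */
struct drm_state {
	struct required_entry *required;	/* devices still expected to probe */
	bool drm_registered;			/* set once the list has drained */
};

/* Roughly tegra_drm_add_client(): remember a device that must register. */
static int require_client(struct drm_state *st, const char *node)
{
	struct required_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return -1;

	e->node = node;
	e->next = st->required;
	st->required = e;
	return 0;
}

/*
 * Roughly tegra_drm_register_client(): drop the matching entry and, once
 * the list is empty, bring up the DRM device.
 */
static void register_client(struct drm_state *st, const char *node)
{
	struct required_entry **link = &st->required;

	assert(node);

	/* Pointer-to-pointer walk, so unlinking never reads a freed entry. */
	while (*link) {
		struct required_entry *e = *link;

		if (strcmp(e->node, node) == 0) {
			*link = e->next;
			free(e);
		} else {
			link = &e->next;
		}
	}

	if (!st->required)
		st->drm_registered = true;	/* drm_platform_init() in the patch */
}
```

With two required devices, the first register_client() call leaves
drm_registered false and the second one sets it. The pointer-to-pointer
walk plays the role that list_for_each_entry_safe() plays in the patch:
an entry can be removed while the list is being traversed.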
 drivers/gpu/drm/tegra/Kconfig  |    2 +-
 drivers/gpu/drm/tegra/Makefile |    2 +-
 drivers/gpu/drm/tegra/dc.c     |   20 ++-
 drivers/gpu/drm/tegra/drm.c    |  217 +++++++++++++++++++++++++--
 drivers/gpu/drm/tegra/drm.h    |   38 ++---
 drivers/gpu/drm/tegra/fb.c     |   17 ++-
 drivers/gpu/drm/tegra/hdmi.c   |   24 ++-
 drivers/gpu/drm/tegra/host1x.c |  325 ----------------------------------------
 include/drm/tegra_drm.h        |   20 +++
 9 files changed, 275 insertions(+), 390 deletions(-)
 delete mode 100644 drivers/gpu/drm/tegra/host1x.c
 create mode 100644 include/drm/tegra_drm.h

diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
index be1daf7..4a0290e 100644
--- a/drivers/gpu/drm/tegra/Kconfig
+++ b/drivers/gpu/drm/tegra/Kconfig
@@ -1,6 +1,6 @@
 config DRM_TEGRA
 	tristate "NVIDIA Tegra DRM"
-	depends on DRM && OF && ARCH_TEGRA
+	depends on DRM && OF && ARCH_TEGRA && TEGRA_HOST1X
 	select DRM_KMS_HELPER
 	select DRM_GEM_CMA_HELPER
 	select DRM_KMS_CMA_HELPER
diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 80f73d1..f4c05bb 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -1,7 +1,7 @@
 ccflags-y := -Iinclude/drm
 ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
-tegra-drm-y := drm.o fb.o dc.o host1x.o
+tegra-drm-y := drm.o fb.o dc.o
 tegra-drm-y += output.o rgb.o hdmi.o
 
 obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 0744103..aae29e8 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -673,10 +673,10 @@ static int tegra_dc_debugfs_exit(struct tegra_dc *dc)
 	return 0;
 }
 
-static int tegra_dc_drm_init(struct host1x_client *client,
+static int tegra_dc_drm_init(struct tegra_drm_client *client,
 			     struct drm_device *drm)
 {
-	struct tegra_dc *dc = host1x_client_to_dc(client);
+	struct tegra_dc *dc = tegra_drm_client_to_dc(client);
 	int err;
 
 	dc->pipe = drm->mode_config.num_crtc;
@@ -708,9 +708,9 @@ static int tegra_dc_drm_init(struct host1x_client *client,
 	return 0;
 }
 
-static int tegra_dc_drm_exit(struct host1x_client *client)
+static int tegra_dc_drm_exit(struct tegra_drm_client *client)
 {
-	struct tegra_dc *dc = host1x_client_to_dc(client);
+	struct tegra_dc *dc = tegra_drm_client_to_dc(client);
 	int err;
 
 	devm_free_irq(dc->dev, dc->irq, dc);
@@ -730,14 +730,13 @@ static int tegra_dc_drm_exit(struct host1x_client *client)
 	return 0;
 }
 
-static const struct host1x_client_ops dc_client_ops = {
+static const struct tegra_drm_client_ops dc_client_ops = {
 	.drm_init = tegra_dc_drm_init,
 	.drm_exit = tegra_dc_drm_exit,
 };
 
 static int tegra_dc_probe(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct resource *regs;
 	struct tegra_dc *dc;
 	int err;
@@ -787,9 +786,9 @@ static int tegra_dc_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	err = host1x_register_client(host1x, &dc->client);
+	err = tegra_drm_register_client(&dc->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
 			err);
 		return err;
 	}
@@ -801,13 +800,12 @@ static int tegra_dc_probe(struct platform_device *pdev)
 
 static int tegra_dc_remove(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct tegra_dc *dc = platform_get_drvdata(pdev);
 	int err;
 
-	err = host1x_unregister_client(host1x, &dc->client);
+	err = tegra_drm_unregister_client(&dc->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
 			err);
 		return err;
 	}
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 3a503c9..530bed4 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
+#include <linux/host1x.h>
 
 #include <mach/clk.h>
 #include <linux/dma-mapping.h>
@@ -24,21 +25,177 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+struct tegra_drm_client_entry {
+	struct tegra_drm_client *client;
+	struct device_node *np;
+	struct list_head list;
+};
+
+static struct tegradrm *tegradrm;
+
+static struct tegradrm *tegra_drm_get(void)
+{
+	return tegradrm;
+}
+
+static int tegra_drm_add_client(struct tegradrm *tegradrm,
+		struct device_node *np)
+{
+	struct tegra_drm_client_entry *client;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&client->list);
+	client->np = of_node_get(np);
+
+	list_add_tail(&client->list, &tegradrm->drm_clients);
+
+	return 0;
+}
+
+static int tegra_drm_parse_dt(struct tegradrm *tegradrm)
+{
+	static const char * const compat[] = {
+		"nvidia,tegra20-dc",
+		"nvidia,tegra20-hdmi",
+		"nvidia,tegra30-dc",
+		"nvidia,tegra30-hdmi",
+	};
+	unsigned int i;
+	int err;
+	struct device *dev;
+
+	/* host1x is parent of all devices */
+	dev = bus_find_device_by_name(&platform_bus_type, NULL, "host1x");
+	if (!dev)
+		return -ENODEV;
+
+	/* find devices that are available and add them into the 'required'
+	 * list */
+	for (i = 0; i < ARRAY_SIZE(compat); i++) {
+		struct device_node *np;
+
+		for_each_child_of_node(dev->of_node, np) {
+			if (of_device_is_compatible(np, compat[i]) &&
+			    of_device_is_available(np)) {
+				err = tegra_drm_add_client(tegradrm, np);
+				if (err < 0)
+					return err;
+			}
+		}
+	}
+
+	return 0;
+}
+
+int tegra_drm_register_client(struct tegra_drm_client *client)
+{
+	struct tegradrm *tegradrm = tegra_drm_get();
+	struct tegra_drm_client_entry *drm, *tmp;
+	int err;
+
+	mutex_lock(&tegradrm->clients_lock);
+	list_add_tail(&client->list, &tegradrm->clients);
+	mutex_unlock(&tegradrm->clients_lock);
+
+	/* remove this device from 'required' list */
+	list_for_each_entry_safe(drm, tmp, &tegradrm->drm_clients, list)
+		if (drm->np == client->dev->of_node)
+			list_del(&drm->list);
+
+	/* if all required devices are found, register drm device */
+	if (list_empty(&tegradrm->drm_clients)) {
+		struct platform_device *pdev = to_platform_device(client->dev);
+
+		err = drm_platform_init(&tegra_drm_driver, pdev);
+		if (err < 0) {
+			dev_err(client->dev, "drm_platform_init(): %d\n", err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+int tegra_drm_unregister_client(struct tegra_drm_client *client)
+{
+	struct tegradrm *tegradrm = tegra_drm_get();
+
+	list_for_each_entry(client, &tegradrm->drm_active, list) {
+
+		struct platform_device *pdev = to_platform_device(client->dev);
+
+		if (client->ops && client->ops->drm_exit) {
+			int err = client->ops->drm_exit(client);
+			if (err < 0) {
+				dev_err(client->dev,
+					"DRM cleanup failed for %s: %d\n",
+					dev_name(client->dev), err);
+				return err;
+			}
+		}
+
+		/* if this is the last device, unregister the drm driver */
+		if (client->list.next == &tegradrm->drm_active)
+			drm_platform_exit(&tegra_drm_driver, pdev);
+
+		list_del_init(&client->list);
+	}
+
+	return 0;
+}
+
+static int tegra_drm_alloc(void)
+{
+	int err;
+
+	tegradrm = kzalloc(sizeof(*tegradrm), GFP_KERNEL);
+	if (!tegradrm)
+		return -ENOMEM;
+
+	mutex_init(&tegradrm->drm_clients_lock);
+	INIT_LIST_HEAD(&tegradrm->drm_clients);
+	INIT_LIST_HEAD(&tegradrm->drm_active);
+	mutex_init(&tegradrm->clients_lock);
+	INIT_LIST_HEAD(&tegradrm->clients);
+
+	err = tegra_drm_parse_dt(tegradrm);
+	if (err < 0) {
+		pr_err("failed to parse DT: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
 static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 {
-	struct device *dev = drm->dev;
-	struct host1x *host1x;
+	struct tegra_drm_client *client;
 	int err;
+	struct tegradrm *tegradrm = tegra_drm_get();
 
-	host1x = dev_get_drvdata(dev);
-	drm->dev_private = host1x;
-	host1x->drm = drm;
+	drm->dev_private = tegradrm;
+	tegradrm->drm = drm;
 
 	drm_mode_config_init(drm);
 
-	err = host1x_drm_init(host1x, drm);
-	if (err < 0)
-		return err;
+	mutex_lock(&tegradrm->clients_lock);
+
+	list_for_each_entry(client, &tegradrm->clients, list) {
+		if (client->ops && client->ops->drm_init) {
+			err = client->ops->drm_init(client, drm);
+			if (err < 0) {
+				dev_dbg(drm->dev, "drm_init() failed for %s: %d\n",
+					dev_name(client->dev), err);
+				mutex_unlock(&tegradrm->clients_lock);
+				return err;
+			}
+		}
+	}
+
+	mutex_unlock(&tegradrm->clients_lock);
 
 	err = tegra_drm_fb_init(drm);
 	if (err < 0)
@@ -64,12 +221,47 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	return 0;
 }
 
+static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
+{
+
+}
+
 static void tegra_drm_lastclose(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	tegra_drm_fb_restore(drm);
+}
+
+static int __init tegra_drm_init(void)
+{
+	int err;
+
+	if (tegra_drm_alloc())
+		return -ENOMEM;
 
-	drm_fbdev_cma_restore_mode(host1x->fbdev);
+	err = platform_driver_register(&tegra_dc_driver);
+	if (err < 0)
+		goto free_tegradrm;
+
+	err = platform_driver_register(&tegra_hdmi_driver);
+	if (err < 0)
+		goto unregister_dc;
+	return 0;
+
+unregister_dc:
+	platform_driver_unregister(&tegra_dc_driver);
+free_tegradrm:
+	kfree(tegradrm);
+	return err;
 }
+module_init(tegra_drm_init);
+
+static void __exit tegra_drm_exit(void)
+{
+	platform_driver_unregister(&tegra_hdmi_driver);
+	platform_driver_unregister(&tegra_dc_driver);
+	kfree(tegradrm);
+}
+module_exit(tegra_drm_exit);
 
 static struct drm_ioctl_desc tegra_drm_ioctls[] = {
 };
@@ -94,6 +286,7 @@ struct drm_driver tegra_drm_driver = {
 	.load = tegra_drm_load,
 	.unload = tegra_drm_unload,
 	.open = tegra_drm_open,
+	.preclose = tegra_drm_close,
 	.lastclose = tegra_drm_lastclose,
 
 	.gem_free_object = drm_gem_cma_free_object,
@@ -113,3 +306,7 @@ struct drm_driver tegra_drm_driver = {
 	.minor = DRIVER_MINOR,
 	.patchlevel = DRIVER_PATCHLEVEL,
 };
+
+MODULE_AUTHOR("Thierry Reding <thierry.reding@avionic-design.de>");
+MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 3a843a7..3e800fb 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -17,6 +17,7 @@
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fixed.h>
+#include <drm/tegra_drm.h>
 
 struct tegra_framebuffer {
 	struct drm_framebuffer base;
@@ -28,13 +29,9 @@ static inline struct tegra_framebuffer *to_tegra_fb(struct drm_framebuffer *fb)
 	return container_of(fb, struct tegra_framebuffer, base);
 }
 
-struct host1x {
+struct tegradrm {
 	struct drm_device *drm;
 	struct device *dev;
-	void __iomem *regs;
-	struct clk *clk;
-	int syncpt;
-	int irq;
 
 	struct mutex drm_clients_lock;
 	struct list_head drm_clients;
@@ -47,36 +44,30 @@ struct host1x {
 	struct tegra_framebuffer fb;
 };
 
-struct host1x_client;
+struct tegra_drm_client;
 
-struct host1x_client_ops {
-	int (*drm_init)(struct host1x_client *client, struct drm_device *drm);
-	int (*drm_exit)(struct host1x_client *client);
+struct tegra_drm_client_ops {
+	int (*drm_init)(struct tegra_drm_client *, struct drm_device *);
+	int (*drm_exit)(struct tegra_drm_client *);
 };
 
-struct host1x_client {
-	struct host1x *host1x;
+struct tegra_drm_client {
 	struct device *dev;
 
-	const struct host1x_client_ops *ops;
+	const struct tegra_drm_client_ops *ops;
 
 	struct list_head list;
-};
 
-extern int host1x_drm_init(struct host1x *host1x, struct drm_device *drm);
-extern int host1x_drm_exit(struct host1x *host1x);
+};
 
-extern int host1x_register_client(struct host1x *host1x,
-				  struct host1x_client *client);
-extern int host1x_unregister_client(struct host1x *host1x,
-				    struct host1x_client *client);
+extern int tegra_drm_register_client(struct tegra_drm_client *client);
+extern int tegra_drm_unregister_client(struct tegra_drm_client *client);
 
 struct tegra_output;
 
 struct tegra_dc {
-	struct host1x_client client;
+	struct tegra_drm_client client;
 
-	struct host1x *host1x;
 	struct device *dev;
 
 	struct drm_crtc base;
@@ -96,7 +87,8 @@ struct tegra_dc {
 	struct dentry *debugfs;
 };
 
-static inline struct tegra_dc *host1x_client_to_dc(struct host1x_client *client)
+static inline struct tegra_dc *tegra_drm_client_to_dc(
+				struct tegra_drm_client *client)
 {
 	return container_of(client, struct tegra_dc, client);
 }
@@ -225,8 +217,8 @@ extern struct vm_operations_struct tegra_gem_vm_ops;
 /* from fb.c */
 extern int tegra_drm_fb_init(struct drm_device *drm);
 extern void tegra_drm_fb_exit(struct drm_device *drm);
+extern void tegra_drm_fb_restore(struct drm_device *drm);
 
-extern struct platform_driver tegra_host1x_driver;
 extern struct platform_driver tegra_hdmi_driver;
 extern struct platform_driver tegra_dc_driver;
 extern struct drm_driver tegra_drm_driver;
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index 97993c6..7c686d8 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -11,9 +11,9 @@
 
 static void tegra_drm_fb_output_poll_changed(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	struct tegradrm *tegradrm = drm->dev_private;
 
-	drm_fbdev_cma_hotplug_event(host1x->fbdev);
+	drm_fbdev_cma_hotplug_event(tegradrm->fbdev);
 }
 
 static const struct drm_mode_config_funcs tegra_drm_mode_funcs = {
@@ -23,7 +23,7 @@ static const struct drm_mode_config_funcs tegra_drm_mode_funcs = {
 
 int tegra_drm_fb_init(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	struct tegradrm *tegradrm = drm->dev_private;
 	struct drm_fbdev_cma *fbdev;
 
 	drm->mode_config.min_width = 0;
@@ -43,14 +43,19 @@ int tegra_drm_fb_init(struct drm_device *drm)
 	drm_fbdev_cma_restore_mode(fbdev);
 #endif
 
-	host1x->fbdev = fbdev;
+	tegradrm->fbdev = fbdev;
 
 	return 0;
 }
 
 void tegra_drm_fb_exit(struct drm_device *drm)
 {
-	struct host1x *host1x = drm->dev_private;
+	struct tegradrm *tegradrm = drm->dev_private;
+	drm_fbdev_cma_fini(tegradrm->fbdev);
+}
 
-	drm_fbdev_cma_fini(host1x->fbdev);
+void tegra_drm_fb_restore(struct drm_device *drm)
+{
+	struct tegradrm *tegradrm = drm->dev_private;
+	drm_fbdev_cma_restore_mode(tegradrm->fbdev);
 }
diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c
index ab40164..774baf3 100644
--- a/drivers/gpu/drm/tegra/hdmi.c
+++ b/drivers/gpu/drm/tegra/hdmi.c
@@ -22,7 +22,7 @@
 #include "dc.h"
 
 struct tegra_hdmi {
-	struct host1x_client client;
+	struct tegra_drm_client client;
 	struct tegra_output output;
 	struct device *dev;
 
@@ -46,7 +46,7 @@ struct tegra_hdmi {
 };
 
 static inline struct tegra_hdmi *
-host1x_client_to_hdmi(struct host1x_client *client)
+tegra_drm_client_to_hdmi(struct tegra_drm_client *client)
 {
 	return container_of(client, struct tegra_hdmi, client);
 }
@@ -1152,10 +1152,10 @@ static int tegra_hdmi_debugfs_exit(struct tegra_hdmi *hdmi)
 	return 0;
 }
 
-static int tegra_hdmi_drm_init(struct host1x_client *client,
+static int tegra_hdmi_drm_init(struct tegra_drm_client *client,
 			       struct drm_device *drm)
 {
-	struct tegra_hdmi *hdmi = host1x_client_to_hdmi(client);
+	struct tegra_hdmi *hdmi = tegra_drm_client_to_hdmi(client);
 	int err;
 
 	hdmi->output.type = TEGRA_OUTPUT_HDMI;
@@ -1177,9 +1177,9 @@ static int tegra_hdmi_drm_init(struct host1x_client *client,
 	return 0;
 }
 
-static int tegra_hdmi_drm_exit(struct host1x_client *client)
+static int tegra_hdmi_drm_exit(struct tegra_drm_client *client)
 {
-	struct tegra_hdmi *hdmi = host1x_client_to_hdmi(client);
+	struct tegra_hdmi *hdmi = tegra_drm_client_to_hdmi(client);
 	int err;
 
 	if (IS_ENABLED(CONFIG_DEBUG_FS)) {
@@ -1204,14 +1204,13 @@ static int tegra_hdmi_drm_exit(struct host1x_client *client)
 	return 0;
 }
 
-static const struct host1x_client_ops hdmi_client_ops = {
+static const struct tegra_drm_client_ops hdmi_client_ops = {
 	.drm_init = tegra_hdmi_drm_init,
 	.drm_exit = tegra_hdmi_drm_exit,
 };
 
 static int tegra_hdmi_probe(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct tegra_hdmi *hdmi;
 	struct resource *regs;
 	int err;
@@ -1286,9 +1285,9 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(&hdmi->client.list);
 	hdmi->client.dev = &pdev->dev;
 
-	err = host1x_register_client(host1x, &hdmi->client);
+	err = tegra_drm_register_client(&hdmi->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
 			err);
 		return err;
 	}
@@ -1300,13 +1299,12 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
 
 static int tegra_hdmi_remove(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
 	struct tegra_hdmi *hdmi = platform_get_drvdata(pdev);
 	int err;
 
-	err = host1x_unregister_client(host1x, &hdmi->client);
+	err = tegra_drm_unregister_client(&hdmi->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+		dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
 			err);
 		return err;
 	}
diff --git a/drivers/gpu/drm/tegra/host1x.c b/drivers/gpu/drm/tegra/host1x.c
deleted file mode 100644
index bdb97a5..0000000
--- a/drivers/gpu/drm/tegra/host1x.c
+++ /dev/null
@@ -1,325 +0,0 @@
-/*
- * Copyright (C) 2012 Avionic Design GmbH
- * Copyright (C) 2012 NVIDIA CORPORATION.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/clk.h>
-#include <linux/err.h>
-#include <linux/module.h>
-#include <linux/of.h>
-#include <linux/platform_device.h>
-
-#include "drm.h"
-
-struct host1x_drm_client {
-	struct host1x_client *client;
-	struct device_node *np;
-	struct list_head list;
-};
-
-static int host1x_add_drm_client(struct host1x *host1x, struct device_node *np)
-{
-	struct host1x_drm_client *client;
-
-	client = kzalloc(sizeof(*client), GFP_KERNEL);
-	if (!client)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&client->list);
-	client->np = of_node_get(np);
-
-	list_add_tail(&client->list, &host1x->drm_clients);
-
-	return 0;
-}
-
-static int host1x_activate_drm_client(struct host1x *host1x,
-				      struct host1x_drm_client *drm,
-				      struct host1x_client *client)
-{
-	mutex_lock(&host1x->drm_clients_lock);
-	list_del_init(&drm->list);
-	list_add_tail(&drm->list, &host1x->drm_active);
-	drm->client = client;
-	mutex_unlock(&host1x->drm_clients_lock);
-
-	return 0;
-}
-
-static int host1x_remove_drm_client(struct host1x *host1x,
-				    struct host1x_drm_client *client)
-{
-	mutex_lock(&host1x->drm_clients_lock);
-	list_del_init(&client->list);
-	mutex_unlock(&host1x->drm_clients_lock);
-
-	of_node_put(client->np);
-	kfree(client);
-
-	return 0;
-}
-
-static int host1x_parse_dt(struct host1x *host1x)
-{
-	static const char * const compat[] = {
-		"nvidia,tegra20-dc",
-		"nvidia,tegra20-hdmi",
-		"nvidia,tegra30-dc",
-		"nvidia,tegra30-hdmi",
-	};
-	unsigned int i;
-	int err;
-
-	for (i = 0; i < ARRAY_SIZE(compat); i++) {
-		struct device_node *np;
-
-		for_each_child_of_node(host1x->dev->of_node, np) {
-			if (of_device_is_compatible(np, compat[i]) &&
-			    of_device_is_available(np)) {
-				err = host1x_add_drm_client(host1x, np);
-				if (err < 0)
-					return err;
-			}
-		}
-	}
-
-	return 0;
-}
-
-static int tegra_host1x_probe(struct platform_device *pdev)
-{
-	struct host1x *host1x;
-	struct resource *regs;
-	int err;
-
-	host1x = devm_kzalloc(&pdev->dev, sizeof(*host1x), GFP_KERNEL);
-	if (!host1x)
-		return -ENOMEM;
-
-	mutex_init(&host1x->drm_clients_lock);
-	INIT_LIST_HEAD(&host1x->drm_clients);
-	INIT_LIST_HEAD(&host1x->drm_active);
-	mutex_init(&host1x->clients_lock);
-	INIT_LIST_HEAD(&host1x->clients);
-	host1x->dev = &pdev->dev;
-
-	err = host1x_parse_dt(host1x);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to parse DT: %d\n", err);
-		return err;
-	}
-
-	host1x->clk = devm_clk_get(&pdev->dev, NULL);
-	if (IS_ERR(host1x->clk))
-		return PTR_ERR(host1x->clk);
-
-	err = clk_prepare_enable(host1x->clk);
-	if (err < 0)
-		return err;
-
-	regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!regs) {
-		err = -ENXIO;
-		goto err;
-	}
-
-	err = platform_get_irq(pdev, 0);
-	if (err < 0)
-		goto err;
-
-	host1x->syncpt = err;
-
-	err = platform_get_irq(pdev, 1);
-	if (err < 0)
-		goto err;
-
-	host1x->irq = err;
-
-	host1x->regs = devm_request_and_ioremap(&pdev->dev, regs);
-	if (!host1x->regs) {
-		err = -EADDRNOTAVAIL;
-		goto err;
-	}
-
-	platform_set_drvdata(pdev, host1x);
-
-	return 0;
-
-err:
-	clk_disable_unprepare(host1x->clk);
-	return err;
-}
-
-static int tegra_host1x_remove(struct platform_device *pdev)
-{
-	struct host1x *host1x = platform_get_drvdata(pdev);
-
-	clk_disable_unprepare(host1x->clk);
-
-	return 0;
-}
-
-int host1x_drm_init(struct host1x *host1x, struct drm_device *drm)
-{
-	struct host1x_client *client;
-
-	mutex_lock(&host1x->clients_lock);
-
-	list_for_each_entry(client, &host1x->clients, list) {
-		if (client->ops && client->ops->drm_init) {
-			int err = client->ops->drm_init(client, drm);
-			if (err < 0) {
-				dev_err(host1x->dev,
-					"DRM setup failed for %s: %d\n",
-					dev_name(client->dev), err);
-				return err;
-			}
-		}
-	}
-
-	mutex_unlock(&host1x->clients_lock);
-
-	return 0;
-}
-
-int host1x_drm_exit(struct host1x *host1x)
-{
-	struct platform_device *pdev = to_platform_device(host1x->dev);
-	struct host1x_client *client;
-
-	if (!host1x->drm)
-		return 0;
-
-	mutex_lock(&host1x->clients_lock);
-
-	list_for_each_entry_reverse(client, &host1x->clients, list) {
-		if (client->ops && client->ops->drm_exit) {
-			int err = client->ops->drm_exit(client);
-			if (err < 0) {
-				dev_err(host1x->dev,
-					"DRM cleanup failed for %s: %d\n",
-					dev_name(client->dev), err);
-				return err;
-			}
-		}
-	}
-
-	mutex_unlock(&host1x->clients_lock);
-
-	drm_platform_exit(&tegra_drm_driver, pdev);
-	host1x->drm = NULL;
-
-	return 0;
-}
-
-int host1x_register_client(struct host1x *host1x, struct host1x_client *client)
-{
-	struct host1x_drm_client *drm, *tmp;
-	int err;
-
-	mutex_lock(&host1x->clients_lock);
-	list_add_tail(&client->list, &host1x->clients);
-	mutex_unlock(&host1x->clients_lock);
-
-	list_for_each_entry_safe(drm, tmp, &host1x->drm_clients, list)
-		if (drm->np == client->dev->of_node)
-			host1x_activate_drm_client(host1x, drm, client);
-
-	if (list_empty(&host1x->drm_clients)) {
-		struct platform_device *pdev = to_platform_device(host1x->dev);
-
-		err = drm_platform_init(&tegra_drm_driver, pdev);
-		if (err < 0) {
-			dev_err(host1x->dev, "drm_platform_init(): %d\n", err);
-			return err;
-		}
-	}
-
-	return 0;
-}
-
-int host1x_unregister_client(struct host1x *host1x,
-			     struct host1x_client *client)
-{
-	struct host1x_drm_client *drm, *tmp;
-	int err;
-
-	list_for_each_entry_safe(drm, tmp, &host1x->drm_active, list) {
-		if (drm->client == client) {
-			err = host1x_drm_exit(host1x);
-			if (err < 0) {
-				dev_err(host1x->dev, "host1x_drm_exit(): %d\n",
-					err);
-				return err;
-			}
-
-			host1x_remove_drm_client(host1x, drm);
-			break;
-		}
-	}
-
-	mutex_lock(&host1x->clients_lock);
-	list_del_init(&client->list);
-	mutex_unlock(&host1x->clients_lock);
-
-	return 0;
-}
-
-static struct of_device_id tegra_host1x_of_match[] = {
-	{ .compatible = "nvidia,tegra30-host1x", },
-	{ .compatible = "nvidia,tegra20-host1x", },
-	{ },
-};
-MODULE_DEVICE_TABLE(of, tegra_host1x_of_match);
-
-struct platform_driver tegra_host1x_driver = {
-	.driver = {
-		.name = "tegra-host1x",
-		.owner = THIS_MODULE,
-		.of_match_table = tegra_host1x_of_match,
-	},
-	.probe = tegra_host1x_probe,
-	.remove = tegra_host1x_remove,
-};
-
-static int __init tegra_host1x_init(void)
-{
-	int err;
-
-	err = platform_driver_register(&tegra_host1x_driver);
-	if (err < 0)
-		return err;
-
-	err = platform_driver_register(&tegra_dc_driver);
-	if (err < 0)
-		goto unregister_host1x;
-
-	err = platform_driver_register(&tegra_hdmi_driver);
-	if (err < 0)
-		goto unregister_dc;
-
-	return 0;
-
-unregister_dc:
-	platform_driver_unregister(&tegra_dc_driver);
-unregister_host1x:
-	platform_driver_unregister(&tegra_host1x_driver);
-	return err;
-}
-module_init(tegra_host1x_init);
-
-static void __exit tegra_host1x_exit(void)
-{
-	platform_driver_unregister(&tegra_hdmi_driver);
-	platform_driver_unregister(&tegra_dc_driver);
-	platform_driver_unregister(&tegra_host1x_driver);
-}
-module_exit(tegra_host1x_exit);
-
-MODULE_AUTHOR("Thierry Reding <thierry.reding@avionic-design.de>");
-MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
-MODULE_LICENSE("GPL");
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
new file mode 100644
index 0000000..8632f49
--- /dev/null
+++ b/include/drm/tegra_drm.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (c) 2012, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEGRA_DRM_H_
+#define _TEGRA_DRM_H_
+
+#endif
-- 
1.7.9.5



* [PATCHv3 6/7] ARM: tegra: Add board data and 2D clocks
  2012-12-13 14:04 ` Terje Bergstrom
@ 2012-12-13 14:04   ` Terje Bergstrom
  -1 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

Add a driver alias, gr2d, for the Tegra 2D device, and assign a duplicate
of the 2D clock to that alias.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
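Note for reviewers: the sketch below is a rough userspace model of why the
clock duplicate's device name has to follow the new "gr2d" alias. It assumes
clkdev-style matching, where an entry that names a device or a connection
only matches a request naming the same one. The names clk_lookup_entry and
model_clk_find are hypothetical and exist only for this illustration; this is
not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one clock lookup entry. */
struct clk_lookup_entry {
	const char *clk_name;	/* underlying clock, e.g. "2d" */
	const char *dev_id;	/* device name the entry is bound to, or NULL */
	const char *con_id;	/* connection name, or NULL */
};

static const struct clk_lookup_entry table[] = {
	{ "2d", "gr2d",         "gr2d" },	/* entry as changed by this patch */
	{ "3d", "tegra_grhost", "gr3d" },	/* untouched neighbouring entry */
};

/*
 * Return the clock name of the best matching entry, or NULL if none match.
 * Assumption: a device match scores higher than a connection match.
 */
static const char *model_clk_find(const char *dev_id, const char *con_id)
{
	const char *best = NULL;
	int best_score = 0;
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct clk_lookup_entry *e = &table[i];
		int score = 0;

		if (e->dev_id) {
			if (!dev_id || strcmp(e->dev_id, dev_id))
				continue;
			score += 2;
		}
		if (e->con_id) {
			if (!con_id || strcmp(e->con_id, con_id))
				continue;
			score += 1;
		}
		if (score > best_score) {
			best = e->clk_name;
			best_score = score;
		}
	}

	return best;
}
```

In this model a device named "gr2d" asking for connection "gr2d" finds the
"2d" clock, while the same request made under the old "tegra_grhost" name
finds nothing.
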
 arch/arm/mach-tegra/board-dt-tegra20.c    |    1 +
 arch/arm/mach-tegra/board-dt-tegra30.c    |    1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c |    2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c |    1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c b/arch/arm/mach-tegra/board-dt-tegra20.c
index 734d9cc..2b7a3c2 100644
--- a/arch/arm/mach-tegra/board-dt-tegra20.c
+++ b/arch/arm/mach-tegra/board-dt-tegra20.c
@@ -95,6 +95,7 @@ struct of_dev_auxdata tegra20_auxdata_lookup[] __initdata = {
 	OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000D800, "spi_tegra.2", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000DA00, "spi_tegra.3", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-host1x", 0x50000000, "host1x", NULL),
+	OF_DEV_AUXDATA("nvidia,tegra20-gr2d", 0x54140000, "gr2d", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54200000, "tegradc.0", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54240000, "tegradc.1", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-hdmi", 0x54280000, "hdmi", NULL),
diff --git a/arch/arm/mach-tegra/board-dt-tegra30.c b/arch/arm/mach-tegra/board-dt-tegra30.c
index 6497d12..6a9e6cb 100644
--- a/arch/arm/mach-tegra/board-dt-tegra30.c
+++ b/arch/arm/mach-tegra/board-dt-tegra30.c
@@ -58,6 +58,7 @@ struct of_dev_auxdata tegra30_auxdata_lookup[] __initdata = {
 	OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DC00, "spi_tegra.4", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DE00, "spi_tegra.5", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-host1x", 0x50000000, "host1x", NULL),
+	OF_DEV_AUXDATA("nvidia,tegra30-gr2d", 0x54140000, "gr2d", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54200000, "tegradc.0", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54240000, "tegradc.1", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-hdmi", 0x54280000, "hdmi", NULL),
diff --git a/arch/arm/mach-tegra/tegra20_clocks_data.c b/arch/arm/mach-tegra/tegra20_clocks_data.c
index a23a073..15d440a 100644
--- a/arch/arm/mach-tegra/tegra20_clocks_data.c
+++ b/arch/arm/mach-tegra/tegra20_clocks_data.c
@@ -1041,7 +1041,7 @@ static struct clk_duplicate tegra_clk_duplicates[] = {
 	CLK_DUPLICATE("usbd",	"utmip-pad",	NULL),
 	CLK_DUPLICATE("usbd",	"tegra-ehci.0",	NULL),
 	CLK_DUPLICATE("usbd",	"tegra-otg",	NULL),
-	CLK_DUPLICATE("2d",	"tegra_grhost",	"gr2d"),
+	CLK_DUPLICATE("2d",	"gr2d",	"gr2d"),
 	CLK_DUPLICATE("3d",	"tegra_grhost",	"gr3d"),
 	CLK_DUPLICATE("epp",	"tegra_grhost",	"epp"),
 	CLK_DUPLICATE("mpe",	"tegra_grhost",	"mpe"),
diff --git a/arch/arm/mach-tegra/tegra30_clocks_data.c b/arch/arm/mach-tegra/tegra30_clocks_data.c
index 6942c7a..5787865 100644
--- a/arch/arm/mach-tegra/tegra30_clocks_data.c
+++ b/arch/arm/mach-tegra/tegra30_clocks_data.c
@@ -1338,6 +1338,7 @@ struct clk_duplicate tegra_clk_duplicates[] = {
 	CLK_DUPLICATE("pll_p", "tegradc.0", "parent"),
 	CLK_DUPLICATE("pll_p", "tegradc.1", "parent"),
 	CLK_DUPLICATE("pll_d2_out0", "hdmi", "parent"),
+	CLK_DUPLICATE("2d", "gr2d", "gr2d"),
 };
 
 struct clk *tegra_ptr_clks[] = {
-- 
1.7.9.5


* [PATCHv3 7/7] drm: tegra: Add gr2d device
  2012-12-13 14:04 ` Terje Bergstrom
@ 2012-12-13 14:04   ` Terje Bergstrom
  -1 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel; +Cc: linux-kernel

Add client driver for 2D device.

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
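Note for reviewers: the channel ioctls in this patch hand the context back to
userspace as an opaque 64-bit value and later accept it only if it is found
on the calling file's own context list. The sketch below is a minimal
userspace model of that pattern. model_context, model_fpriv,
model_add_context and model_lookup are hypothetical stand-ins, not the driver
code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-channel context. */
struct model_context {
	int class_id;
	struct model_context *next;
};

/* Hypothetical stand-in for per-file private data. */
struct model_fpriv {
	struct model_context *contexts;	/* contexts opened through this file */
};

/* Track the context on the file and return it as an opaque 64-bit handle. */
static uint64_t model_add_context(struct model_fpriv *fpriv,
				  struct model_context *ctx)
{
	ctx->next = fpriv->contexts;
	fpriv->contexts = ctx;

	return (uint64_t)(uintptr_t)ctx;
}

/*
 * Resolve a handle received from userspace. The handle is only compared
 * against contexts on this file's list and is never dereferenced directly,
 * so a forged or foreign value yields NULL.
 */
static struct model_context *model_lookup(struct model_fpriv *fpriv,
					  uint64_t handle)
{
	struct model_context *ctx;

	for (ctx = fpriv->contexts; ctx; ctx = ctx->next)
		if ((uint64_t)(uintptr_t)ctx == handle)
			return ctx;

	return NULL;
}
```

The lookup costs a list walk per ioctl, which seems acceptable while a file
holds only a handful of contexts.
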
 drivers/gpu/drm/tegra/Makefile |    2 +-
 drivers/gpu/drm/tegra/drm.c    |  211 +++++++++++++++++++++++++++-
 drivers/gpu/drm/tegra/drm.h    |   29 ++++
 drivers/gpu/drm/tegra/gr2d.c   |  300 ++++++++++++++++++++++++++++++++++++++++
 include/drm/tegra_drm.h        |  111 +++++++++++++++
 5 files changed, 651 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/gr2d.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index f4c05bb..2661f41 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -1,7 +1,7 @@
 ccflags-y := -Iinclude/drm
 ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
-tegra-drm-y := drm.o fb.o dc.o
+tegra-drm-y := drm.o fb.o dc.o gr2d.o
 tegra-drm-y += output.o rgb.o hdmi.o
 
 obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 530bed4..ab4460a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -60,8 +60,10 @@ static int tegra_drm_parse_dt(struct tegradrm *tegradrm)
 	static const char * const compat[] = {
 		"nvidia,tegra20-dc",
 		"nvidia,tegra20-hdmi",
+		"nvidia,tegra20-gr2d",
 		"nvidia,tegra30-dc",
 		"nvidia,tegra30-hdmi",
+		"nvidia,tegra30-gr2d"
 	};
 	unsigned int i;
 	int err;
@@ -218,12 +220,29 @@ static int tegra_drm_unload(struct drm_device *drm)
 
 static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 {
-	return 0;
+	struct tegra_drm_fpriv *fpriv;
+	int err = 0;
+
+	fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
+	if (!fpriv)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&fpriv->contexts);
+	filp->driver_priv = fpriv;
+
+	return err;
 }
 
 static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
 {
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(filp);
+	struct tegra_drm_context *context, *tmp;
 
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		context->client->ops->close_channel(context);
+		kfree(context);
+	}
+	kfree(fpriv);
 }
 
 static void tegra_drm_lastclose(struct drm_device *drm)
@@ -245,8 +264,14 @@ static int __init tegra_drm_init(void)
 	err = platform_driver_register(&tegra_hdmi_driver);
 	if (err < 0)
 		goto unregister_dc;
+
+	err = platform_driver_register(&tegra_gr2d_driver);
+	if (err < 0)
+		goto unregister_hdmi;
 	return 0;
 
+unregister_hdmi:
+	platform_driver_unregister(&tegra_hdmi_driver);
 unregister_dc:
 	platform_driver_unregister(&tegra_dc_driver);
 free_tegradrm:
@@ -257,13 +282,197 @@ module_init(tegra_drm_init);
 
 static void __exit tegra_drm_exit(void)
 {
+	platform_driver_unregister(&tegra_gr2d_driver);
 	platform_driver_unregister(&tegra_hdmi_driver);
 	platform_driver_unregister(&tegra_dc_driver);
 	kfree(tegradrm);
 }
 module_exit(tegra_drm_exit);
 
+static int
+tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_syncpt_read_args *args = data;
+
+	args->value = host1x_syncpt_read_byid(args->id);
+	return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_incr(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_syncpt_incr_args *args = data;
+	host1x_syncpt_incr_byid(args->id);
+	return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_wait(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_syncpt_wait_args *args = data;
+	int err;
+
+	err = host1x_syncpt_wait_byid(args->id, args->thresh,
+			args->timeout, &args->value);
+
+	return err;
+}
+
+static int
+tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct tegra_drm_client *client;
+	struct tegra_drm_context *context;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	struct tegradrm *tegradrm = drm->dev_private;
+	int err = 0;
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	list_for_each_entry(client, &tegradrm->clients, list) {
+		if (client->class == args->class) {
+			dev_dbg(drm->dev, "opening client %x\n", args->class);
+			context->client = client;
+			err = client->ops->open_channel(client, context);
+			if (err)
+				goto out;
+
+			dev_dbg(drm->dev, "context %p\n", context);
+			list_add(&context->list, &fpriv->contexts);
+			args->context = (uintptr_t)context;
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	if (err)
+		kfree(context);
+
+	return err;
+}
+
+static int
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct tegra_drm_context *context, *tmp;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			context->client->ops->close_channel(context);
+			list_del(&context->list);
+			kfree(context);
+			goto out;
+		}
+	}
+	err = -EINVAL;
+
+out:
+	return err;
+}
+
+static int
+tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_get_channel_param_args *args = data;
+	struct tegra_drm_context *context;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			args->value =
+				context->client->ops->get_syncpoint(context,
+						args->param);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+}
+
+static int
+tegra_drm_ioctl_submit(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_submit_args *args = data;
+	struct tegra_drm_context *context;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			err = context->client->ops->submit(context, args, drm,
+				file_priv);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+
+}
+
+static int
+tegra_drm_create_ioctl(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_gem_create *args = data;
+	struct drm_gem_cma_object *cma_obj;
+	int ret;
+
+	cma_obj = drm_gem_cma_create(drm, args->size);
+	if (IS_ERR(cma_obj))
+		return PTR_ERR(cma_obj);
+
+	ret = drm_gem_handle_create(file_priv, &cma_obj->base, &args->handle);
+	if (ret)
+		goto err_handle_create;
+
+	args->offset = cma_obj->base.map_list.hash.key << PAGE_SHIFT;
+
+	drm_gem_object_unreference(&cma_obj->base);
+
+	return 0;
+
+err_handle_create:
+	drm_gem_cma_free_object(&cma_obj->base);
+
+	return ret;
+}
+
 static struct drm_ioctl_desc tegra_drm_ioctls[] = {
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
+			tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ,
+			tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR,
+			tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT,
+			tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL,
+			tegra_drm_ioctl_open_channel, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_CLOSE_CHANNEL,
+			tegra_drm_ioctl_close_channel, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_SYNCPOINT,
+			tegra_drm_ioctl_get_syncpoint, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SUBMIT,
+			tegra_drm_ioctl_submit, DRM_UNLOCKED),
 };
 
 static const struct file_operations tegra_drm_fops = {
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 3e800fb..c9c2b85 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -46,16 +46,44 @@ struct tegradrm {
 
 struct tegra_drm_client;
 
+struct tegra_drm_context {
+	struct tegra_drm_client *client;
+	struct host1x_channel *channel;
+	struct list_head list;
+};
+
 struct tegra_drm_client_ops {
 	int (*drm_init)(struct tegra_drm_client *, struct drm_device *);
 	int (*drm_exit)(struct tegra_drm_client *);
+	int (*open_channel)(struct tegra_drm_client *,
+			struct tegra_drm_context *);
+	void (*close_channel)(struct tegra_drm_context *);
+	u32 (*get_syncpoint)(struct tegra_drm_context *, int index);
+	int (*submit)(struct tegra_drm_context *,
+			struct tegra_drm_submit_args *,
+			struct drm_device *,
+			struct drm_file *);
+};
+
+
+struct tegra_drm_fpriv {
+	struct list_head contexts;
 };
 
+static inline struct tegra_drm_fpriv *
+tegra_drm_fpriv(struct drm_file *file_priv)
+{
+	return file_priv ? file_priv->driver_priv : NULL;
+}
+
 struct tegra_drm_client {
 	struct device *dev;
 
 	const struct tegra_drm_client_ops *ops;
 
+	u32 class;
+	struct host1x_channel *channel;
+
 	struct list_head list;
 
 };
@@ -221,6 +249,7 @@ extern void tegra_drm_fb_restore(struct drm_device *drm);
 
 extern struct platform_driver tegra_hdmi_driver;
 extern struct platform_driver tegra_dc_driver;
+extern struct platform_driver tegra_gr2d_driver;
 extern struct drm_driver tegra_drm_driver;
 
 #endif /* TEGRA_DRM_H */
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
new file mode 100644
index 0000000..6554b6b
--- /dev/null
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -0,0 +1,300 @@
+/*
+ * drivers/gpu/drm/tegra/gr2d.c
+ *
+ * Tegra Graphics 2D
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <drm/tegra_drm.h>
+#include <linux/host1x.h>
+#include "drm.h"
+
+struct gr2d {
+	struct tegra_drm_client client;
+	struct clk *clk;
+	struct host1x_syncpt *syncpt;
+	struct host1x_channel *channel;
+};
+
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg);
+
+static int gr2d_client_init(struct tegra_drm_client *client,
+		struct drm_device *drm)
+{
+	return 0;
+}
+
+static int gr2d_client_exit(struct tegra_drm_client *client)
+{
+	return 0;
+}
+
+static int gr2d_open_channel(struct tegra_drm_client *client,
+		struct tegra_drm_context *context)
+{
+	struct gr2d *gr2d = dev_get_drvdata(client->dev);
+	context->channel = host1x_channel_get(gr2d->channel);
+
+	if (!context->channel)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void gr2d_close_channel(struct tegra_drm_context *context)
+{
+	host1x_channel_put(context->channel);
+}
+
+static u32 gr2d_get_syncpoint(struct tegra_drm_context *context, int index)
+{
+	struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
+	if (index != 0)
+		return UINT_MAX;
+
+	return host1x_syncpt_id(gr2d->syncpt);
+}
+
+static u32 handle_cma_to_host1x(struct drm_device *drm,
+				struct drm_file *file_priv, u32 gem_handle)
+{
+	struct drm_gem_object *obj;
+	struct drm_gem_cma_object *cma_obj;
+	u32 host1x_handle;
+
+	obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
+	if (!obj)
+		return 0;
+
+	cma_obj = to_drm_gem_cma_obj(obj);
+	host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
+	drm_gem_object_unreference(obj);
+
+	return host1x_handle;
+}
+
+static int gr2d_submit(struct tegra_drm_context *context,
+		struct tegra_drm_submit_args *args,
+		struct drm_device *drm,
+		struct drm_file *file_priv)
+{
+	struct host1x_job *job;
+	int num_cmdbufs = args->num_cmdbufs;
+	int num_relocs = args->num_relocs;
+	int num_waitchks = args->num_waitchks;
+	struct tegra_drm_cmdbuf __user *cmdbufs =
+		(void * __user)(uintptr_t)args->cmdbufs;
+	struct tegra_drm_reloc __user *relocs =
+		(void * __user)(uintptr_t)args->relocs;
+	struct tegra_drm_waitchk __user *waitchks =
+		(void * __user)(uintptr_t)args->waitchks;
+	struct tegra_drm_syncpt_incr syncpt_incr;
+	int err;
+
+	/* We don't yet support other than one syncpt_incr struct per submit */
+	if (args->num_syncpt_incrs != 1)
+		return -EINVAL;
+
+	job = host1x_job_alloc(context->channel,
+			args->num_cmdbufs,
+			args->num_relocs,
+			args->num_waitchks);
+	if (!job)
+		return -ENOMEM;
+
+	job->num_relocs = args->num_relocs;
+	job->num_waitchk = args->num_waitchks;
+	job->clientid = (u32)args->context;
+	job->class = context->client->class;
+	job->serialize = true;
+
+	while (num_cmdbufs) {
+		struct tegra_drm_cmdbuf cmdbuf;
+		err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf));
+		if (err)
+			goto fail;
+
+		cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
+		if (!cmdbuf.mem)
+			goto fail;
+
+		host1x_job_add_gather(job,
+				cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
+		num_cmdbufs--;
+		cmdbufs++;
+	}
+
+	err = copy_from_user(job->relocarray,
+			relocs, sizeof(*relocs) * num_relocs);
+	if (err)
+		goto fail;
+
+	while (num_relocs--) {
+		job->relocarray[num_relocs].cmdbuf_mem =
+			handle_cma_to_host1x(drm, file_priv,
+			job->relocarray[num_relocs].cmdbuf_mem);
+		job->relocarray[num_relocs].target =
+			handle_cma_to_host1x(drm, file_priv,
+			job->relocarray[num_relocs].target);
+
+		if (!job->relocarray[num_relocs].target ||
+			!job->relocarray[num_relocs].cmdbuf_mem)
+			goto fail;
+	}
+
+	err = copy_from_user(job->waitchk,
+			waitchks, sizeof(*waitchks) * num_waitchks);
+	if (err)
+		goto fail;
+
+	err = host1x_job_pin(job, to_platform_device(context->client->dev));
+	if (err)
+		goto fail;
+
+	err = copy_from_user(&syncpt_incr,
+			(void * __user)(uintptr_t)args->syncpt_incrs,
+			sizeof(syncpt_incr));
+	if (err)
+		goto fail;
+
+	job->syncpt_id = syncpt_incr.syncpt_id;
+	job->syncpt_incrs = syncpt_incr.syncpt_incrs;
+	job->timeout = 10000;
+	job->is_addr_reg = gr2d_is_addr_reg;
+	if (args->timeout && args->timeout < 10000)
+		job->timeout = args->timeout;
+
+	err = host1x_channel_submit(job);
+	if (err)
+		goto fail_submit;
+
+	args->fence = job->syncpt_end;
+
+	host1x_job_put(job);
+	return 0;
+
+fail_submit:
+	host1x_job_unpin(job);
+fail:
+	host1x_job_put(job);
+	return err;
+}
+
+static struct tegra_drm_client_ops gr2d_client_ops = {
+	.drm_init = gr2d_client_init,
+	.drm_exit = gr2d_client_exit,
+	.open_channel = gr2d_open_channel,
+	.close_channel = gr2d_close_channel,
+	.get_syncpoint = gr2d_get_syncpoint,
+	.submit = gr2d_submit,
+};
+
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
+{
+	int ret;
+
+	if (class == NV_HOST1X_CLASS_ID)
+		ret = reg == 0x2b;
+	else
+		switch (reg) {
+		case 0x1a:
+		case 0x1b:
+		case 0x26:
+		case 0x2b:
+		case 0x2c:
+		case 0x2d:
+		case 0x31:
+		case 0x32:
+		case 0x48:
+		case 0x49:
+		case 0x4a:
+		case 0x4b:
+		case 0x4c:
+			ret = 1;
+			break;
+		default:
+			ret = 0;
+			break;
+		}
+
+	return ret;
+}
+
+static struct of_device_id gr2d_match[] __devinitdata = {
+	{ .compatible = "nvidia,tegra30-gr2d" },
+	{ .compatible = "nvidia,tegra20-gr2d" },
+	{ },
+};
+
+static int __devinit gr2d_probe(struct platform_device *dev)
+{
+	int err;
+	struct gr2d *gr2d = NULL;
+
+	gr2d = devm_kzalloc(&dev->dev, sizeof(*gr2d), GFP_KERNEL);
+	if (!gr2d)
+		return -ENOMEM;
+
+	gr2d->clk = devm_clk_get(&dev->dev, "gr2d");
+	if (IS_ERR(gr2d->clk)) {
+		dev_err(&dev->dev, "cannot get clock\n");
+		return PTR_ERR(gr2d->clk);
+	}
+
+	err = clk_prepare_enable(gr2d->clk);
+	if (err) {
+		dev_err(&dev->dev, "cannot turn on clock\n");
+		return err;
+	}
+
+	gr2d->channel = host1x_channel_alloc(dev);
+	if (!gr2d->channel)
+		return -ENOMEM;
+
+	gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
+	if (!gr2d->syncpt) {
+		host1x_channel_put(gr2d->channel);
+		return -ENOMEM;
+	}
+
+	gr2d->client.ops = &gr2d_client_ops;
+	gr2d->client.dev = &dev->dev;
+	gr2d->client.class = NV_GRAPHICS_2D_CLASS_ID;
+
+	platform_set_drvdata(dev, gr2d);
+	return tegra_drm_register_client(&gr2d->client);
+}
+
+static int __exit gr2d_remove(struct platform_device *dev)
+{
+	struct gr2d *gr2d = platform_get_drvdata(dev);
+	host1x_syncpt_free(gr2d->syncpt);
+	return 0;
+}
+
+struct platform_driver tegra_gr2d_driver = {
+	.probe = gr2d_probe,
+	.remove = __exit_p(gr2d_remove),
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = "gr2d",
+		.of_match_table = gr2d_match,
+	}
+};
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
index 8632f49..11fb019 100644
--- a/include/drm/tegra_drm.h
+++ b/include/drm/tegra_drm.h
@@ -17,4 +17,115 @@
 #ifndef _TEGRA_DRM_H_
 #define _TEGRA_DRM_H_
 
+struct tegra_gem_create {
+	__u64 size;
+	unsigned int flags;
+	unsigned int handle;
+	unsigned int offset;
+};
+
+struct tegra_gem_invalidate {
+	unsigned int handle;
+};
+
+struct tegra_gem_flush {
+	unsigned int handle;
+};
+
+struct tegra_drm_syncpt_read_args {
+	__u32 id;
+	__u32 value;
+};
+
+struct tegra_drm_syncpt_incr_args {
+	__u32 id;
+	__u32 pad;
+};
+
+struct tegra_drm_syncpt_wait_args {
+	__u32 id;
+	__u32 thresh;
+	__s32 timeout;
+	__u32 value;
+};
+
+#define DRM_TEGRA_NO_TIMEOUT	(-1)
+
+struct tegra_drm_open_channel_args {
+	__u32 class;
+	__u32 pad;
+	__u64 context;
+};
+
+struct tegra_drm_get_channel_param_args {
+	__u64 context;
+	__u32 param;
+	__u32 value;
+};
+
+struct tegra_drm_syncpt_incr {
+	__u32 syncpt_id;
+	__u32 syncpt_incrs;
+};
+
+struct tegra_drm_cmdbuf {
+	__u32 mem;
+	__u32 offset;
+	__u32 words;
+	__u32 pad;
+};
+
+struct tegra_drm_reloc {
+	__u32 cmdbuf_mem;
+	__u32 cmdbuf_offset;
+	__u32 target;
+	__u32 target_offset;
+	__u32 shift;
+	__u32 pad;
+};
+
+struct tegra_drm_waitchk {
+	__u32 mem;
+	__u32 offset;
+	__u32 syncpt_id;
+	__u32 thresh;
+};
+
+struct tegra_drm_submit_args {
+	__u64 context;
+	__u32 num_syncpt_incrs;
+	__u32 num_cmdbufs;
+	__u32 num_relocs;
+	__u32 submit_version;
+	__u32 num_waitchks;
+	__u32 waitchk_mask;
+	__u32 timeout;
+	__u32 pad;
+	__u64 syncpt_incrs;
+	__u64 cmdbufs;
+	__u64 relocs;
+	__u64 waitchks;
+	__u32 fence;		/* Return value */
+
+	__u32 reserved[5];	/* future expansion */
+};
+
+#define DRM_TEGRA_GEM_CREATE		0x00
+#define DRM_TEGRA_DRM_SYNCPT_READ	0x01
+#define DRM_TEGRA_DRM_SYNCPT_INCR	0x02
+#define DRM_TEGRA_DRM_SYNCPT_WAIT	0x03
+#define DRM_TEGRA_DRM_OPEN_CHANNEL	0x04
+#define DRM_TEGRA_DRM_CLOSE_CHANNEL	0x05
+#define DRM_TEGRA_DRM_GET_SYNCPOINT	0x06
+#define DRM_TEGRA_DRM_SUBMIT		0x08
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct tegra_gem_create)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_READ, struct tegra_drm_syncpt_read_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_INCR, struct tegra_drm_syncpt_incr_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_WAIT, struct tegra_drm_syncpt_wait_args)
+#define DRM_IOCTL_TEGRA_DRM_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_OPEN_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_CLOSE_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_GET_SYNCPOINT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_SYNCPOINT, struct tegra_drm_get_channel_param_args)
+#define DRM_IOCTL_TEGRA_DRM_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SUBMIT, struct tegra_drm_submit_args)
+
 #endif
-- 
1.7.9.5

* [PATCHv3 7/7] drm: tegra: Add gr2d device
@ 2012-12-13 14:04   ` Terje Bergstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Terje Bergstrom @ 2012-12-13 14:04 UTC (permalink / raw)
  To: tbergstrom, thierry.reding, dev, linux-tegra, dri-devel
  Cc: amerilainen, linux-kernel

Add client driver for 2D device.

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/drm/tegra/Makefile |    2 +-
 drivers/gpu/drm/tegra/drm.c    |  211 +++++++++++++++++++++++++++-
 drivers/gpu/drm/tegra/drm.h    |   29 ++++
 drivers/gpu/drm/tegra/gr2d.c   |  300 ++++++++++++++++++++++++++++++++++++++++
 include/drm/tegra_drm.h        |  111 +++++++++++++++
 5 files changed, 651 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/gr2d.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index f4c05bb..2661f41 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -1,7 +1,7 @@
 ccflags-y := -Iinclude/drm
 ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
-tegra-drm-y := drm.o fb.o dc.o
+tegra-drm-y := drm.o fb.o dc.o gr2d.o
 tegra-drm-y += output.o rgb.o hdmi.o
 
 obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 530bed4..ab4460a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -60,8 +60,10 @@ static int tegra_drm_parse_dt(struct tegradrm *tegradrm)
 	static const char * const compat[] = {
 		"nvidia,tegra20-dc",
 		"nvidia,tegra20-hdmi",
+		"nvidia,tegra20-gr2d",
 		"nvidia,tegra30-dc",
 		"nvidia,tegra30-hdmi",
+		"nvidia,tegra30-gr2d"
 	};
 	unsigned int i;
 	int err;
@@ -218,12 +220,29 @@ static int tegra_drm_unload(struct drm_device *drm)
 
 static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 {
-	return 0;
+	struct tegra_drm_fpriv *fpriv;
+	int err = 0;
+
+	fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
+	if (!fpriv)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&fpriv->contexts);
+	filp->driver_priv = fpriv;
+
+	return err;
 }
 
 static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
 {
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(filp);
+	struct tegra_drm_context *context, *tmp;
 
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		context->client->ops->close_channel(context);
+		kfree(context);
+	}
+	kfree(fpriv);
 }
 
 static void tegra_drm_lastclose(struct drm_device *drm)
@@ -245,8 +264,14 @@ static int __init tegra_drm_init(void)
 	err = platform_driver_register(&tegra_hdmi_driver);
 	if (err < 0)
 		goto unregister_dc;
+
+	err = platform_driver_register(&tegra_gr2d_driver);
+	if (err < 0)
+		goto unregister_hdmi;
 	return 0;
 
+unregister_hdmi:
+	platform_driver_unregister(&tegra_hdmi_driver);
 unregister_dc:
 	platform_driver_unregister(&tegra_dc_driver);
 free_tegradrm:
@@ -257,13 +282,197 @@ module_init(tegra_drm_init);
 
 static void __exit tegra_drm_exit(void)
 {
+	platform_driver_unregister(&tegra_gr2d_driver);
 	platform_driver_unregister(&tegra_hdmi_driver);
 	platform_driver_unregister(&tegra_dc_driver);
 	kfree(tegradrm);
 }
 module_exit(tegra_drm_exit);
 
+static int
+tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_syncpt_read_args *args = data;
+
+	args->value = host1x_syncpt_read_byid(args->id);
+	return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_incr(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_syncpt_incr_args *args = data;
+	host1x_syncpt_incr_byid(args->id);
+	return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_wait(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_syncpt_wait_args *args = data;
+	int err;
+
+	err = host1x_syncpt_wait_byid(args->id, args->thresh,
+			args->timeout, &args->value);
+
+	return err;
+}
+
+static int
+tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct tegra_drm_client *client;
+	struct tegra_drm_context *context;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	struct tegradrm *tegradrm = drm->dev_private;
+	int err = 0;
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	list_for_each_entry(client, &tegradrm->clients, list) {
+		if (client->class == args->class) {
+			dev_dbg(drm->dev, "opening client %x\n", args->class);
+			context->client = client;
+			err = client->ops->open_channel(client, context);
+			if (err)
+				goto out;
+
+			dev_dbg(drm->dev, "context %p\n", context);
+			list_add(&context->list, &fpriv->contexts);
+			args->context = (uintptr_t)context;
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	if (err)
+		kfree(context);
+
+	return err;
+}
+
+static int
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct tegra_drm_context *context, *tmp;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			context->client->ops->close_channel(context);
+			list_del(&context->list);
+			kfree(context);
+			goto out;
+		}
+	}
+	err = -EINVAL;
+
+out:
+	return err;
+}
+
+static int
+tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_get_channel_param_args *args = data;
+	struct tegra_drm_context *context;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			args->value =
+				context->client->ops->get_syncpoint(context,
+						args->param);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+}
+
+static int
+tegra_drm_ioctl_submit(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_submit_args *args = data;
+	struct tegra_drm_context *context;
+	struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			err = context->client->ops->submit(context, args, drm,
+				file_priv);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+
+}
+
+static int
+tegra_drm_create_ioctl(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_gem_create *args = data;
+	struct drm_gem_cma_object *cma_obj;
+	int ret;
+
+	cma_obj = drm_gem_cma_create(drm, args->size);
+	if (IS_ERR(cma_obj))
+		goto err_cma_create;
+
+	ret = drm_gem_handle_create(file_priv, &cma_obj->base, &args->handle);
+	if (ret)
+		goto err_handle_create;
+
+	args->offset = cma_obj->base.map_list.hash.key << PAGE_SHIFT;
+
+	drm_gem_object_unreference(&cma_obj->base);
+
+	return 0;
+
+err_handle_create:
+	drm_gem_cma_free_object(&cma_obj->base);
+err_cma_create:
+	return -ENOMEM;
+}
+
 static struct drm_ioctl_desc tegra_drm_ioctls[] = {
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
+			tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ,
+			tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR,
+			tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT,
+			tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL,
+			tegra_drm_ioctl_open_channel, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_CLOSE_CHANNEL,
+			tegra_drm_ioctl_close_channel, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_SYNCPOINT,
+			tegra_drm_ioctl_get_syncpoint, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SUBMIT,
+			tegra_drm_ioctl_submit, DRM_UNLOCKED),
 };
 
 static const struct file_operations tegra_drm_fops = {
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 3e800fb..c9c2b85 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -46,16 +46,44 @@ struct tegradrm {
 
 struct tegra_drm_client;
 
+struct tegra_drm_context {
+	struct tegra_drm_client *client;
+	struct host1x_channel *channel;
+	struct list_head list;
+};
+
 struct tegra_drm_client_ops {
 	int (*drm_init)(struct tegra_drm_client *, struct drm_device *);
 	int (*drm_exit)(struct tegra_drm_client *);
+	int (*open_channel)(struct tegra_drm_client *,
+			struct tegra_drm_context *);
+	void (*close_channel)(struct tegra_drm_context *);
+	u32 (*get_syncpoint)(struct tegra_drm_context *, int index);
+	int (*submit)(struct tegra_drm_context *,
+			struct tegra_drm_submit_args *,
+			struct drm_device *,
+			struct drm_file *);
+};
+
+
+struct tegra_drm_fpriv {
+	struct list_head contexts;
 };
 
+static inline struct tegra_drm_fpriv *
+tegra_drm_fpriv(struct drm_file *file_priv)
+{
+	return file_priv ? file_priv->driver_priv : NULL;
+}
+
 struct tegra_drm_client {
 	struct device *dev;
 
 	const struct tegra_drm_client_ops *ops;
 
+	u32 class;
+	struct host1x_channel *channel;
+
 	struct list_head list;
 
 };
@@ -221,6 +249,7 @@ extern void tegra_drm_fb_restore(struct drm_device *drm);
 
 extern struct platform_driver tegra_hdmi_driver;
 extern struct platform_driver tegra_dc_driver;
+extern struct platform_driver tegra_gr2d_driver;
 extern struct drm_driver tegra_drm_driver;
 
 #endif /* TEGRA_DRM_H */
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
new file mode 100644
index 0000000..6554b6b
--- /dev/null
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -0,0 +1,300 @@
+/*
+ * drivers/video/tegra/host/gr2d/gr2d.c
+ *
+ * Tegra Graphics 2D
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <drm/tegra_drm.h>
+#include <linux/host1x.h>
+#include "drm.h"
+
+struct gr2d {
+	struct tegra_drm_client client;
+	struct clk *clk;
+	struct host1x_syncpt *syncpt;
+	struct host1x_channel *channel;
+};
+
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg);
+
+static int gr2d_client_init(struct tegra_drm_client *client,
+		struct drm_device *drm)
+{
+	return 0;
+}
+
+static int gr2d_client_exit(struct tegra_drm_client *client)
+{
+	return 0;
+}
+
+static int gr2d_open_channel(struct tegra_drm_client *client,
+		struct tegra_drm_context *context)
+{
+	struct gr2d *gr2d = dev_get_drvdata(client->dev);
+	context->channel = host1x_channel_get(gr2d->channel);
+
+	if (!context->channel)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void gr2d_close_channel(struct tegra_drm_context *context)
+{
+	host1x_channel_put(context->channel);
+}
+
+static u32 gr2d_get_syncpoint(struct tegra_drm_context *context, int index)
+{
+	struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
+	if (index != 0)
+		return UINT_MAX;
+
+	return host1x_syncpt_id(gr2d->syncpt);
+}
+
+static u32 handle_cma_to_host1x(struct drm_device *drm,
+				struct drm_file *file_priv, u32 gem_handle)
+{
+	struct drm_gem_object *obj;
+	struct drm_gem_cma_object *cma_obj;
+	u32 host1x_handle;
+
+	obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
+	if (!obj)
+		return 0;
+
+	cma_obj = to_drm_gem_cma_obj(obj);
+	host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
+	drm_gem_object_unreference(obj);
+
+	return host1x_handle;
+}
+
+static int gr2d_submit(struct tegra_drm_context *context,
+		struct tegra_drm_submit_args *args,
+		struct drm_device *drm,
+		struct drm_file *file_priv)
+{
+	struct host1x_job *job;
+	int num_cmdbufs = args->num_cmdbufs;
+	int num_relocs = args->num_relocs;
+	int num_waitchks = args->num_waitchks;
+	struct tegra_drm_cmdbuf __user *cmdbufs =
+		(void * __user)(uintptr_t)args->cmdbufs;
+	struct tegra_drm_reloc __user *relocs =
+		(void * __user)(uintptr_t)args->relocs;
+	struct tegra_drm_waitchk __user *waitchks =
+		(void * __user)(uintptr_t)args->waitchks;
+	struct tegra_drm_syncpt_incr syncpt_incr;
+	int err;
+
+	/* We don't yet support other than one syncpt_incr struct per submit */
+	if (args->num_syncpt_incrs != 1)
+		return -EINVAL;
+
+	job = host1x_job_alloc(context->channel,
+			args->num_cmdbufs,
+			args->num_relocs,
+			args->num_waitchks);
+	if (!job)
+		return -ENOMEM;
+
+	job->num_relocs = args->num_relocs;
+	job->num_waitchk = args->num_waitchks;
+	job->clientid = (u32)args->context;
+	job->class = context->client->class;
+	job->serialize = true;
+
+	while (num_cmdbufs) {
+		struct tegra_drm_cmdbuf cmdbuf;
+		err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf));
+		if (err)
+			goto fail;
+
+		cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
+		if (!cmdbuf.mem)
+			goto fail;
+
+		host1x_job_add_gather(job,
+				cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
+		num_cmdbufs--;
+		cmdbufs++;
+	}
+
+	err = copy_from_user(job->relocarray,
+			relocs, sizeof(*relocs) * num_relocs);
+	if (err)
+		goto fail;
+
+	while (num_relocs--) {
+		job->relocarray[num_relocs].cmdbuf_mem =
+			handle_cma_to_host1x(drm, file_priv,
+			job->relocarray[num_relocs].cmdbuf_mem);
+		job->relocarray[num_relocs].target =
+			handle_cma_to_host1x(drm, file_priv,
+			job->relocarray[num_relocs].target);
+
+		if (!job->relocarray[num_relocs].target ||
+			!job->relocarray[num_relocs].cmdbuf_mem)
+			goto fail;
+	}
+
+	err = copy_from_user(job->waitchk,
+			waitchks, sizeof(*waitchks) * num_waitchks);
+	if (err)
+		goto fail;
+
+	err = host1x_job_pin(job, to_platform_device(context->client->dev));
+	if (err)
+		goto fail;
+
+	err = copy_from_user(&syncpt_incr,
+			(void * __user)(uintptr_t)args->syncpt_incrs,
+			sizeof(syncpt_incr));
+	if (err)
+		goto fail;
+
+	job->syncpt_id = syncpt_incr.syncpt_id;
+	job->syncpt_incrs = syncpt_incr.syncpt_incrs;
+	job->timeout = 10000;
+	job->is_addr_reg = gr2d_is_addr_reg;
+	if (args->timeout && args->timeout < 10000)
+		job->timeout = args->timeout;
+
+	err = host1x_channel_submit(job);
+	if (err)
+		goto fail_submit;
+
+	args->fence = job->syncpt_end;
+
+	host1x_job_put(job);
+	return 0;
+
+fail_submit:
+	host1x_job_unpin(job);
+fail:
+	host1x_job_put(job);
+	return err;
+}
+
+static struct tegra_drm_client_ops gr2d_client_ops = {
+	.drm_init = gr2d_client_init,
+	.drm_exit = gr2d_client_exit,
+	.open_channel = gr2d_open_channel,
+	.close_channel = gr2d_close_channel,
+	.get_syncpoint = gr2d_get_syncpoint,
+	.submit = gr2d_submit,
+};
+
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
+{
+	int ret;
+
+	if (class == NV_HOST1X_CLASS_ID)
+		ret = reg == 0x2b;
+	else
+		switch (reg) {
+		case 0x1a:
+		case 0x1b:
+		case 0x26:
+		case 0x2b:
+		case 0x2c:
+		case 0x2d:
+		case 0x31:
+		case 0x32:
+		case 0x48:
+		case 0x49:
+		case 0x4a:
+		case 0x4b:
+		case 0x4c:
+			ret = 1;
+			break;
+		default:
+			ret = 0;
+			break;
+		}
+
+	return ret;
+}
+
+static struct of_device_id gr2d_match[] __devinitdata = {
+	{ .compatible = "nvidia,tegra30-gr2d" },
+	{ .compatible = "nvidia,tegra20-gr2d" },
+	{ },
+};
+
+static int __devinit gr2d_probe(struct platform_device *dev)
+{
+	int err;
+	struct gr2d *gr2d = NULL;
+
+	gr2d = devm_kzalloc(&dev->dev, sizeof(*gr2d), GFP_KERNEL);
+	if (!gr2d)
+		return -ENOMEM;
+
+	gr2d->clk = devm_clk_get(&dev->dev, "gr2d");
+	if (IS_ERR(gr2d->clk)) {
+		dev_err(&dev->dev, "cannot get clock\n");
+		return PTR_ERR(gr2d->clk);
+	}
+
+	err = clk_prepare_enable(gr2d->clk);
+	if (err) {
+		dev_err(&dev->dev, "cannot turn on clock\n");
+		return err;
+	}
+
+	gr2d->channel = host1x_channel_alloc(dev);
+	if (!gr2d->channel)
+		return -ENOMEM;
+
+	gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
+	if (!gr2d->syncpt) {
+		host1x_channel_put(gr2d->channel);
+		return -ENOMEM;
+	}
+
+	gr2d->client.ops = &gr2d_client_ops;
+	gr2d->client.dev = &dev->dev;
+	gr2d->client.class = NV_GRAPHICS_2D_CLASS_ID;
+
+	platform_set_drvdata(dev, gr2d);
+	return tegra_drm_register_client(&gr2d->client);
+}
+
+static int __exit gr2d_remove(struct platform_device *dev)
+{
+	struct gr2d *gr2d = platform_get_drvdata(dev);
+	host1x_syncpt_free(gr2d->syncpt);
+	return 0;
+}
+
+struct platform_driver tegra_gr2d_driver = {
+	.probe = gr2d_probe,
+	.remove = __exit_p(gr2d_remove),
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = "gr2d",
+		.of_match_table = gr2d_match,
+	}
+};
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
index 8632f49..11fb019 100644
--- a/include/drm/tegra_drm.h
+++ b/include/drm/tegra_drm.h
@@ -17,4 +17,115 @@
 #ifndef _TEGRA_DRM_H_
 #define _TEGRA_DRM_H_
 
+struct tegra_gem_create {
+	__u64 size;
+	unsigned int flags;
+	unsigned int handle;
+	unsigned int offset;
+};
+
+struct tegra_gem_invalidate {
+	unsigned int handle;
+};
+
+struct tegra_gem_flush {
+	unsigned int handle;
+};
+
+struct tegra_drm_syncpt_read_args {
+	__u32 id;
+	__u32 value;
+};
+
+struct tegra_drm_syncpt_incr_args {
+	__u32 id;
+	__u32 pad;
+};
+
+struct tegra_drm_syncpt_wait_args {
+	__u32 id;
+	__u32 thresh;
+	__s32 timeout;
+	__u32 value;
+};
+
+#define DRM_TEGRA_NO_TIMEOUT	(-1)
+
+struct tegra_drm_open_channel_args {
+	__u32 class;
+	__u32 pad;
+	__u64 context;
+};
+
+struct tegra_drm_get_channel_param_args {
+	__u64 context;
+	__u32 param;
+	__u32 value;
+};
+
+struct tegra_drm_syncpt_incr {
+	__u32 syncpt_id;
+	__u32 syncpt_incrs;
+};
+
+struct tegra_drm_cmdbuf {
+	__u32 mem;
+	__u32 offset;
+	__u32 words;
+	__u32 pad;
+};
+
+struct tegra_drm_reloc {
+	__u32 cmdbuf_mem;
+	__u32 cmdbuf_offset;
+	__u32 target;
+	__u32 target_offset;
+	__u32 shift;
+	__u32 pad;
+};
+
+struct tegra_drm_waitchk {
+	__u32 mem;
+	__u32 offset;
+	__u32 syncpt_id;
+	__u32 thresh;
+};
+
+struct tegra_drm_submit_args {
+	__u64 context;
+	__u32 num_syncpt_incrs;
+	__u32 num_cmdbufs;
+	__u32 num_relocs;
+	__u32 submit_version;
+	__u32 num_waitchks;
+	__u32 waitchk_mask;
+	__u32 timeout;
+	__u32 pad;
+	__u64 syncpt_incrs;
+	__u64 cmdbufs;
+	__u64 relocs;
+	__u64 waitchks;
+	__u32 fence;		/* Return value */
+
+	__u32 reserved[5];	/* future expansion */
+};
+
+#define DRM_TEGRA_GEM_CREATE		0x00
+#define DRM_TEGRA_DRM_SYNCPT_READ	0x01
+#define DRM_TEGRA_DRM_SYNCPT_INCR	0x02
+#define DRM_TEGRA_DRM_SYNCPT_WAIT	0x03
+#define DRM_TEGRA_DRM_OPEN_CHANNEL	0x04
+#define DRM_TEGRA_DRM_CLOSE_CHANNEL	0x05
+#define DRM_TEGRA_DRM_GET_SYNCPOINT	0x06
+#define DRM_TEGRA_DRM_SUBMIT		0x08
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct tegra_gem_create)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_READ, struct tegra_drm_syncpt_read_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_INCR, struct tegra_drm_syncpt_incr_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_WAIT, struct tegra_drm_syncpt_wait_args)
+#define DRM_IOCTL_TEGRA_DRM_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_OPEN_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_CLOSE_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_GET_SYNCPOINT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_SYNCPOINT, struct tegra_drm_get_channel_param_args)
+#define DRM_IOCTL_TEGRA_DRM_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SUBMIT, struct tegra_drm_submit_args)
+
 #endif
-- 
1.7.9.5


* Re: [PATCHv3 0/7] Support for Tegra 2D hardware
  2012-12-13 14:04 ` Terje Bergstrom
                   ` (7 preceding siblings ...)
  (?)
@ 2012-12-13 15:03 ` Lucas Stach
  2012-12-13 15:33     ` Terje Bergström
  -1 siblings, 1 reply; 24+ messages in thread
From: Lucas Stach @ 2012-12-13 15:03 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: thierry.reding, linux-tegra, dri-devel, amerilainen, linux-kernel

Hi Terje,

On Thursday, 13.12.2012 at 16:04 +0200, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and
> 2D. It is based on linux-next.
> 
> The third version has too many changes to list all of them. Here are
> highlights:
>  * Renamed to host1x, and moved to drivers/gpu/host1x
>  * Greatly simplified the inner workings between physical and logical
>    driver
>  * Does not use AUXDATA for passing data to driver
>  * Runtime power management removed - will replace with runtime PM
>    later
>  * IOCTLs padded and use __64 for passing pointers
>  * DMABUF support removed, replaced with GEM CMA support

You are still doing the allocation the wrong way around, IMHO. I thought
we agreed to do all the allocations in host1x, which obviously means not
using the cma_gem_helpers anymore, but introducing a new native host1x
object to back GEM/V4L/whatever objects. IMHO the current approach is a
clear layering violation and makes proper IOMMU support a lot harder.
Allocating in host1x would also let us get rid of all the indirections
and ifdefs in the host1x memmgr, as host1x would only have to deal with
its native objects.

All the complexity of converting host1x objects to GEM objects should be
located in tegradrm and not be scattered between different modules.
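
As a rough userspace illustration of the layering described above, the sketch below keeps allocation in a host1x-side object and lets the DRM side merely wrap it. All names here (`host1x_bo`, `host1x_bo_alloc`, `host1x_bo_put`, `tegra_gem_object`, `tegra_gem_create`, `tegra_gem_free`) are hypothetical and not taken from the patch set, and `calloc()` only stands in for a real DMA or IOMMU-backed allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical native host1x buffer object: the only type host1x deals with. */
struct host1x_bo {
	size_t size;
	void *vaddr;
	int refcount;
};

/* host1x owns allocation; calloc() is a stand-in for DMA/IOMMU memory. */
struct host1x_bo *host1x_bo_alloc(size_t size)
{
	struct host1x_bo *bo = calloc(1, sizeof(*bo));

	if (!bo)
		return NULL;
	bo->vaddr = calloc(1, size);
	if (!bo->vaddr) {
		free(bo);
		return NULL;
	}
	bo->size = size;
	bo->refcount = 1;
	return bo;
}

/* Drop one reference and free the backing memory on the last one. */
void host1x_bo_put(struct host1x_bo *bo)
{
	if (bo && --bo->refcount == 0) {
		free(bo->vaddr);
		free(bo);
	}
}

/* Hypothetical tegradrm-side wrapper: GEM bookkeeping lives here only. */
struct tegra_gem_object {
	unsigned int handle;
	struct host1x_bo *bo;
};

struct tegra_gem_object *tegra_gem_create(size_t size, unsigned int handle)
{
	struct tegra_gem_object *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->bo = host1x_bo_alloc(size);
	if (!obj->bo) {
		free(obj);
		return NULL;
	}
	obj->handle = handle;
	return obj;
}

void tegra_gem_free(struct tegra_gem_object *obj)
{
	if (!obj)
		return;
	host1x_bo_put(obj->bo);
	free(obj);
}
```

In this arrangement host1x never sees a GEM type, so a V4L or other front end could wrap the same `host1x_bo` without adding ifdefs to the host1x side.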

Did you leave this out on purpose in this version of the patchset?

>  * host1x driver validates command streams and copies them to kernel
>    owned buffer
>  * Generic interrupt support removed - only syncpt irq remains
>  * Sync points are allocated now dynamically
>  * IO register space handling rewritten to use helper functions
>  * Other numerous fixes and simplifications to code
> 
> Some of the issues left open:
>  * Register definitions still use static inline. There has been a
>    debate about code style versus ability to use compiler type
>    checking and code coverage analysis. There was no conclusion, so
>    I left it as was.
>  * tegradrm has a global variable. Plan was to hide that behind a
>    virtual device, and use that as DRM root device. That plan went
>    bad once the FB CMA helper used the device for trying to allocate
>    memory.
See above, we should get rid of the helpers and do all allocations
within host1x.

Regards,
Lucas


* Re: [PATCHv3 4/7] gpu: host1x: Add debug support
  2012-12-13 14:04   ` Terje Bergstrom
@ 2012-12-13 15:23     ` Joe Perches
  -1 siblings, 0 replies; 24+ messages in thread
From: Joe Perches @ 2012-12-13 15:23 UTC (permalink / raw)
  To: Terje Bergstrom; +Cc: linux-kernel, dri-devel, linux-tegra

On Thu, 2012-12-13 at 16:04 +0200, Terje Bergstrom wrote:
> Add support for host1x debugging. Adds debugfs entries, and dumps
> channel state to UART in case of stuck job.

trivial note:

[]

> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[]
> +void host1x_debug_output(struct output *o, const char *fmt, ...);

This should be marked __printf(2, 3)
so the compiler verifies format and argument types.
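
A minimal userspace sketch of what the annotation buys. In the kernel, `__printf(a, b)` expands to the GCC `format(printf, a, b)` attribute; the macro name `HOST1X_PRINTF`, the layout of `struct output`, and the `capture` sink below are assumptions made for illustration and are not copied from the patch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's __printf(a, b); the name is hypothetical. */
#define HOST1X_PRINTF(a, b) __attribute__((format(printf, a, b)))

/* Assumed layout: a sink callback plus a scratch buffer. */
struct output {
	void (*fn)(void *ctx, const char *str, size_t len);
	void *ctx;
	char buf[256];
};

/* Argument 2 is the format string, variadic arguments start at 3. */
HOST1X_PRINTF(2, 3)
void host1x_debug_output(struct output *o, const char *fmt, ...);

void host1x_debug_output(struct output *o, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
	va_end(args);

	if (len < 0)
		return;
	/* vsnprintf reports the untruncated length; clamp to what was stored. */
	if ((size_t)len >= sizeof(o->buf))
		len = (int)sizeof(o->buf) - 1;
	o->fn(o->ctx, o->buf, (size_t)len);
}

/* Hypothetical sink that appends everything to a small string buffer. */
struct capture {
	char text[64];
	size_t used;
};

void capture_sink(void *ctx, const char *str, size_t len)
{
	struct capture *c = ctx;
	size_t room = sizeof(c->text) - 1 - c->used;

	if (len > room)
		len = room;
	memcpy(c->text + c->used, str, len);
	c->used += len;
	c->text[c->used] = '\0';
}
```

With the attribute in place, a call such as `host1x_debug_output(&o, "%d", "text")` draws a `-Wformat` warning from GCC or Clang at compile time instead of misbehaving at run time.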


* Re: [PATCHv3 0/7] Support for Tegra 2D hardware
  2012-12-13 15:03 ` [PATCHv3 0/7] Support for Tegra 2D hardware Lucas Stach
@ 2012-12-13 15:33     ` Terje Bergström
  0 siblings, 0 replies; 24+ messages in thread
From: Terje Bergström @ 2012-12-13 15:33 UTC (permalink / raw)
  To: Lucas Stach
  Cc: thierry.reding, linux-tegra, dri-devel, Arto Merilainen, linux-kernel

On 13.12.2012 17:03, Lucas Stach wrote:
> IMHO you are still doing the allocation the wrong way around. I thought
> we agreed to do all the allocations in host1x, which obviously means not
> using the cma_gem_helpers anymore, but introducing a new native host1x
> object to back GEM/V4L/whatever objects. IMHO the current approach is a
> clear layering violation and makes proper IOMMU support a lot harder.
> Allocating in host1x would also let us get rid of all the indirections
> and ifdefs in the host1x memmgr, as host1x would only have to deal with
> its native objects.
> 
> All the complexity of converting host1x objects to GEM objects should be
> located in tegradrm and not be scattered between different modules.
> 
> Did you leave this out on purpose in this version of the patchset?

I forgot to mention that. IOMMU support, and consequently the "proper"
allocation support, is planned as a follow-up. I wanted to keep the
scope of this set as small as possible.

The plan we agreed on still holds.

Terje

>>  * tegradrm has a global variable. Plan was to hide that behind a
>>    virtual device, and use that as DRM root device. That plan went
>>    bad once the FB CMA helper used the device for trying to allocate
>>    memory.
> See above, we should get rid of the helpers and do all allocations
> within host1x.

I noticed that using the IOMMU and not using the CMA FB helper is how
exynos managed to do this.

Terje


* Re: [PATCHv3 4/7] gpu: host1x: Add debug support
  2012-12-13 15:23     ` Joe Perches
@ 2012-12-17 14:01       ` Terje Bergström
  -1 siblings, 0 replies; 24+ messages in thread
From: Terje Bergström @ 2012-12-17 14:01 UTC (permalink / raw)
  To: Joe Perches
  Cc: thierry.reding, dev, linux-tegra, dri-devel, Arto Merilainen,
	linux-kernel

On 13.12.2012 17:23, Joe Perches wrote:
> On Thu, 2012-12-13 at 16:04 +0200, Terje Bergstrom wrote:
>> Add support for host1x debugging. Adds debugfs entries, and dumps
>> channel state to UART in case of stuck job.
> 
> trivial note:
> 
> []
> 
>> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> []
>> +void host1x_debug_output(struct output *o, const char *fmt, ...);
> 
> This should be marked __printf(2, 3)
> so the compiler verifies format and argument types.

Thanks, I didn't know of this "trick". I'll apply it in the next version.

Considering the amount of feedback I've received on the patches, they
must be of top-notch quality!

Terje


* Re: [PATCHv3 4/7] gpu: host1x: Add debug support
  2012-12-17 14:01       ` Terje Bergström
  (?)
@ 2012-12-17 17:04       ` Joe Perches
  -1 siblings, 0 replies; 24+ messages in thread
From: Joe Perches @ 2012-12-17 17:04 UTC (permalink / raw)
  To: Terje Bergström
  Cc: thierry.reding, dev, linux-tegra, dri-devel, Arto Merilainen,
	linux-kernel

On Mon, 2012-12-17 at 16:01 +0200, Terje Bergström wrote:
> Considering the amount of feedback I've received from the patches, they
> must be top notch quality!

Maybe.
Maybe no one else has the hardware.


